Commons:Village pump/Proposals/Archive/2012/10

From Wikimedia Commons, the free media repository
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Automated deletion of files tagged as copyvios or no-permission by trusted users if no human does it

I think we're all aware of the fact that many people don't like performing repetitive tasks. Thanks to Túrelio and a few other administrators, copyright violations are deleted, but the files tagged (marked) with no-permission are often treated without a lot of care. Therefore it often makes no difference whether an admin has to spend his/her time (>1h/day!) on all these files or whether a bot does the clean-up.

I propose that deletion by bot should take place under the following conditions:

  • File marked by a trusted user or by Nikbot.
  • A human did not delete the file in time.
  • File is not nominated for deletion.
  • File was not edited after it was marked.
  • File talk page was not edited after the file was marked.
  • File is marked with {{No permission since}}, {{No license since}}, {{No source since}} or is in Category:Copyright violations.
  • Uploader was notified.
  • File is not protected and is in use less than 3 times.
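The conditions above can be read as a single yes/no check per file. The following is only a minimal sketch of that reading, not an implementation of any existing bot: the `TaggedFile` record, its field names and the `grace_period` parameter are hypothetical, and the proposal itself does not say how long "in time" is.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set

# Tags and category named in the proposal.
DELETION_TEMPLATES = {"No permission since", "No license since", "No source since"}
COPYVIO_CATEGORY = "Copyright violations"


@dataclass
class TaggedFile:
    """Hypothetical record of what a bot would need to know about one file."""
    tagged_at: datetime
    tagged_by_trusted_user: bool       # trusted user or Nikbot
    last_file_edit: datetime           # last edit to the file description page
    last_talk_edit: Optional[datetime]  # None if the talk page does not exist
    nominated_for_deletion: bool
    uploader_notified: bool
    protected: bool
    usage_count: int
    templates: Set[str] = field(default_factory=set)
    categories: Set[str] = field(default_factory=set)


def eligible_for_bot_deletion(f: TaggedFile, now: datetime,
                              grace_period: timedelta) -> bool:
    """Return True only if every condition of the proposal holds.

    grace_period is an assumption: it stands for the time humans get
    before the bot may act.
    """
    marked = bool(f.templates & DELETION_TEMPLATES) or COPYVIO_CATEGORY in f.categories
    untouched = f.last_file_edit <= f.tagged_at and (
        f.last_talk_edit is None or f.last_talk_edit <= f.tagged_at
    )
    return (
        f.tagged_by_trusted_user
        and now - f.tagged_at >= grace_period
        and not f.nominated_for_deletion
        and untouched
        and marked
        and f.uploader_notified
        and not f.protected
        and f.usage_count < 3
    )
```

Every condition is a veto, so a single failed check leaves the file for a human to decide.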

Definitions:

  • Trusted user:
    • A trusted user is someone who has been an active contributor for at least 1 year, who was never blocked (some users prefer to block themselves for a time) and who is a patroller, administrator or license reviewer, or who is on a white list
  • Files not deleted by humans:

People mis-tagging files will be held personally responsible. The introduction of such a bot will be announced beforehand so everyone will be aware of this change.

The deleting bot will cite the user who tagged the file in the deletion summary and notify the uploader(s) about the automated deletion (respecting {{Bots}}).

This change will

  • reduce the admin-backlog and therefore admins can spend more time talking to uploaders / teaching what is suitable for Commons
  • hopefully improve the quality of how people mark the files
  • let administrators spend their time investigating whether a file can be kept due to exemptions and remove the “tags”

I am interested in your thoughts. None of this has been implemented yet; it's just an idea. If someone finds it worth looking into the details, I could start recording statistics before we start speculating about how many files would be affected. -- Rillke(q?) 13:17, 19 September 2012 (UTC)

Could you point me to a definition of the scope of Nikbot? I cannot see anything obvious on the bot's user page. I could not support automatic deletions based on a bot script I cannot see a definition for. I assume that giving such effective power to a bot would mean a code freeze from this point on, unless there were a significant consensus to change scope, or that this automated effective deletion authority would be removed any time the bot were revised? Thanks -- (talk) 13:25, 19 September 2012 (UTC)
Yes, once the bot has been approved, the code should not be changed without new approval, except for security reasons. Nikbot is a clone of Filbot (see also Commons:Bots/Requests/Nikbot). It marks recently uploaded files that don't have a license template on the file description page. The source code is available. Since Commons:Essential information demands a license template and, in most cases, you can't choose one for another user, most files are deleted anyway unless they are changed or a license template is added. -- Rillke(q?) 14:18, 19 September 2012 (UTC)
I am very nervous about the deletion policies and the way they are interpreted by trusted humans, let alone a bot. The cases in point are the Right to Panorama issues in France. The world and his neighbour say there is no right to panorama in France, but reading the law in French shows that this is an anglo-centric simplification and WP applies it erroneously, erasing innocent images in the process. The intervention time to stop an erasure assumes that the uploader has permanent internet access, not intermittent access on a shaky line in a low-tech region of a low-tech country. The advantages seem spurious. 1. The backlog will expand to fill the space available. 2. I see no evidence of mentoring by admins, whose skills are generally more technical than human. 3. Geotagging is a huge issue for me, but quietly tagging the file oneself, leading by example, seems a better way to deal with naive volunteers than to zap some of their early efforts. 4. I would never try to direct any Wikipedian how to spend their time, and investigating exemptions is a career rather than a leisure pursuit. I think this is a well-meant dead end. I would be happier to see a quarantine system done by humans. It seems tragic that someone will spend so much time and effort determining that the file is a copyvio, but that knowledge is lost and not reported. It would be wonderful if we could collate the copyvios into a reference work that could be accessed for learning purposes (in-service training, so to speak). At the moment there is a herd of elephants in Category:Copyright violations; whichever way the decision goes, it would be fascinating to learn of their offence and fate! --ClemRutter (talk) 14:35, 19 September 2012 (UTC)
We already collate examples a bit (eg at COM:TOO and COM:DM) - but I agree we could do a lot more to educate/train people. Rd232 (talk) 14:53, 19 September 2012 (UTC)

Support the basic idea of automated deletion, as long as sufficient safeguards are applied and testing is done. For instance, as part of the testing we'd certainly want a dry run to see just a listing of what the bot would have deleted. Conditions:

  1. Only files not in use should be deleted - anything that's in use deserves a human decision. The bot can help clear the less important files, when the backlog gets too big.
  2. Only files uploaded within the last 3 months - anything that's been around longer deserves a human decision. (If this excludes too many files in practice, we can reconsider this one.)
  3. Only files that have been tagged for 10 days or longer. Files shouldn't be deleted before 7 days, so give some time for a human to do it.

With these conditions the bot will provide a useful backstop to stop the backlog getting too big. The very existence of the bot will probably also encourage admins to keep the backlog in check. Rd232 (talk) 14:53, 19 September 2012 (UTC)
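The dry run mentioned above could be as simple as producing a list of titles without deleting anything. This is a minimal sketch under stated assumptions: the dictionary keys (`title`, `usage_count`, `uploaded_at`, `tagged_at`) are hypothetical, and "3 months" is read as 90 days.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_UPLOAD_AGE = timedelta(days=90)  # assumption: "3 months" read as 90 days
MIN_TAGGED_AGE = timedelta(days=10)  # tagged for 10 days or longer


def dry_run(files: List[Dict], now: datetime) -> List[str]:
    """List the titles the bot would have deleted; nothing is deleted."""
    would_delete = []
    for f in files:
        if f["usage_count"] != 0:
            continue  # condition 1: files in use need a human decision
        if now - f["uploaded_at"] > MAX_UPLOAD_AGE:
            continue  # condition 2: older uploads need a human decision
        if now - f["tagged_at"] < MIN_TAGGED_AGE:
            continue  # condition 3: leave time for a human to act first
        would_delete.append(f["title"])
    return would_delete
```

Publishing such a listing for a while before enabling real deletions would let people compare it with what admins actually decided.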

Support the basic idea, but I think that the bot also should check that the template {{kept}} isn't used on the talk page. If the file has been kept in a previous deletion discussion, it should go to a new deletion request. --Stefan4 (talk) 13:00, 21 September 2012 (UTC)

Support Agree with the idea, if the conditions mentioned by Rd232 and Stefan4 are met. Yann (talk) 13:12, 21 September 2012 (UTC)

Comment The bot should check the file's edit history from before it was tagged for deletion, to see whether info had been removed at some point (by accidental or deliberate blanking) (only really applies to no license). That's something humans are unlikely to check as it takes too long... The bot should wait a minimum period before deleting; a human admin can judge if a request needs fuller discussion. The bot can only tell that if someone objects, so it needs to give them time to object. --Nilfanion (talk) 10:12, 22 September 2012 (UTC)
Checking the edit history is a valuable suggestion. --Túrelio (talk) 10:17, 22 September 2012 (UTC)
Comment For images tagged with no permission or no license, the source should be checked. It's possible that the source has been changed since the file was tagged, and a license on the source may have been missed, so perhaps the bot shouldn't touch files that have a source.  ■ MMXX talk 21:00, 1 October 2012 (UTC)
Comment I have seen cases of files tagged, but the warning was not added to the uploader's talk page. It would be nice if the bot could check that was done. Yann (talk) 17:32, 4 October 2012 (UTC)
Good point: there are some things an automated deletion tool can potentially do in terms of checking which in practice admins probably don't have time to. That's one example. Rd232 (talk) 18:25, 4 October 2012 (UTC)

Supporting files

Some media files uploaded on commons are computer generated from one or more original input files which are usually not uploaded to commons. Some examples:

  1. file on commons: panorama stitched from N images; file(s) not on commons: N original images
  2. file on commons: jpg image from camera raw file; file(s) not on commons: raw file
  3. file on commons: png relief map with labels, legend, etc; file(s) not on commons: e.g. jpg relief, svg labels, png compass rose,... all used to create the final map
  4. file on commons: png plot of a mathematical function; file(s) not on commons: gnuplot script file used to create the plot

Now, if a mistake is found in one of the uploaded images or if newer technology allows higher quality files, the only way to fix or improve it is through the original creator, which causes all kinds of problems: creator inactive on wikimedia, creator has lost/deleted original files, etc. Therefore I think it would be a good idea if there was some software support at commons to upload these additional files and associate them with the main file somehow. I understand that I could upload additional files the normal way to commons and link to them in the image description of the main file, but first of all, this is lots of manual/boring work and second I'd be limited to the file types supported by commons (e.g. no raw files). So, my proposal is:

  • create a separate database for "supporting" files, which can be of more arbitrary type and would be only available for download, i.e. not directly displayed on commons, wikipedia, etc.
  • after uploading the main file, have a link ("upload supporting files") to an upload form
  • the upload form should have fields for file names and optionally short descriptions (saying what each file is for); likely there would also have to be some licensing stuff covered (but I don't know much about that)

Does this sound useful? bamse (talk) 19:31, 1 October 2012 (UTC)

Of course. Someone just has to take care that the new database is not abused as personal file storage, which is more difficult if you can't preview the files (like RAW files). -- Rillke(q?) 10:24, 2 October 2012 (UTC)
I totally sympathise with your aims, and feel like it ought to be possible, but I don't think it'll happen. Developers have been reluctant to enable filetypes the software can't display, because of security risks. See long-term problems getting new filetypes approved: COM:UNSUPPORTED. In the short/medium term there is http://www.commonsarchive.org. Rd232 (talk) 11:06, 2 October 2012 (UTC)
Well, several of Bamse points do not need special software support. For example, Gnuplot source code. See Category:Images including source code in their description. Jean-Fred (talk) 11:54, 2 October 2012 (UTC)
And see {{Source code please}} as well. Jean-Fred (talk) 11:56, 2 October 2012 (UTC)
Thank you all for the positive replies. I don't think it would be very attractive for personal file storage as files are public and the user interface would be much more cumbersome than something like dropbox. Commons archive looks interesting, though I don't understand if and how it interacts with wikimedia commons. bamse (talk) 18:38, 3 October 2012 (UTC)
You can think of Commons Archive as a sort of extra Archive namespace for Commons: upload source files there, point them to the main file on Commons, and link back from the Commons file to the Commons Archive source file(s). Rd232 (talk) 16:35, 4 October 2012 (UTC)
Thanks for the explanation. So this basically extends my range of file types, but unfortunately still leaves me with lots of manual linking work, right? bamse (talk) 20:47, 4 October 2012 (UTC)
Yes, though it's not really more work than it would be if Commons allowed the extra filetypes without handling them properly (you'd need to link the non-handled files with "normal" file versions). The extra work really only comes if Commons one day accepts a new filetype; but that can be handled by a bot transferring files from Commons Archive. Rd232 (talk) 21:38, 4 October 2012 (UTC)
BTW gnuplot code is currently just put on file description pages as text - see {{Created with Gnuplot}}. Rd232 (talk) 16:37, 4 October 2012 (UTC)
I don’t see how this may be a problem. Source code (be it Gnuplot, LaTeX, Matlab, whatever) is text. Why would we want to store it somewhere else than on the file description page? Jean-Fred (talk) 17:03, 4 October 2012 (UTC)
We'd want to store it elsewhere if it was getting very long. But for the sort of thing we're talking about, that's probably not an issue. I'm not sure if there's any other reason to have it as a separate file. Rd232 (talk) 17:26, 4 October 2012 (UTC)
Just one more thought on this. Perhaps some external tool (thinking of something like the move-to-commons helper at the moment) could be created which:
  • uploads main file to wikimedia commons
  • uploads supporting files to commons archive
  • creates links to commons archive files in the file description of the main file (on wikimedia commons)
This would allow upload of arbitrary supporting file types and would not require manual linking (i.e. less work) by the uploader. bamse (talk) 11:54, 9 October 2012 (UTC)
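The third step of such a tool, creating the links, is mostly text generation. As a rough sketch only: the upload steps are left out, and the `/wiki/File:` path on commonsarchive.org, the function names and the "Supporting files:" label are assumptions made for illustration, not a documented interface of Commons Archive.

```python
from typing import Dict
from urllib.parse import quote

ARCHIVE_BASE = "http://www.commonsarchive.org"  # site named in this discussion
ARCHIVE_FILE_PATH = "/wiki/File:"  # assumption: MediaWiki-style file page path


def archive_url(file_name: str) -> str:
    """Build a (hypothetical) link to a supporting file on Commons Archive."""
    return ARCHIVE_BASE + ARCHIVE_FILE_PATH + quote(file_name.replace(" ", "_"), safe="")


def supporting_files_section(supporting: Dict[str, str]) -> str:
    """Return wikitext listing each supporting file with its short description.

    `supporting` maps file name -> description, as entered in the upload form.
    """
    lines = ["Supporting files:"]
    for name, description in supporting.items():
        lines.append(f"* [{archive_url(name)} {name}]: {description}")
    return "\n".join(lines)
```

The returned text would be appended to the file description of the main file on Wikimedia Commons, so the uploader does not have to write the links by hand.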

GFDL

There is a discussion at Commons_talk:Featured_picture_candidates#Proposal:_Change_to_FP_criteria_for_new_nominations:_disallow_.22GFDL_1.2_only.22_and_.22GFDL_1.2_and_an_NC-only_license.22. -- Jkadavoor (Jee) (talk) 07:36, 9 October 2012 (UTC)

Deprecating software licenses for images

Since at least early 2009, and in similar form since at least 2006, Commons:Licensing has said (at Commons:L#Well-known_licenses)

The GFDL is not practical for photos and short texts, especially for printed media, because it requires that they be published along with the full text of the license. Thus, it is preferable to publish the work with a dual license, adding to the GFDL a license that permits use of the photo or text easily; a Creative Commons license, for example. Also, do not use the GPL and LGPL licenses as the only license for your own works if it can be avoided, as they are not really suitable for anything but software.

Following some recent discussions (here), there is some support for the idea of banning new uploads from using these licenses. I present some variations of how this can be done, using GFDL as a short-hand for all full-text licenses. Please remember this applies only to new uploads, and that in all scenarios dual-licensing with GFDL and CC-BY-SA remains the standard. Note: we might want to consider exceptions for cases where images come from external sources or are derived from software (screenshots). Rd232 (talk) 12:19, 9 October 2012 (UTC)

PS Commons:License Migration Task Force may be considered background reading for the long-term move away from GFDL. Rd232 (talk) 13:07, 9 October 2012 (UTC)

Scenario 1a: Ban GFDL-only uploads (except for software-related works)

  • New uploads may not use GFDL/GPL/etc full-text-copy-required licenses as their sole license.
    • Exception: uploads which are derived from software or software documentation (where such licenses are the norm)
  • Dual-licensing with any other license(s) is acceptable.
  • Support Licenses designed for software should only be used for media when absolutely necessary: dual-licensing media with such licenses is harmless, but single-licensing is bad. As far as I can see, "absolutely necessary" means media derived from software and software documentation, where such licenses are the norm. If other exceptions are brought forward I'm happy to consider those, but for me the basic principle is to avoid single-licensing with such licenses if at all possible. This is precisely why we had the whole Commons:License Migration Task Force business, and frankly it's a bit bizarre that single-licensing wasn't restricted for new uploads (as much as possible) after that happened. Rd232 (talk) 13:36, 9 October 2012 (UTC)
  • Support I support this approach to complete the transition begun in the licensing update: to license works that are not software under licenses that are better suited to the Commons. For software these licenses obviously still make sense so allowing uploads derived from software under those licenses is proper. Hekerui (talk) 15:24, 9 October 2012 (UTC)
  • Oppose This only gets it partly right, because dual licencing with CC BY-NC-SA is no improvement as far as commercial reuse is concerned. No "acceptable" licence should "require" dual or multi-licencing in order for it to be valid. Either the licence is acceptable on its own, or it is merely a supplementary licence (like CC BY-NC-SA) that users are free to add to an acceptable licence. Colin (talk) 18:30, 9 October 2012 (UTC)
    • In this scenario these types of licenses are not acceptable on their own except for certain limited cases; this is really quite simple, but you seem to have a talent for making it really complicated. And frankly the fact this scenario doesn't solve your NC problem is not in itself a reason to oppose; it is at least a step in that direction, and certainly does not preclude addressing it. Rd232 (talk) 19:59, 9 October 2012 (UTC)
  • Comment The GFDL might also be suitable for books independently of software. But new books are not uploaded to Commons, and might not be in scope anyway. Yann (talk) 05:18, 10 October 2012 (UTC)
    • GFDL is suitable for books: it's used extensively by people like en:VDM Publishing who republish Wikipedia as books... As I said above, I'm willing to add other exceptions than software or software documentation, but it needs exceptions we actually want to use. One area to think about is the source materials hosted for Wikisource, which may be documents rather than media. Is it plausible that these use GFDL as their sole license, if they're not related to software? If it is, we can just add another exception for "primarily textual works" or something like that. Rd232 (talk) 07:56, 10 October 2012 (UTC)
      • Books can certainly be uploaded to Commons and be in scope - example. That book is PD, but there's no reason why a recent freely-licensed book couldn't be uploaded. A list of exceptions "the GFDL is not OK, except for software, books...." is not ideal. Maybe something on the lines of "the GFDL is not OK for original media, but all other GFDL content (including media derived from GFDL works) is OK".
  • To clarify the above comment, I Oppose on principle a "ban on GFDL-only uploads (except for software-related works)". I would Support a "ban of GFDL-only uploads of original media". This is because it bans the images that are a concern, and nothing else. A blanket ban covers everything, and requires a list of exceptions to be workable. Those exceptions will be more complex than the policy. Banning only the files that are a problem removes the need for those exceptions.--Nilfanion (talk) 12:07, 10 October 2012 (UTC)
    • That's not necessarily simpler - it may just displace the complexity onto defining "original media". If the list of exceptions is short (so far we have one, and I've suggested one more), then the exceptions approach is clearer than yours, which depends on a term that isn't well-defined. Rd232 (talk) 14:19, 10 October 2012 (UTC)
      • Agree with Rd232 that this is no simpler. If one defines "original" as not a "derivative work" then one only needs to publish as GFDL elsewhere and ta da! you can generate a derivative work you can upload to Commons and escape the ban. But ultimately this proposal isn't going anywhere as it breaks fundamental Commons licencing principles. If one starts with the premise that GFDL is unsuitable/impractical then adding another unsuitable licence (such as CC BY-NC) isn't going to make it acceptable. Every image on Commons needs at least one acceptably free and practically free licence. Colin (talk) 19:40, 10 October 2012 (UTC)
        • Such gaming would be pretty obvious and could be dealt with accordingly. "images, videos and sound files created by the uploader" is less ambiguous and avoids that. This immediately shows one further problem. Say there's another website with good GFDL-only images on it. Can we grab them? Derivatives of GFDL-only content on Commons? Images in a software manual? Images in a fictional book? Are we sure that there are no other exceptions? Or for that matter, promotional photos from a press release for a game? "Software-related" is a vague term itself  ;)--Nilfanion (talk) 21:29, 10 October 2012 (UTC)
  • Oppose Per Colin. "Dual-licensing with any other license(s) is acceptable" is problematic, because it allows, for example, GFDL and CC-BY-NC. Individually unsuitable licenses do not become suitable in combination. This would be better if worded as "GFDL may be used as a secondary license provided that a suitable license is given also" or something like that. cmadler (talk) 12:29, 18 October 2012 (UTC)
  • Oppose per COM:SCOPE#File in use in another Wikimedia project: Commons also exists as a media repository for other Wikimedia projects. As long as the Syldavian Wikipedia or the Brutopian Wikibooks accepts GFDL files, then Commons needs to accept files from those projects. A ban would not really change anything: you would just have to upload it to Commons using Commonshelper instead. --Stefan4 (talk) 12:57, 18 October 2012 (UTC)
    • COM:SCOPE concerns scope only, not licence or copyright issues. There are loads of files that are on Wikipedia projects that cannot be uploaded to Commons: Fair use images and those with Freedom of Panorama issues are just two examples. Colin (talk) 14:00, 18 October 2012 (UTC)
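The disagreement in this scenario can be made concrete by writing both wordings as checks on a set of license names. This is only an illustrative sketch: the license sets and function names are hypothetical, the reading of "sole license" (no license outside the full-text group) is an assumption, and the stricter variant follows cmadler's suggested wording rather than any adopted policy.

```python
from typing import Set

# Full-text-copy-required licenses named in the proposal.
FULL_TEXT_LICENSES = {"GFDL", "GPL", "LGPL"}
# Assumption: illustrative examples of non-commercial-only licenses.
NC_ONLY_LICENSES = {"CC-BY-NC", "CC-BY-NC-SA"}


def allowed_under_scenario_1a(licenses: Set[str], software_derived: bool = False) -> bool:
    """Scenario 1a as proposed: no full-text license as sole license,
    except for software-derived works; any other license makes it acceptable."""
    if not licenses:
        return False
    only_full_text = licenses <= FULL_TEXT_LICENSES
    return software_derived or not only_full_text


def allowed_with_suitable_license(licenses: Set[str]) -> bool:
    """Stricter variant: at least one license must be suitable on its own.

    Any license not listed in the two sets above is treated as suitable,
    which is a simplification for this sketch.
    """
    return any(
        lic not in FULL_TEXT_LICENSES and lic not in NC_ONLY_LICENSES
        for lic in licenses
    )
```

With these definitions, the combination GFDL plus CC-BY-NC passes the first check but fails the second, which is the gap Colin and cmadler point out.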

Self-compiling Creator template

Just to let you know that a test js script can fill an empty Creator template by reading and adapting data coming from it.wikipedia. This comes from an AJAX inter-project call for wikitext; I presume it will not be so difficult to edit the scripts to let them read en.wikipedia or other wikipedias. For details and WIP see User talk:Jarekt. Is there any other similar project? --Alex_brollo Talk|Contrib 14:52, 4 October 2012 (UTC)

I would love to have a version reading en.Wiki. --Jarekt (talk) 15:58, 4 October 2012 (UTC)
I'll do my best; I'm indebted to you, both for your work on the Book and Creator templates and for your personal, kind and patient suggestions too.
I suppose that my code has to be deeply reviewed since I'm a layman programmer but the bold, rough idea runs and I presume that a good js programmer would catch the idea and develop it into a good tool. --Alex_brollo Talk|Contrib 16:20, 4 October 2012 (UTC)
Getting data from the en.wiki Infobox template family turned out to be pretty simple; luckily I had written generalized algorithms for basic text handling. A suggestion, please: where can I post the needed documentation about js tools and their use here on Commons, considering that they are presently WIP? And can I add a link to that doc page in the Creator template documentation? --Alex_brollo Talk|Contrib 08:39, 5 October 2012 (UTC)
You could start by putting this in your user namespace, and someone with admin privileges will discuss/review it and then maybe move it to the MediaWiki namespace or make a gadget from it. I hope that, when WikiData is ready to use, we can simply use their database and format the template the way we like. You can also start with the documentation in your user namespace. -- Rillke(q?) 10:36, 5 October 2012 (UTC)
Thanks. Using Gadgets style, I'll simply write a User:Alex brollo/Library page matching main scripts collection User:Alex brollo/Library.js.
About wikidata: I presume that Wikidata will need to be fed with good data, so any effort to make Book and Creator "perfect" is to be considered a real step in Wikidata future development. My present aim is exactly to merge best data about authors and books into well structured and unique "data containers" and Creator and Book are excellent candidates IMHO. --Alex_brollo Talk|Contrib 11:06, 5 October 2012 (UTC)
Thanks Rillke for reviewing js code! An impressive list of comments... I presume, you found the complete repertoire of mistakes of js beginners :-)
Much work is needed to convert the scripts into user-friendly and sysop-friendly ones ;-) . --Alex_brollo Talk|Contrib 14:06, 8 October 2012 (UTC)

The current version of this tool works great, ... for some creators. Just to recap, the tool allows semi-automatic creation of creator pages based on metadata found at the en and it wikis. At this stage I see it as a proof-of-concept effort that proved that the concept is sound. However, it needs further development, which, as Alex stated somewhere, might be a "project above [his] skills/time". Maybe we should move it out of the user namespace to the MediaWiki namespace and make it a more collaborative project. I think this tool is very useful and very much needed, as it speeds up a rather time-consuming process. --Jarekt (talk) 14:56, 9 October 2012 (UTC)
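For readers unfamiliar with the approach, the core of such a tool is reading named parameters out of an infobox and writing them into a Creator skeleton. The following is a minimal sketch of that idea in Python rather than the JavaScript the actual tool uses, and it is not Alex's code: the function names are hypothetical, the field mapping in `FIELD_MAP` is an assumption, and fetching the wikitext from another wiki is left out.

```python
from typing import Dict, List

# Assumption: illustrative mapping from en.wiki infobox parameters
# to Creator template parameters; a real mapping needs many more cases.
FIELD_MAP = {
    "name": "Name",
    "birth_date": "Birthdate",
    "death_date": "Deathdate",
    "birth_place": "Birthloc",
    "death_place": "Deathloc",
}


def split_top_level(body: str) -> List[str]:
    """Split on '|' characters that are not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        two = body[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            depth -= 1
            current.append(two)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(wikitext: str) -> Dict[str, str]:
    """Return the named parameters of a single template call."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    params = {}
    for part in split_top_level(text[2:-2])[1:]:  # skip the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def to_creator(params: Dict[str, str]) -> str:
    """Fill a Creator template skeleton from parsed infobox parameters."""
    lines = ["{{Creator"]
    for infobox_key, creator_key in FIELD_MAP.items():
        lines.append(f" |{creator_key} = {params.get(infobox_key, '')}")
    lines.append("}}")
    return "\n".join(lines)
```

Values are copied as they are, so nested templates such as date templates would still need converting by hand or by further code, which is where the "semi-automatic" part comes in.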

Active table

I'm happy to let you know that an ActiveTab() js routine, launched on any page with a structured template like Creator, Book or Information here, or Infobox in pedia projects, converts the template code into a form which can be comfortably edited. Sets of homologous data coming from other sources could be loaded into that Active table, allowing a very effective and intuitive comparison of the data and their use for editing. I'm testing different strategies to collect data from external sources and to load them into such an Active table, as an alternative to the idea previously discussed. --Alex_brollo Talk|Contrib 06:49, 27 October 2012 (UTC)

Prevent end-runs around Licensing Policy ban on Non-Commercial-Only licensing