Commons:Village pump/Proposals

From Wikimedia Commons, the free media repository

Shortcut: COM:VP/P · COM:VPP

Welcome to the Village pump proposals section

This page is used for proposals relating to the operations, technical issues, and policies of Wikimedia Commons; it is distinguished from the main Village pump, which handles community-wide discussion of all kinds. The page may also be used to advertise significant discussions taking place elsewhere, such as on the talk page of a Commons policy. Recent sections with no replies for 30 days and sections tagged with {{section resolved|1=~~~~}} may be archived; for old discussions, see the archives.


Please note
  • One of Wikimedia Commons’ basic principles is: "Only free content is allowed." Please do not ask why unfree material is not allowed on Wikimedia Commons or suggest that allowing it would be a good thing.
  • Have you read the FAQ?


Proposal to run a bot to archive every external link using the Internet Archive on Wikimedia Commons

(Prior discussion Commons:Bots/Work requests#Internet Archive preservation of external links.)

The Wayback Machine already works on most major Wikimedia websites.

Dear fellow contributors,

I am proposing to let a bot run on every file on Wikimedia Commons, and on other relevant pages which use external links, and archive these links using the Internet Archive for future reference, in the same way it is currently done on many other Wikimedia websites. This will give license reviewers and re-users a point of reference for files from external sources, as linkrot may obscure their original licenses and make them harder to verify.

For a good (current) example where a changed source page is affecting the license of formerly free files please see "User:Alexis Jazz/DWDD archief". --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:13, 5 February 2019 (UTC)

Votes (archiving external links)

  1. Support, obviously as the proposing agent. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:13, 5 February 2019 (UTC)
  2. Support This seems useful. --Yann (talk) 11:39, 5 February 2019 (UTC)
  3. Support Good idea. - Alexis Jazz ping plz 11:54, 5 February 2019 (UTC)
  4. Support, I hope they can handle the traffic.   — Jeff G. please ping or talk to me 12:27, 5 February 2019 (UTC)
  5. Support - Sounds like a great idea! Although somewhat unrelated, I run this tool all the time at EN (which can replace all dead and alive links with WebArchive). As noted above, given that licences can and do change, I would support this little gem. –Davey2010Talk 20:34, 5 February 2019 (UTC)
  6. Support. Archiving should be done within minutes. This is also useful for Iranian websites which publish content, but occasionally remove it within hours (sometimes at the behest of the "censorship office"). For example see File:Pir Shalyar 20190202 06.jpg, which can no longer be license-reviewed. Neither Google cache [1] nor Bing cache [2] nor Internet Archive [3] could save the work in time. File:Mahnaz Afshar 20190201 01.jpg is another example, which was fortunately saved using Google cache. In this case the problem was apparently violation of dress code. 4nn1l2 (talk) 08:43, 6 February 2019 (UTC)
  7. Support Common sense idea. This also will help prevent DRs and "no source" tagging. Abzeronow (talk) 14:52, 6 February 2019 (UTC)
  8. Support This consensus helps to ensure that later housekeeping or bot maintainers can more easily handle complaints, related to what is likely to affect millions of files. Where there are specialized issues, such as "hot" websites where the quoted source is at risk of being taken down, these may need bot tasks negotiated that periodically rerun. For very large stable collections, like Geograph or the British Library, these can run relatively slowly as background maintenance, and it hardly matters whether a new upload waits to have its links added to WBM for a few months. -- (talk) 12:03, 9 February 2019 (UTC)
  9. Strong support yes please. --Jarekt (talk) 12:59, 12 February 2019 (UTC)
  10. Support and for robots sites [4] go to -- Slowking4 § Sander.v.Ginkel's revenge 14:12, 12 February 2019 (UTC)
  11. Support This would be a good prevention of linkrot. De728631 (talk) 20:53, 12 February 2019 (UTC)
  12. Support Platonides (talk) 23:59, 17 February 2019 (UTC)
  13. Support Blue Elf (talk) 23:10, 22 February 2019 (UTC)
  14. Support Bj.schoenmakers I'm already using this to preserve copyright information on sites where people can adjust their own copyright on images. My upload-bot will post the url to waybackmachine/ first and use the returned date in my template in the commons upload: for example {{Archive.orgTimeStamp|20190303145847|}} —Preceding comment was added at 00:10, 4 March 2019 (UTC)
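
As a rough illustration of the workflow described in vote 14, here is a minimal Python sketch that pulls the 14-digit timestamp out of a Wayback Machine snapshot URL and wraps it in the template quoted there. It assumes the snapshot URL has the shape https://web.archive.org/web/<timestamp>/<original URL>; the function name is hypothetical, the example.org address is a placeholder, and the empty second template parameter is simply copied from the example above. It makes no network request and is not the upload-bot's actual code.

```python
import re

# Assumed shape of a snapshot URL:
#   https://web.archive.org/web/<14-digit timestamp>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def archive_timestamp_template(snapshot_url):
    """Return {{Archive.orgTimeStamp|<timestamp>|}} for a snapshot URL.

    Hypothetical helper: returns None when the URL does not look like
    a Wayback Machine snapshot.
    """
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        return None
    timestamp = match.group(1)
    # The second parameter is left empty, as in the example in vote 14.
    return "{{Archive.orgTimeStamp|" + timestamp + "|}}"


if __name__ == "__main__":
    print(archive_timestamp_template(
        "https://web.archive.org/web/20190303145847/http://example.org/photo"
    ))
```

Returning None for anything that is not recognisably a snapshot URL lets a bot skip the file rather than write a malformed template.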

Discussion (archiving external links)

How should this best be implemented? Is the page "User:Fæ/Wayback" a good model? Personally I propose "[EXTERNAL LINK] (ARCHIVE, retrieved: DD-MM-YYYY)". --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:13, 5 February 2019 (UTC)

@Donald Trung: "{{Wayback|url=http%3A//|date=20150316101047}}" (implemented as "archive copy at the Wayback Machine (archived on 16 March 2015)" on File:143, Sverige, Stockholm, Roslagsbanans depå (Trainpix 122696).jpg) is standardized and looks nicer, you can discuss on Template talk:Wayback if you disagree.   — Jeff G. please ping or talk to me 12:38, 5 February 2019 (UTC)
@Jeff G.: indeed, that looks way better, and having a standard template for Internet Archive Wayback Machine links would also make it easier to be consistent. I honestly wasn't aware of the existence of "{{Wayback}}"; this would make implementing the above proposal easier as well. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:48, 5 February 2019 (UTC)
Though some earlier Wayback additions were the links only, and others like Fortepan have the WBM link added as part of a specialized collection template, the largest collection so far, the Portable Antiquities Scheme uploads, is using the preexisting Wayback template. See File:BUCKLE_(FindID_187883).jpg or File:Cavalry Soldiers rehearse live-fire exercises with Lithuanian partners 141118-A-QS211-838.jpg for examples of how this looks. -- (talk) 11:57, 9 February 2019 (UTC)
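
To make the {{Wayback}} format discussed above concrete, here is a small Python sketch that builds the template wikitext from a URL and a 14-digit timestamp and shows how such a timestamp reads as a date. The parameter names url and date come from the example quoted above; the function names and the example.org address are hypothetical, and the on-wiki rendering is done by the template itself, not by this code.

```python
from datetime import datetime

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def parse_wayback_timestamp(timestamp):
    """Parse a 14-digit YYYYMMDDhhmmss string; raises ValueError if malformed."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def wayback_template(url, timestamp):
    """Build {{Wayback|url=...|date=...}} wikitext (hypothetical helper)."""
    parse_wayback_timestamp(timestamp)  # validate before emitting wikitext
    return "{{Wayback|url=%s|date=%s}}" % (url, timestamp)


def archived_on(timestamp):
    """Describe the timestamp the way the rendered template phrases it."""
    moment = parse_wayback_timestamp(timestamp)
    return "archived on %d %s %d" % (
        moment.day, MONTHS[moment.month - 1], moment.year
    )


if __name__ == "__main__":
    print(wayback_template("http://example.org/photo", "20150316101047"))
    print(archived_on("20150316101047"))
```

Month names are taken from a fixed list rather than from the locale, so the output does not depend on the machine the bot runs on.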

I do not understand the proposal. Are we voting on something that will be done on the Wayback-homepage? --Schlurcher (talk) 12:47, 10 February 2019 (UTC)

@Schlurcher: this proposal is for a bot to back up all external links using the Wayback Machine. This would create a snapshot of each external website, which people could later use to confirm the licenses of files. For example, say I import a photograph from (example website), but the website disappears a year later. A license reviewer then tries to confirm the license but can't, so the image has to be deleted because its free license can’t be confirmed (see “COM:PCP”). If the external website had been backed up using the Internet Archive, the file would not have to be deleted. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:04, 10 February 2019 (UTC)
@Donald Trung: or you could use some examples that actually happened: Commons:Village pump#License reviewers and admins help is needed ASAP (we got lucky with that one and everything could be reviewed in time), Category:Images from and Category:Photographs by Agencia Brasil. - Alexis Jazz ping plz 21:36, 10 February 2019 (UTC)

Speedy revision deletion tag for overwritten files


{{Overwritten revdel}} and Category:Overwritten files requiring revision deletion were created. LX (talk, contribs) 17:56, 24 February 2019 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

There is currently no standard way to ask administrators to delete a specific revision of a file (revision deletion) that's been uploaded in violation of Commons:Overwriting existing files. Such revisions are typically copyright violations (and even if they're the overwriter's own work, there is normally no licensing statement for the specific revision), so history splitting is hardly ever a viable option.

One way to request deletion is to write a message on the administrators' noticeboard. That gets the job done, but starting a section on a discussion board that's on a lot of people's watchlists for routine housekeeping isn't exactly a streamlined process. Starting a deletion discussion is also possible, but is even more process heavy and too slow to be fit for this purpose. Another way to do it is {{speedy}} with a custom message, but with overworked administrators, that can lead to more than just individual revisions being deleted. It would be better to have a more specific tag.

I believe that we should have a tag similar to {{Non-free frame revdel}} for this purpose. We could call it something like {{Overwritten revdel}}. It should put files into a separate subcategory of Category:Candidates for revision deletion, e.g. Category:Overwritten files requiring revision deletion. The tag should be limited to cases where the revision to be deleted is completely different from the original revision. Furthermore, the revision to be deleted should either be a copyright violation or lack the information necessary to enable history splitting. LX (talk, contribs) 19:29, 5 February 2019 (UTC)

Votes (overwritten revdel)

Support as per above discussion. --Yann (talk) 07:58, 6 February 2019 (UTC)

Discussion (overwritten revdel)

Comment The maintenance category already exists. I even put some files in it half a year ago. We don't need a tag (which is only more cumbersome to remove), someone just needs to watch that category. - Alexis Jazz ping plz 19:36, 5 February 2019 (UTC)
I don't oppose the general idea, but I don't think a tag is needed. Administrators just need to be made aware of the category. - Alexis Jazz ping plz 13:19, 6 February 2019 (UTC)
Sometimes you need to explain the reason for revdel. Tags can be useful in this case. 4nn1l2 (talk) 13:28, 6 February 2019 (UTC)
For the cases that need that, I think a message could be left in the edit comment or upload comment. But I won't oppose a tag. - Alexis Jazz ping plz 15:21, 6 February 2019 (UTC)
Comment - I would support this. I've always been surprised there's no sort of policy on this. Alexis and I have both photoshopped a few images here together (I believe 2?) and only 1 was revdelled, and only because I asked an admin to do it; otherwise it wouldn't have been. Anyway, I would support this. –Davey2010Talk 20:30, 5 February 2019 (UTC)
By the way, Media with unacceptable data in old versions awaits some reasonable procedure for some 1½ years. Incnis Mrsi (talk) 20:34, 5 February 2019 (UTC)
Fact of the matter is that if it is not listed on Commons:Admin backlog it is never going to be looked at. I doubt many admins even know those categories exist. --Majora (talk) 04:29, 6 February 2019 (UTC)
Comment I would also support this.   — Jeff G. please ping or talk to me 07:15, 6 February 2019 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Do we want to bot-copy descriptions to captions?

Structured Data on Commons released its first feature last month: media files can now have captions in different languages. Captions are quite close to descriptions, except that they are structured by language. It is technically possible to bot-copy descriptions to captions (e.g., [5], [6] were copied using pywikibot). There is a potential copyright issue here, in that captions are CC-0, which perhaps could be avoided by only copying short descriptions (say, under 200 characters) where they are sufficiently short/simple that they can't be copyrighted (as per WMF legal). Do we want to do that for all files, or are there other concerns that need addressing? Thanks. Mike Peel (talk) 21:15, 8 February 2019 (UTC)

Wait...why...are captions licensed under something different than almost the entire rest of almost every other project? (This might be a stupid question, I'll admit.) GMGtalk 21:20, 8 February 2019 (UTC)
@Keegan (WMF) can probably answer this better than me, but in general I understand it's because facts/short captions can't be copyrighted along the same lines as {{PD-simple}}/{{PD-ineligible}}. It matches what Wikidata uses, and there's some rationale on Wikidata. Thanks. Mike Peel (talk) 21:37, 8 February 2019 (UTC)
Indeed, captions are to be CC0 in order to work with the licensing for the rest of the structured data project, which is based on Wikidata's CC0. Captions are the Commons equivalent of a Wikidata label, meant to be pulled from the API with other data from other structured statements and fields once they are available. More information about why the database is CC0 is available in Mike's link. Keegan (WMF) (talk) 22:36, 8 February 2019 (UTC)
Ah. Stupid question confirmed. I've been poking around WD for a little while now and I honestly hadn't realized it was licensed differently. GMGtalk 23:06, 8 February 2019 (UTC)
Captions are stored on Wikidata and Wikidata is CC0 by design. So they have a different license. --Majora (talk) 21:38, 8 February 2019 (UTC)
Hello Keegan (WMF). I think this statement is misleading. Nothing legally prevents mixing CC0 data (provided that license was legally applied in the first place) with data covered by any other license. Nothing prevents an API from also providing the license under which each label is covered, and nothing prevents storing labels with different licenses in the same database. The only possible issue arises when someone wants to mix content covered by incompatible licenses. As most Wikimedia works are under CC-by-sa-3.0 unported, allowing a specific license for each label would cause no conflict at all in most cases that are useful for the Wikimedia community and for external partners who do not want to avoid reciprocity of rights on derived works. Cheers, Psychoslave (talk) 09:16, 6 March 2019 (UTC)
No. WMF legal have given no guarantee of immunity against claims of damage from breaking moral rights. The idea that 200 characters cannot have a copyright holds no water. -- (talk) 21:34, 8 February 2019 (UTC)
No thank you. I'm not a big fan of this caption system to begin with. I'm not a big fan of storing data where it can't be controlled directly by the community that deals with it. --Majora (talk) 21:38, 8 February 2019 (UTC)
@Majora: The data is held on Commons (in the commons wikibase installation), not on Wikidata. Thanks. Mike Peel (talk) 21:46, 8 February 2019 (UTC)
Ah. Well they should probably make that more clear to people who haven't been following along with its development since I was under the impression it was stored elsewhere. I'm more neutral on this idea then. --Majora (talk) 21:48, 8 February 2019 (UTC)
@Majora: I have added a section in the FAQ answering this − hope that helps: Commons:File_captions#Where_are_captions_stored?. Jean-Fred (talk) 13:55, 11 February 2019 (UTC)
  • I Support this. Short descriptions are everywhere on Wikimedia Commons, and as we have 51.000.000+ (fifty-one-million plus) files on Wikimedia Commons without any captions, this will do a large part of the work and save a lot of volunteers' time. It's a shame that we can't import all file descriptions, but within the current constraints what Mike Peel is suggesting is the best option. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 22:37, 8 February 2019 (UTC)

Weak support Only if limited to short descriptions. Some descriptions are quite long and detailed, and there's nothing wrong with that - there is a lot of valuable information available for some images. But the captions have a different purpose as the "Commons equivalent of a Wikidata label", as Keegan wrote, the "one-line explanation of what this file represents", as the caption description says. So, even if not considering possible licensing issues, long descriptions should not be copied to captions. With a rule like "only the first sentence (text before the first full stop) and only if this sentence is not longer than 200 characters", I think it might be a possible approach. Gestumblindi (talk) 23:37, 8 February 2019 (UTC)
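
The rule suggested in the comment above ("only the first sentence ... and only if this sentence is not longer than 200 characters") could be sketched in Python roughly as follows. The function name is hypothetical, and treating the first full stop as the end of the sentence is a deliberate simplification: abbreviations such as "St." would cut the text short, so this is an illustration of the rule, not a proposed bot implementation.

```python
def caption_from_description(description, limit=200):
    """Return the first sentence of a description, or None.

    Hypothetical helper: None is returned when the description is empty
    or when its first sentence is longer than `limit` characters, in
    which case nothing would be copied at all.
    """
    # Collapse line breaks and runs of spaces into single spaces.
    text = " ".join(description.split())
    if not text:
        return None
    # Text before the first full stop; the stop itself is kept if present.
    head, stop, _rest = text.partition(".")
    sentence = (head + stop).strip()
    if len(sentence) > limit:
        return None
    return sentence


if __name__ == "__main__":
    print(caption_from_description(
        "A red barn in winter. Taken near the river on a cold morning."
    ))
```

Over-long first sentences are rejected outright rather than truncated, since a caption cut off mid-sentence would need human clean-up anyway.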

I'm just not sure this would work. For example, this caption is comparatively meaningless. This caption is equally so if only the first sentence is provided. GMGtalk 00:52, 9 February 2019 (UTC)
@GreenMeansGo: The first one would need human clean-up (currently in the description, after this proposal both in the description and the caption). The second one is 614 characters long, so the limit of 200 characters I proposed above would mean it wouldn't be copied over. (We could partially copy descriptions, but as you say that probably doesn't help.) Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
  • No. Lots of crap in descriptions, liable to just reproduce the crap. Arbitrary example: File:Terrible Trail, The Meek Cutoff - Flickr - brewbooks.jpg - Jmabel ! talk 00:58, 9 February 2019 (UTC)
    @Jmabel: So how do we clean the descriptions up? The example you've given is ~730 characters long, so over the 200 character cut-off I was suggesting. Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
    I didn't spend a bunch of time searching for a precise example before posting, I was just trying to illustrate the sort of thing I meant, but here:
    The only way to clean this up is for someone to do the hard work of cleaning this up. I do that a lot on photos I think are of historical interest or likely to be used; otherwise, frankly, when dealing with other people's photos I usually stick to cleaning up categories and usually don't go near the descriptions unless they are actively inaccurate. (Would normally have fixed that "Elliot Bay" one, but I'm leaving it right now as part of making my point.) - Jmabel ! talk 16:11, 9 February 2019 (UTC)
  • Comment The idea is not bad in itself, but note that there is a mismatch between notifying uploaders about the license of structured content and copying content that they did not deliberately put in the structured content area. There is something a little hypocritical in that; it is as if we say: "be aware that structured data is under CC0, but anyway, and without your agreement, we will copy your CC-BY-SA 3.0 contribution to this CC0 field." Christian Ferrer (talk) 05:48, 9 February 2019 (UTC)
+ I have serious doubts that all the descriptions are simple enough to be exempt from copyright protection, for example File:Gashnian 20170306 18.jpg. I don't think we can move this text, which is originally licensed under BY, to CC0. Christian Ferrer (talk) 06:09, 9 February 2019 (UTC)
@Christian Ferrer: That caption is 646 characters long, so the maximum length of 200 characters I suggested at the start would exclude that one from being copied. Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
This text could be limited to the first 200 characters, it wouldn't be less copyrightable IMO. Christian Ferrer (talk) 11:18, 9 February 2019 (UTC)
@Christian Ferrer: My suggestion was that if the description is over 200 characters, then it wouldn't be copied at all. I wasn't suggesting only using parts of descriptions. Thanks. Mike Peel (talk) 11:24, 9 February 2019 (UTC)
@Mike Peel: yes, I understood. I am only saying that a 200-character text may also have a copyright. For example, my own comment that you are currently reading is under CC-BY-SA 3.0. Who can say that this comment is CC0? Christian Ferrer (talk) 12:06, 9 February 2019 (UTC)
Though it's true that the people who are writing descriptions are certainly less creative than I'm trying to be right now.... :) Note that I do not oppose, I simply speak. And it's true that a lot of descriptions are very simple, sometimes even too simple, lacking useful information. Christian Ferrer (talk) 13:54, 9 February 2019 (UTC)
  • Oppose I would like to have a separate description in the structured data, where the old descriptions can be copied. If the Captions should be bot-filled, then with the title of the image. --GPSLeo (talk) 10:05, 9 February 2019 (UTC)
    @GPSLeo: You mean at Commons talk:Structured data? Feel free to start such a discussion. Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
  • Oppose A note of nit picking but unfortunate reality here, in follow up to my earlier "no". If I see any of my batch upload projects where this is being done, I plan to mass revert on copyright grounds per COM:L, unless the text is specifically and unambiguously released as CC0 at source. This includes metadata such as titles, descriptions or captions. Anyone populating CC0 captions has the burden of proof to ensure that the text has been correctly released. The claim at the start of this thread that this statement may make copying text of certain lengths into captions okay, is misleading. Keegan (WMF) (talk · contribs) does not write for WMF Legal and does not claim to be a lawyer or a legal academic, so please avoid quoting what they write as if it has legal weight for the WMF, or is a meaningful legal opinion that unpaid volunteers could use to protect themselves from future claims of damages. If anyone wishes the WMF Legal department to publish an opinion that could be taken into a courtroom, then ask them for one in writing. -- (talk) 14:42, 9 February 2019 (UTC)
  • Oppose concerns (raised above). --Steinsplitter (talk) 14:44, 9 February 2019 (UTC)
  • Question @Mike Peel: is it possible to insert a caption in wikitext? Like {{Information|Description={{Caption:en}} or whatever. - Alexis Jazz ping plz 16:48, 9 February 2019 (UTC)
    • @Alexis Jazz: Not yet, I'm hoping that's coming later this year. @Keegan (WMF) may want to comment on this. Thanks. Mike Peel (talk) 19:07, 9 February 2019 (UTC)
      • @Mike Peel: In that case I oppose as well. Don't want to see information being duplicated. That will only lead to things like this. When captions can be inserted with wikitext, another vote would be welcome. - Alexis Jazz ping plz 19:25, 9 February 2019 (UTC)
      • @Alexis Jazz: So to clarify, you would be happy with moving descriptions to captions, but not copying them? Thanks. Mike Peel (talk) 19:36, 9 February 2019 (UTC)
        • @Mike Peel: To clarify, I completely oppose it when you would be exactly duplicating descriptions. For shortened descriptions this wouldn't apply as those are not exact copies. I'm not sure yet I will be happy with moving descriptions considering the other issues that were brought up in this thread. I don't immediately oppose that though, but as long as the captions can't be inserted using wikitext, I think a proposal is actually premature. - Alexis Jazz ping plz 19:42, 9 February 2019 (UTC)
  • I don't quite think it's appropriate to add to captions while the giant white box is getting in the way of users' access to the file description. First it needs to be hidden, reduced, floated to a side or whatever. Nemo 17:09, 9 February 2019 (UTC)
    • @Nemo bis: Are you suggesting to enable the "Compact Captions" gadget by default? - Alexis Jazz ping plz 17:43, 9 February 2019 (UTC)
      • I'm not a fan of collapsing. Whatever the vertical size, it's harmful if it pushes down the information template/the description while taking the whole width of the page (usually for nothing). Nemo 22:01, 9 February 2019 (UTC)
  • Regretful Oppose. I don't think it's legal. It's true that common phrases and similes are not covered by copyright, as they are seen as the building blocks of the language, capable of being reused and repurposed in many different transformative ways. Also in cases where there's a substantial chance there was coincidental independent creation. But, regrettably, neither of those grounds apply here, for a systematic programme of extraction and licence-washing of individuals' contributions that are specific to particular contexts, being taken for the exact same purpose and exact same context; for some contributors cumulatively amounting to tens of thousands or even hundreds of thousands of words. One cannot argue that a taking on such a scale and so systematic is just incidental. If the descriptions are licensed CC-BY-SA one can't just wave that away and claim that they can be reissued CC0. Therefore, regrettably, IMO the existing descriptions cannot be reused unless they have been licensed CC0, or are directly derived from sources that are PD or CC0. Jheald (talk) 22:26, 9 February 2019 (UTC)
@Jheald: You entered 5 tildes instead of 4. - Alexis Jazz ping plz 22:32, 9 February 2019 (UTC)
Thx. Fixed. Jheald (talk) 22:42, 9 February 2019 (UTC)
  • Mike Peel, I'm confused how the captions can be CC0 if created automatically from a non-CC0 licensed work or a public domain work. The CC 0 legal code requires the "affirmer" to "voluntarily elect" to apply CC0 terms to "his or her Copyright and Related Rights in the Work". If the work is already in the public domain (for example, deemed not eligible for copyright) then there is nothing for which CC0 applies and the PD mark would be more appropriate. If one's work is under copyright, nobody else can put the work under CC0 terms on your behalf. -- Colin (talk) 12:35, 10 February 2019 (UTC)
  • Oppose I'm too worried about the copyright. Even for short descriptions. A single tweet would rarely be copyrightable. But if you collect thousands of tweets from one person, it becomes another story. @Mike Peel: I also find it very worrying that when users enter captions now, they are not informed about the CC0. This means all the captions that have been entered so far are licensed as BY-SA 3.0, not CC0. - Alexis Jazz ping plz 17:24, 10 February 2019 (UTC)
  • Support However, let's do it after captions are properly marked as CC0. I would also start with short captions with clearly identifiable language. Short text (less than 5-10 words) likely falls under {{PD-text}}, and I do not trust templates like {{en}} to be correct. --Jarekt (talk) 13:35, 12 February 2019 (UTC)
  • Oppose the license violation. Support the use of structured data. Support the integration of the license in the structured data. That way we could import any free license content there, with fine-grained license conformity, just as Commons already does with media. This thread seems to pertain to the broader issue "Address concerns about perceived legal uncertainty of Wikidata". I'll add a permanent link to this thread in the issue too. --Psychoslave (talk) 09:03, 6 March 2019 (UTC)
  • Oppose. Descriptions could be long. Captions should be short for several reasons. --Andrew Krizhanovsky (talk) 13:23, 6 March 2019 (UTC)

Sample set

@Mike Peel: Can I suggest processing a batch of, say, 1,000 randomly chosen images, and writing the proposed captions to a gallery page(s) in user space? Then we can all review them, and look for anti-patterns to avoid. You'll need to strip wikicode, for instance - or skip anything that uses it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:15, 9 February 2019 (UTC)

@Pigsonthewing: See User:Mike Peel/Captions for some examples. They are randomly selected, and the code only looks for labels marked with {{en}} and doesn't strip out any wikicode or HTML yet (that work would be done ahead of a bot proposal). Thanks. Mike Peel (talk) 19:55, 9 February 2019 (UTC)
Good idea, interesting. Almost all should be OK, indeed, but a few cases are more debatable, such as File:Distant County Hall - - 896433.jpg and the other files coming from the Geograph project. @Clindberg: hello, do you have an opinion about applying a CC0 license to descriptions that were not under CC0 at their origin? Christian Ferrer (talk) 20:42, 9 February 2019 (UTC)
@Christian Ferrer: I don't think there is any way to safely apply CC0 to licensed description text. Short phrases are not copyrightable, but entire sentences could be, depending on the wording. If it's just bare factual information like date and place and subject there is probably no copyright, but I would think it would be possible for some descriptions (even short ones) to be copyrightable. Carl Lindberg (talk) 21:28, 10 February 2019 (UTC)
@Mike Peel: Thank you. Most look good. On images like File:Ishiguro Koreyoshi - Kozuka with Chrysanthemums - Walters 5112783.jpg and File:Zoophytes- 1. 2. Fongie Actinie. (Nouvelle-Irlande.); 3. 4. Fongie à gros tubrcules. (Vanikoro.); 5.- 9. Tubinolie rouge. (Nouv-Zélande.) (NYPL b13624459-1267199).jpg, the |title= should be captured, before, or instead of the description. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:51, 9 February 2019 (UTC)
It's probably worth skipping audio files, too. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:33, 9 February 2019 (UTC)
There are some lines like ''Abstract/medium:'' 1 negative : glass ; 5 x 7 in. or smaller , with no filename. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:51, 9 February 2019 (UTC)
File:Beach handball at the 2018 Summer Youth Olympics – Girls Preliminary Round – RUS-ASA 29 (cropped 2).jpg didn't go well either. (caption: "lang=en") - Alexis Jazz ping plz 21:04, 9 February 2019 (UTC)
@Pigsonthewing, Alexis Jazz: Yes, there are improvements that can be made here. This was code that I wrote in 5 minutes, and as I said it just checks for {{en}} and the character limit I suggested at the start of this discussion. You can see the code at [7]. I can improve it if needed to do what you're suggesting and more - but I am only going to do that for code that I can then actually use to make edits here. Thanks. Mike Peel (talk) 21:45, 9 February 2019 (UTC)
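
For readers who want a feel for the kind of check described in the comment above, here is a minimal Python sketch: it looks for an {{en|...}} block, applies the 200-character limit, and skips anything that still contains wikicode or HTML, as suggested earlier in this section. It is not the code linked at [7]; the function name and regular expressions are assumptions made for illustration, it does not use pywikibot, and it edits nothing.

```python
import re

# Text inside the first {{en|...}} or {{en|1=...}} block. The non-greedy
# match stops at the first "}}", so nested templates leave a stray "{{"
# in the result and are rejected by the markup check below.
EN_RE = re.compile(
    r"\{\{\s*en\s*\|\s*(?:1\s*=\s*)?(.*?)\}\}", re.DOTALL | re.IGNORECASE
)

# Rough signs of wikicode, HTML or bare links left in the text.
MARKUP_RE = re.compile(r"\[\[|\]\]|\{\{|\}\}|<[^>]+>|''|https?://")


def english_caption(wikitext, limit=200):
    """Return a candidate English caption, or None if the page is skipped."""
    match = EN_RE.search(wikitext)
    if match is None:
        return None
    text = " ".join(match.group(1).split())
    if not text or len(text) > limit:
        return None  # over the limit: do not copy at all
    if MARKUP_RE.search(text):
        return None  # skip rather than try to strip wikicode
    return text


if __name__ == "__main__":
    page = "{{Information|description={{en|1=Old stone bridge over the river}}|date=2019}}"
    print(english_caption(page))
```

Skipping marked-up descriptions instead of stripping the markup errs on the side of copying less, which seems closer to the cautious review of a sample set proposed above.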
@Mike Peel: I'd say: work on the caption-wikitext-inclusion thing first. There was opposition against captions in general from the start because nobody wants duplicated data and having to fix typos in two places. You have to learn how to walk before you can run. - Alexis Jazz ping plz 21:54, 9 February 2019 (UTC)
@Alexis Jazz: It's up to the structured data on commons team to sort out the ParserFunction/Lua access to the captions, I can't do anything to help with that. On the other hand, we have captions now, so we can start to use them, and that's a good step forward. Copying the description to the captions is a start, then we can figure out including them in {{Information}}, {{Wikidata Infobox}}, and elsewhere, as the next step. Thanks. Mike Peel (talk) 22:06, 9 February 2019 (UTC)
@Mike Peel: I disagree about the order. Get caption-wikitext-inclusion working first so duplicate descriptions/captions can be avoided. If you were to start copying now, it'll just create a mess when caption-wikitext-inclusion becomes available but the descriptions/captions are no longer in sync because users have edited one but not the other. - Alexis Jazz ping plz 22:22, 9 February 2019 (UTC)
Do you have any evidence that doing that is planned? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:32, 9 February 2019 (UTC)
Users edit descriptions all the time. You're saying they will stop doing that if descriptions are copied to captions to avoid them becoming desynchronized? - Alexis Jazz ping plz 17:20, 10 February 2019 (UTC)
No; I'm asking you what evidence you have, that "Get caption-wikitext-inclusion working " is planned. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:51, 13 February 2019 (UTC)
  • Within the example set are seven files from the Portable Antiquities Scheme; we are currently approaching 500,000 files from there. All the metadata is CC-BY-3.0, and so none may be reused as CC0. -- (talk) 22:43, 9 February 2019 (UTC)
  • Within the set are nine files from the Fleuron project, the batch made for an interesting image processing experiment of 250,000 files. The metadata relies on a Gale database, where the database rights are reserved. Clearly, systematically extracting any part of data and republishing as CC0 would break the expectation of limiting reuse, effectively by creating a new CC0 database with no attribution being preserved back to Gale. -- (talk) 22:57, 9 February 2019 (UTC)
  • Within the set are a significant number of files from Flickr, where the sources are not released as CC0. The licenses at source, such as the frequent default of CC-BY-2.0, must be presumed to apply to the metadata including the given titles and descriptions. Recasting these as CC0 cannot be supported as being compliant with their original releases. -- (talk) 23:10, 9 February 2019 (UTC)
    • I can't reply to any of these as the comments are too vague to let me investigate them, plus I am not a copyright lawyer. Mike Peel (talk) 23:16, 9 February 2019 (UTC)
      • Not expecting a solution, raising as very large collections which illustrate why auto population of CC0 captions is a non starter. Something that seemed obvious to me when captions were rolled/driven out without copyright issues being properly discussed. -- (talk) 23:28, 9 February 2019 (UTC)
      • "Not expecting a solution" - so that was entirely pointless then. Mike Peel (talk) 23:33, 9 February 2019 (UTC)
        • "illustrate" does not equal "pointless" unless you are unable to let evidence change your viewpoint. -- (talk) 10:46, 10 February 2019 (UTC)
    • I remain to be convinced that a simple, factual description like "A silver hammered penny of Edward the Confessor, minted in Southwark between 1042 and 1044. Moneyer: Wulfwine." can be copyrighted. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:30, 9 February 2019 (UTC)
      • Looking at its source page at the BM, I suspect that such a caption reflects exactly the sort of knowledge and judgment and choice of expression that represent scholarship that can be protected by copyright, especially if the proposition is to similarly take 10,000 more such descriptions.
        None of the information given "of Edward the Confessor", "minted at Southwark", "between 1042 and 1044", "moneyer: Wulfwine" is obvious just from looking at the image. Even the term "silver hammered penny" represents a judgment, and a choice of description.
        I think you are on quite treacherous ground, to assert that there is no copyright here. Jheald (talk) 00:20, 10 February 2019 (UTC)
  • Comment Look at Text & Data Mining. This is not exactly the same thing, but it is a somewhat similar topic to the extent that we are talking about structured data. Christian Ferrer (talk) 06:00, 10 February 2019 (UTC)
  • Note that some captions have already been manually copied from the descriptions by users who are not the descriptions authors. Christian Ferrer (talk) 07:52, 10 February 2019 (UTC)
    • Yeah, this is a good point Christian and gets at the deeper issue here. Even if a bot wouldn't be right for this task, we could easily semi-automate it to allow for rapid-fire manual checks for meaninglessness or other problems. But the real issue is that even if we do neither, it is deeply intuitive to simply copy/paste the existing descriptions, and will likely be done on a large scale, even if it is done piecemeal and manually across the entire project.
      Now I don't claim to be the most experienced user in the history of Commons, but I've been generally aware and even slightly involved in the ongoing discussion about structured data. I had no idea whatsoever that captions were licensed differently than descriptions until this discussion. The reason for that is probably that there is no indication whatsoever either in the upload wizard or on the file pages that these are licensed differently. That's a problem, and on such a copyright savvy project as Commons, it's a little surprising that we've implemented a system where users are shadow-licensing their content with no notification or explanation. The implication of that is that these contributions aren't actually licensed under CC0, because no notification means no license. GMGtalk 13:50, 10 February 2019 (UTC)
      • "WE" have not implemented this. We, the community, have not agreed anything about captions. The rationale that some discussion on a Phabricator task can replace a Commons community consensus is bizarre and simply a convenient fiction to justify a WMF desired change. The problem with literally mass ripping metadata from Wikimedia Commons and pretending that it has no copyright, has always been a foundational and extremely obvious logical conflict with the structured data proposals. But hell, who am I, I just have opinions based on years of creating content on this project, but as I've never been paid for it, my voice can be safely marginalized. -- (talk) 13:59, 10 February 2019 (UTC)
        • Well, even if we didn't implement it, we need to fix it, because without any type of notification whatsoever, the entire enterprise is basically just copyfraud. I mean, it's possible that I'm missing something obvious here, but it seems pretty straightforward. GMGtalk 14:10, 10 February 2019 (UTC)
          • Kind of missing the point. We, the community, do not own it. We do not get to say how it works. It is not ours to "fix", so why try? Frankly apart from being annoyed, I have been given zero incentive to care about this change, or to help with using this badly thought out and badly implemented "feature". The single option I have been given is to hide it from my view rather than remove it until it might be acceptable. Somehow that has been politically spun as being positive.
            Consequently, maybe we need a legal case, or a "WMF copyfraud" bad PR incident, to get the WMF to care when we ask for a change, or we politely suggest that the WMF properly tests major changes before rolling them out on what they think is "their" project. -- (talk) 14:30, 10 February 2019 (UTC)
  • Note that we can also try to proceed in steps. There are mainly two types of content here: the "own works" and the content coming from external sources. You can begin with the "own works":
1/ send a mass message to all the "own works" uploaders (or to all users), notifying them that we are starting to copy the descriptions to the captions for the files tagged as "own work", that there is a license change for the text coming from the description, and that they can object if they wish, and then proceed on an announced date
2/ or process all "own works" without sending any messages, assuming that there are normally no copyright infringements in the "own works" descriptions
3/ create maintenance categories such as Category:Media with captions and/or Category:Media without captions and/or Category:Media from external sources without captions, etc.

These are just ideas; I don't know if this is the right way. Christian Ferrer (talk) 18:08, 10 February 2019 (UTC)

Fix the captions licensing

As we have seen in the proposal above, file captions are licensed as CC0. However, users who enter captions are currently not informed about this. At all. They also copy descriptions to captions, but descriptions often have a different license. In general, we can assume all captions that have been entered so far by users who are familiar with Commons are licensed the same as the proposal I'm writing now and all wikitext here: Creative Commons BY-SA 3.0. (lawyers may want to debate this, but for regular Commons users this is nitpicking) Captions that were entered by users with few edits, sadly no.

The Commons community in general has limited leverage over these development decisions, but we do have some. So here is a proposal, without hurting the developers too much, as they are given a choice. The proposal is this: one of the following should be done:

1. Delete all file captions that have been entered so far from the database. Inform the user clearly that the captions they enter will be released as CC0, similar to the message above the "Publish changes" button when editing wikitext. Also, create a system that will prevent users from mass copy-pasting file descriptions to file captions. Considering the license difference, they will generally need to rewrite the description in their own words for the file caption.. at least.


2. Delete all file captions from IP-users and users with few edits, as they may not be sufficiently aware of wikitext licensing. Change the license for the remaining file captions and future captions to Creative Commons BY-SA 3.0.


3. The developers disable captions for now, have WMF legal actually look at the whole thing and enable it again with permission from legal in whatever form legal deems appropriate.

So they have three options. A complicated one if they must have CC0, a more simple one if they switch to BY-SA 3.0 or they battle it out with legal. This vote is not for which option the developers should pick. This proposal is merely saying the developers have to pick one of them. - Alexis Jazz ping plz 22:48, 10 February 2019 (UTC)

Voting (fix the captions licensing)

  • Support, obviously. - Alexis Jazz ping plz 22:48, 10 February 2019 (UTC)
  • Support.   — Jeff G. please ping or talk to me 22:59, 10 February 2019 (UTC)
  • Oppose I don't think we should be deleting captions because of this issue. Option 2 if we had to do that sounds the least disruptive though. Honestly, WMF is not going to stop captions, that would be a waste of thousands of dollars to them. Abzeronow (talk) 23:16, 10 February 2019 (UTC)
  • Support, I was not aware of the different license. This is an issue. --Schlurcher (talk) 09:09, 11 February 2019 (UTC)
  • Oppose all three options. Go discuss things at Commons_talk:Structured_data#CC0_licensing_mockups first. Mike Peel (talk) 12:54, 11 February 2019 (UTC)
  • Oppose Option 1, Oppose Option 2, Support Option 3, note that I am against any form of deletion, only if the legal department of the Wikimedia Foundation believes it to be necessary could I (very, very reluctantly) support it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 20:38, 11 February 2019 (UTC)
  • Oppose There are few things that are more annoying than deletion of your work. If I do a sloppy job researching some file license and it gets deleted, then I am the only person to blame. However, if I was adding many captions and someone deleted all my work because I was not properly notified that my edits were CC0, I would be very pissed. I think lack of proper notification at the rollout of a new product is much less of an issue than deletion of someone's work. That said, I hope proper CC0 marking of captions is added soon. --Jarekt (talk) 13:23, 12 February 2019 (UTC)
  • Oppose not broken do not fix. Slowking4 § Sander.v.Ginkel's revenge 14:07, 12 February 2019 (UTC)
  • Support Of course it's broken. To deny otherwise is equivalent to closing your eyes and going "La la la, copyright can't touch me, la la la". One thing Brexit should have taught everyone by now, is that asking everyone to "believe harder" is not a good way to handle reality. -- (talk) 14:37, 12 February 2019 (UTC)
  • Support This project is a disgrace. Jürgen Eissink (talk) 02:10, 14 February 2019 (UTC).
  • Support We can't misrepresent such an important issue to our fellow contributors. --Psychoslave (talk) 09:37, 6 March 2019 (UTC)
  • Support Option 3. This is a legal issue that folks are unintentionally violating. It needs to be more carefully handled than the Foundation simply throwing it into the wild and saying "have fun!" Huntster (t @ c) 19:26, 6 March 2019 (UTC)
  • Oppose Captions are supposed to be simple one line descriptions. If they are, they won't be original and creative enough to be copyrightable. If they are copyrightable, they should be edited down to be simpler. There's no reason to mass delete them. Kaldari (talk) 05:09, 8 March 2019 (UTC)
@Kaldari: Which country's TOO should we follow, then?   — Jeff G. please ping or talk to me 12:56, 8 March 2019 (UTC)
Please write a help page explaining to a novice user precisely how to tell whether a text they are copying into the field has potential copyright or not, and whether they are at risk of a legal claim of damages, especially if all they did was copy a line from an on-wiki CC-BY licensed description. Vague hand-waving along the lines of "short enough text is probably not copyrightable, not sure because the WMF lawyers will not give me a statement they would be prepared to go to court on..." is not an adequate answer. Your expertise here would be super, rather than relying on a majority vote of unpaid volunteers, as if that can override basic copyright law. Thanks. -- (talk) 17:11, 8 March 2019 (UTC)

Discussion (fix the captions licensing)

  • Question Wouldn't it just be easier to modify one of the interface displays, say MediaWiki:Wikibasemediainfo-entitytermsforlanguagelistview-caption, to read "Captions (Note: All captions written in this box are released under the Creative Commons CC0 1.0 Public Domain Dedication)"? That way what is happening is clear to everyone who sees that box. --Majora (talk) 22:54, 10 February 2019 (UTC)
    Well I boldly tried it. Unfortunately interface displays apparently can't use wikimarkup. So the small tags actually displayed and the external link did not format. Still a possibility to fiddle with one of the interface messages to make sure people know what they are doing when they use the captions box. --Majora (talk) 23:03, 10 February 2019 (UTC)
    @Majora: I'm not sure how that would look, but perhaps that could be part of the solution. But that message would also need to be translated in many languages. And it would still leave us with the captions that have already been entered. And I'm guessing some users will ignore the message and copy descriptions anyway, so something should be put in place to prevent that. Informing users about the captions license will be needed, whichever path is chosen. - Alexis Jazz ping plz 23:14, 10 February 2019 (UTC)
    Well I'm working on the beta cluster to see if I can get anything that actually looks proper. We can always use the translations of MediaWiki:Wikimedia-copyrightwarning if I can get the actual formatting correct. --Majora (talk) 23:18, 10 February 2019 (UTC)
    Don't think this is going to work, unfortunately. There just aren't enough options to change that would make this doable, and the only option that would be viable doesn't want to display URLs in any readable fashion. Oh well. --Majora (talk) 23:32, 10 February 2019 (UTC)
  • @Abzeronow: the developers can pick any of the three options they like. The third option is to take it to legal. If legal says they don't have to delete anything, well, that's fine. - Alexis Jazz ping plz 23:25, 10 February 2019 (UTC)
    • This might be a dumb question, but why not a fourth option to change license of wikitext to CC-0 instead of CC-BY-SA? Abzeronow (talk) 23:30, 10 February 2019 (UTC)
      • We can't retroactively change a non-CC0 license to CC0 without the permission of the copyright holders. That would be voiding their copyright. --Majora (talk) 23:32, 10 February 2019 (UTC)
        • Also it would be quite the feat. Few wikis have changed their license. English Wikinews has (is now CC BY), and Wikidata is obviously CC0. It's not impossible, but it would only be valid for new contributions. Go figure, we haven't even updated to Creative Commons 4.0 yet. Another issue are imported descriptions, like those from Flickr. - Alexis Jazz ping plz 23:44, 10 February 2019 (UTC)
          • Obviously, Commons should live up to its name and, going forward, change to CC-0. I guess some sort of permission and/or fair use rationale will have to suffice in the meantime though. Abzeronow (talk) 17:09, 11 February 2019 (UTC)
Here's an actual alternative to deleting current captions.

Mass-deliver an electronic letter to the talk pages of everyone who created a file caption before the Creative Commons 0 (Zero) license is launched and inform them that they can opt into releasing their file captions with the Creative Commons 0 (Zero) license or otherwise they will be deleted. This opt in system could also work for Mike Peel's proposal above for {{Own}} files. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:55, 11 February 2019 (UTC)

Alternatively, every file caption added before a certain date could have a "{{Caption before March 2019}}" license template or something similar added to it, stating that the caption is released under a different license. Nah, bad idea, as these file captions can still be edited and changed and others could be added, confusing re-users. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:55, 11 February 2019 (UTC)

"Delete all file captions from IP-users and users with few edits, as they may not be sufficiently aware of wikitext licensing. Change the license for the remaining file captions and future captions to Creative Commons BY-SA 3.0." This is a very odd proposal, this applies to literally all licensing on Wikimedia Commons and it would make no sense to delete their contributions if they are going to be released under the same license as the rest of the website is. This is like saying "let's delete all Wikipedia articles by new users and IP-users because they might not be aware of what license they might be using", not everyone is Marco Verch and I highly doubt that the users with few edits and IP-users thought that they would retain full copyright © for their additions. Also this wouldn't "change" the license but retain it because as far as anyone us concerned all text (including file captions) on Wikimedia Commons is Creative Commons BY-SA 3.0. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:15, 11 February 2019 (UTC)
Info There is already a patch, just needing to be reviewed, for adding the license information for structured data to the footer --GPSLeo (talk) 11:22, 11 February 2019 (UTC)

  • Comment We need to do something. I'm inclined to say we should disable the entire thing for now regardless until we figure out what that is. If there's anything I have a strong opinion about it's that we should abandon the notion that we are currently dealing with captions that are licensed under CC0, because they're not. We don't currently know what the licenses actually are, but presumably some proportion of them are not freely licensed and are creative enough to qualify under copyright protection.
I've tried to think of a few scenarios of "where do we go from here", and I struggle to answer that in any way that doesn't look like manually reviewing all current captions, sending out mass messaging to obtain active CONSENT, and deleting the remainder. But when I look at that level of mess, I struggle to justify anything other than deleting the whole lot under PCP and restarting the project from scratch.
Now that's a massively crappy solution that wastes several weeks' worth of work and doesn't really make anybody happy. But... legally appropriate notification of licensing terms is really Commons 101 stuff. In a situation where we're looking to do this structured data thing over the entire foreseeable future, maybe it's not all that bad to call it a good test run, do a thorough post mortem, and learn from our mistakes. GMGtalk 12:45, 11 February 2019 (UTC)
Of course it's a massively crappy solution. When I proposed on the Village Pump that the change was reversed, several voters said the equivalent of "oh, let's run this for a month and see how it goes before voting again". I hesitate to say "I told you so", because that sounds like I won something when actually we are all losing volunteer time, good faith and new users. -- (talk) 13:14, 11 February 2019 (UTC)
@Donald Trung: asking consent afterwards would, while ugly (but this whole operation is never going to get pretty..) be an acceptable solution. I'm not sure it'll be worth it though. For users who have entered many self-written (not copy-pasted..) captions it could be interesting. But to realize all this.. I'm afraid starting from scratch will be better. Cut our losses and get it right next time. - Alexis Jazz ping plz 16:21, 11 February 2019 (UTC)

“Oppose all three options. Go discuss things at Commons_talk:Structured_data#CC0_licensing_mockups first.”

— Mike Peel (talk) 12:54, 11 February 2019 (UTC)

@Mike Peel: implementing a license notification after going live is entirely unacceptable (seriously.. why did you think that would be okay?) and doesn't do anything to resolve the license issues around captions that have already been entered. - Alexis Jazz ping plz 16:13, 11 February 2019 (UTC)

@Alexis Jazz: To be honest, I assumed that the notification was there and that I'd just accepted it at some point. My suggestion is to discuss things with the people working on this first, and then put together a proposal, not the other way around. Mike Peel (talk) 16:22, 11 February 2019 (UTC)
@Mike Peel: what those people think doesn't really matter. Believing really hard something isn't copyvio doesn't magically make it public domain. Captions without any CC0 license notification can't be live. Period. They need to fix the legal issues, then they go live. Not the other way around! - Alexis Jazz ping plz 16:29, 11 February 2019 (UTC)
Yeah, they're really two separate issues: how we fix the fact that we aren't providing notification currently (the discussion at COM:SDC), and what to do with the past contributions that were not provided notification (this discussion). Solving one, even if done quickly and smartly, doesn't really address the other at all. GMGtalk 16:32, 11 February 2019 (UTC)
Pinging @WMF Legal:, they should've been involved with this from the start so this ugly, ugly mess could've been avoided. Any caption I in my individual capacity have created on Wikimedia Commons falls (irrevocably) under the Creative Commons 0 (Zero) free license. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 16:36, 11 February 2019 (UTC)
Any caption I in my individual capacity have created on Wikimedia Commons falls (irrevocably) under the site's Creative Commons Attribution-ShareAlike License, version 3.0.   — Jeff G. please ping or talk to me 14:31, 12 February 2019 (UTC)
Well Jeff, I hope you're willing to file a takedown notice over it, because from the looks of things, it seems that much of the community is content to sweep the whole thing under the rug, retroactively license them however we please, and pretend like nothing ever happened. GMGtalk
Good approach. It is very straightforward to issue the WMF with a takedown and it sets a nicely referenceable precedent. -- (talk) 16:00, 12 February 2019 (UTC)
Could @WMF Legal: react at some point here? Personally, I don't think a takedown would be a great way to resolve this situation while there is still room for taking feedback into account. I'm afraid that if no appropriate action is taken this is how it will end up at some point, which would be, to my mind, really terrible for the image of the movement in general and WMF in particular. Solutions do exist: some were already pointed out above, and we can discuss further to find new ones if needed. --Psychoslave (talk) 09:34, 6 March 2019 (UTC)

Whitelist for Flickr accounts belonging to federal agencies of the United States

As discussed here, is there a way we can whitelist Flickr accounts belonging to U.S. federal agencies? It seems many of these accounts are defaulted to "All Rights Reserved" despite U.S. copyright law.--TriiipleThreat (talk) 22:50, 12 February 2019 (UTC)

That would be a question for Zhuyifei1999 who maintains the Flickr checking bot. --Majora (talk) 22:56, 12 February 2019 (UTC)
I could add that if there is consensus that the Flickr accounts are uploading only images from their employees. --Zhuyifei1999 (talk) 23:52, 12 February 2019 (UTC)
I would support such a whitelist.   — Jeff G. please ping or talk to me 23:00, 12 February 2019 (UTC)
  • Support, the uploaders probably don't select the correct licenses. I know of a few contributors who import from Flickr who contacted several American government agencies to change their licenses on-Flickr despite already being technically free. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 23:17, 12 February 2019 (UTC)
    @Donald Trung: What methods and verbiage are working for them? Perhaps we can work together.   — Jeff G. please ping or talk to me 23:57, 12 February 2019 (UTC)
  • Support a whitelist. Which accounts to put on it will require some discussion. @Zhuyifei1999: I can't remember if your bot already does this, but can it for example automatically insert {{PD-CAGov}} for images from Category:California Department of Fish and Wildlife? - Alexis Jazz ping plz 01:57, 13 February 2019 (UTC)
    I don't think it's a good idea to whitelist categories, only accounts. --Zhuyifei1999 (talk) 14:28, 13 February 2019 (UTC)
    @Zhuyifei1999: that's what I meant, insert {{PD-CAGov}} into images from - Alexis Jazz ping plz 14:49, 13 February 2019 (UTC)
    Then no, not currently. --Zhuyifei1999 (talk) 14:52, 13 February 2019 (UTC)
  • I obviously support the whitelist as the nominator. @Alexis Jazz: there is a Flickr group for official U.S. government photostreams, although its membership is private and I can’t be sure if every official federal photostream is a member, but it may be a good starting point. Other than that, this may have to just be a running whitelist that keeps expanding as we come across federal accounts.--TriiipleThreat (talk) 13:33, 13 February 2019 (UTC)
    • Comment: The group also contains state, and local government photostreams so if we do get access to its membership, we still may have to weed through the members.--TriiipleThreat (talk) 13:39, 13 February 2019 (UTC)
      • @TriiipleThreat: what do you need access for? - Alexis Jazz ping plz 14:52, 13 February 2019 (UTC)
        • @Alexis Jazz: I think it would be a good idea to see if we can get access to the list of members of the flickr group. From there, we can add the federal members of the group to the whitelist. I just suggested it to make the initial list a bit easier to compile than hunting down accounts individually from scratch.--TriiipleThreat (talk) 15:19, 13 February 2019 (UTC)
FBI top secret info

I also found that this photo was actually taken in area 51. I knew they weren't from here. - Alexis Jazz ping plz 15:57, 13 February 2019 (UTC)

  • @Alexis Jazz: LOL, thanks. This minus the state and local accounts should be a good jumping off point. Edit: I added a few others from the previous discussion. Also I struck out some state and local accounts.--TriiipleThreat (talk) 16:33, 13 February 2019 (UTC)
  • @Majora, Zhuyifei1999: It’s been 7 days with no opposition. I think we can start to implement the whitelist.—TriiipleThreat (talk) 11:18, 19 February 2019 (UTC)
    I'll see if I have time tomorrow. No time today --Zhuyifei1999 (talk) 19:24, 19 February 2019 (UTC)
    @TriiipleThreat, Alexis Jazz: Can I get a clarification? Say a file is from 41723647@N08, the bot should unconditionally pass the file, and if it doesn't have a license tag, it should be tagged {{PD-CAGov}}. But what if it already has a license tag? What if it's tagged in a CC license? What about PD ones? What about the rare ones like {{FAL}}? --Zhuyifei1999 (talk) 02:32, 21 February 2019 (UTC)
    @Zhuyifei1999: Flickr doesn't support {{FAL}}, please see COM:FLICKR.   — Jeff G. please ping or talk to me 02:54, 21 February 2019 (UTC)
    @Jeff G.: What I mean is, what if the file is tagged here as {{FAL}}? Should the bot add the tag, replace the tag, or just perform a review pass ignoring that the tag might be wrong? Or should it mark the image for human review, or ignore the image altogether? Or should it bail out, or crash, or should it hack DPRK and release a nuclear bomb on the servers that host the bot so the bot can end its suffering from its slave-like work? Anyways, jokes aside, as a coder I must write code for every single scenario. Undefined behaviors are not acceptable. --Zhuyifei1999 (talk) 03:11, 21 February 2019 (UTC)
    @Zhuyifei1999: Sorry. IMHO, if the file should be tagged {{PD-CAGov}}, any other alleged license should be removed as copyfraud.   — Jeff G. please ping or talk to me 03:21, 21 February 2019 (UTC)
  • If the file is Creative Commons on Flickr, leave that license and review that license. Insert the appropriate PD template (if it's not there yet) in addition to it. The Creative Commons license can be useful in some jurisdictions and might be needed if we find out that for whatever reason PD doesn't apply.
  • If the file is "all rights reserved" and the uploader claims PD-US-expired, PD-old-70 or a variant, tag for human review.
  • If the file is "public domain mark" and the uploader claims PD-US-expired, PD-old-70 or a variant, do not insert any other license and proceed as usual.
  • If the file is "all rights reserved" or "public domain mark", insert the appropriate PD template (if it's not there yet) and remove any other license. Also remove any less precise (like the generic {{PD-USGov}}) templates.
And omit the self-destruct sequence. The cartoons try to teach you otherwise, but it is really not essential. - Alexis Jazz ping plz 04:34, 21 February 2019 (UTC)
Thanks. That's more complexity than I had expected. Will do this weekend. And self-destruct was just a joke to give some examples of what 'undefined behavior' means :) --Zhuyifei1999 (talk) 06:17, 21 February 2019 (UTC)
I don't think I have time to implement all of this myself in the next few weeks (sorry, too much IRL stuff going on). Code is here; patches welcome. --Zhuyifei1999 (talk) 22:56, 24 February 2019 (UTC)
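
For anyone considering a patch, the four rules listed by Alexis Jazz above might be sketched roughly as follows. This is only an illustrative sketch and not FlickreviewR's actual code: the names `FlickrLicense`, `Decision` and `decide` are hypothetical, the prefix match for "PD-US-expired, PD-old-70 or a variant" is an assumption, and `commons_tags` is assumed to hold only the license templates already on the file page.

```python
from dataclasses import dataclass, field
from enum import Enum


class FlickrLicense(Enum):
    CREATIVE_COMMONS = "creative commons"
    ALL_RIGHTS_RESERVED = "all rights reserved"
    PUBLIC_DOMAIN_MARK = "public domain mark"


# Assumption: "PD-US-expired, PD-old-70 or a variant" is matched by prefix.
AGE_BASED_PD_PREFIXES = ("PD-US-expired", "PD-old")


@dataclass
class Decision:
    human_review: bool = False                   # tag the file for human review
    add: list = field(default_factory=list)      # templates to insert
    remove: list = field(default_factory=list)   # templates to remove


def decide(flickr_license, commons_tags, pd_template):
    """Apply the four rules to a file from a whitelisted account (hypothetical helper)."""
    tags = list(commons_tags)
    missing_pd = [] if pd_template in tags else [pd_template]
    age_claim = any(t.startswith(AGE_BASED_PD_PREFIXES) for t in tags)

    # Rule 1: Creative Commons on Flickr: keep it, add the PD template alongside.
    if flickr_license is FlickrLicense.CREATIVE_COMMONS:
        return Decision(add=missing_pd)
    # Rule 2: all rights reserved plus an age-based PD claim: human review.
    if age_claim and flickr_license is FlickrLicense.ALL_RIGHTS_RESERVED:
        return Decision(human_review=True)
    # Rule 3: public domain mark plus an age-based PD claim: change nothing.
    if age_claim:
        return Decision()
    # Rule 4: otherwise insert the PD template and drop every other license
    # template, which also covers less precise ones such as PD-USGov.
    return Decision(add=missing_pd, remove=[t for t in tags if t != pd_template])
```

The age-based checks are deliberately evaluated before the catch-all fourth rule, since that rule would otherwise swallow the two special cases.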

@Zhuyifei1999: It doesn’t have to be that complicated. Just unconditionally pass the file: if it doesn’t have a tag, add a PD tag; if it already has a license other than PD, add the PD tag anyway. Skip the ones that already have a PD tag. The bot can always be tweaked later.--TriiipleThreat (talk) 12:13, 25 February 2019 (UTC)

Again, patches welcome. I am currently unable to allocate time for this in the next few weeks. --Zhuyifei1999 (talk) 18:37, 25 February 2019 (UTC)
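
The simpler behaviour TriiipleThreat describes above is small enough to sketch in a few lines. Again, this is a hypothetical illustration rather than the bot's code: the function name is made up, and recognising an existing PD tag by the "PD-" template-name prefix is an assumption.

```python
def simple_whitelist_edit(commons_tags, pd_template):
    """Return the templates to add for a file from a whitelisted account.

    The file is passed unconditionally; a PD template is added unless the
    page already carries one (assumed to be any template starting with "PD-").
    """
    has_pd_tag = any(tag.upper().startswith("PD-") for tag in commons_tags)
    return [] if has_pd_tag else [pd_template]
```

With this version existing non-PD licenses are never removed, so it cannot destroy information; that seems to be the point of starting simple and tweaking later.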

Create a separate list for Flickr accounts that require an additional human review

Note: Zhuyifei1999 (who maintains FlickreviewR) has stated this is technically possible.

Flickr is a rich source of images for us. Because some Flickr users are known for license laundering or have other issues that can't be overcome, Commons created Commons:Questionable Flickr images. Once on this list, tools downright refuse to upload anything from these photographers. And images get deleted blindly because authors are on the list, regardless of the actual reason they were listed.

Unfortunately, this list grew to also include many accounts that accidentally uploaded something that doesn't adhere to our strict rules. Or made some mistakes while also having good, properly licensed own work. For an example, look at A perfectly usable and correctly licensed photo of the town hall in Bradford. Can't upload it with any tool, because Paul Stumpr has also taken photos of magazines and a cardboard Fred Flintstone. Another example: Russian Orthodox Church in Antwerpen. This photo was deleted because the account is on the "bad authors" list. The account is on the bad authors list because of this incident and while images from this account appear to require a human review, photos that were taken with a Canon EOS 5D Mark II or iPhone 4 are fine.

If this proposal passes, tools should ignore the new list or merely warn the uploader without blocking the upload. FlickreviewR should tag the images for human review while still providing its own review (in case the license changes before the human reviewer gets to it). For the accounts that are placed on this list, the admin who adds the account should also provide a description of what the license reviewer needs to look out for with that particular Flickr account. - Alexis Jazz ping plz 23:38, 9 February 2019 (UTC)

There are now two proposals. I added the second one later. The second proposal is a subset of the first. If the first proposal passes, the second would be redundant. If only the second proposal passes, any changes to how tools handle accounts on the new list will be determined by future discussions/proposals.
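To make the intended behaviour concrete, here is a minimal sketch of how an upload tool might treat the two lists under the first proposal. All names (`UploadAction`, `check_account`, the example account IDs and the reviewer note) are hypothetical; no existing tool is claimed to work this way.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class UploadAction(Enum):
    ALLOW = "allow"  # account is on neither list
    WARN = "warn"    # on the new list: warn, upload anyway, tag for human review
    BLOCK = "block"  # on Commons:Questionable Flickr images: refuse the upload


def check_account(
    account_id: str,
    blacklist: Set[str],
    review_list: Dict[str, str],
) -> Tuple[UploadAction, Optional[str]]:
    """Return the action for an account plus the reviewer note, if any.

    review_list maps an account ID to the admin's description of what
    the license reviewer needs to look out for.
    """
    if account_id in blacklist:
        return UploadAction.BLOCK, None
    if account_id in review_list:
        return UploadAction.WARN, review_list[account_id]
    return UploadAction.ALLOW, None


# Made-up data for illustration only.
blacklist = {"laundering_account"}
review_list = {"mixed_account": "Own photos are fine; scans of magazines are not."}

action, note = check_account("mixed_account", blacklist, review_list)
print(action.value, "-", note)
```

Checking the blacklist first means that an account accidentally placed on both lists is still blocked, which seems the safer default.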

Create a list for Flickr accounts that require an additional human review: votes

Create a list, independent from Commons:Questionable Flickr images for Flickr accounts that do have properly licensed images that are within our scope but require a human review.

  • I would Support this, as blacklisting should be a last resort and not a first resort, which is how the Wikipedias already treat "spam" websites that are useful and educational but get blacklisted because of a single incident or assumption of bad faith. A lot of Flickr accounts have thousands of good images with hundreds of bad ones; most importers won't check the blacklist first, and excluding good content over trivial reasons that could be handled by a handful of volunteers should not be an option. Post Script: You could copy this comment and any other comment I make to the actual proposals village pump when you're going to present your ideas, as I'd rather not "vote" twice. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 09:46, 10 February 2019 (UTC)
Donald Trung's vote copied from User:Alexis Jazz/Proposal incubator per his request. - Alexis Jazz ping plz 16:18, 15 February 2019 (UTC)
  • Support as proposer. - Alexis Jazz ping plz 16:18, 15 February 2019 (UTC)
  • Support Abzeronow (talk) 16:33, 15 February 2019 (UTC)
  • Support.   — Jeff G. please ping or talk to me 17:32, 15 February 2019 (UTC)
  • Oppose will just create another backlog, when you don't even need tools to upload from Flickr. If an image is that valuable, it can be uploaded manually. See backlog Category:License review needed. Will also create more admin deletion work, when users abuse it.--BevinKacon (talk) 10:32, 16 February 2019 (UTC)
  • Oppose per BevinKacon. Natuur12 (talk) 17:35, 20 February 2019 (UTC)
  • Support - this seems to be a reasonable step. BD2412 T 14:57, 21 February 2019 (UTC)
  • Neutral - I would support this if and when you have some volunteers to work the backlog, or maintenance category. Slowking4 § Sander.v.Ginkel's revenge 14:33, 21 March 2019 (UTC)

Create a list for Flickr accounts that are not all bad (subset of the first proposal): votes

Create a list, independent from Commons:Questionable Flickr images for Flickr accounts that do have properly licensed images that are within our scope but can't be blindly reviewed by the bot because some of their Flickr uploads are problematic. This will lay some groundwork to handle those users differently in the future and better inform license reviewers. If, how and for which user group this list could be used by tools is to be determined by future discussions/proposals. Pinging @Clindberg, Donald Trung, Abzeronow, Jeff G., BevinKacon and Natuur12: the first proposal isn't void, but here's an alternative to implement the second list without deciding yet what to do with it. - Alexis Jazz ping plz 04:53, 21 February 2019 (UTC)

  • Support as proposer. - Alexis Jazz ping plz 04:53, 21 February 2019 (UTC)
  • Support, we need more checks and balances before completely blacklisting every upload by a user. We need more quality images and we need to preserve more historical knowledge; by excluding good content because of a risk of bad content when there are willing volunteers, we are shooting ourselves in the foot. We should see excluding content as an onion with blacklisting as the centre: the more layers, the better for the project. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:57, 21 February 2019 (UTC)
  • Support.   — Jeff G. please ping or talk to me 12:09, 21 February 2019 (UTC)
  • Support This sounds reasonable. Better informing our license reviewers is also a good thing. Abzeronow (talk) 15:31, 21 February 2019 (UTC)
  • Support We need more options than just black and white. De728631 (talk) 18:42, 21 February 2019 (UTC)
  • Comment you don't need to create a proposal to create such a list, only for how such a list is treated by the import tools.--BevinKacon (talk) 21:59, 21 February 2019 (UTC)
  • Support --B dash (talk) 02:32, 25 February 2019 (UTC)

Create a list for Flickr accounts that require an additional human review: discussion

Discuss details for this proposal here.

“will just create another backlog, when you don't even need tools to upload from Flickr. If an image is that valuable, it can be uploaded manually. See backlog Category:License review needed.”

A fair chunk of the files from Flickr that need a human license review need it precisely because they were uploaded manually. The full size doesn't get uploaded, the source isn't linked properly, and that's where the timewasting starts. With this, it's even feasible to show license reviewers on the file page what they need to look out for. They may often not even have to go to Flickr, making for relatively easy license reviews. Also, accounts that are not blacklist-worthy now could be on the new list. For example, Flickr accounts that are known to often share photos of sculptures in non-FoP countries.

“Will also create more admin deletion work, when users abuse it.”

I think that's a very dim view of Commons users. They're not out to get you. - Alexis Jazz ping plz 21:50, 17 February 2019 (UTC)

Yes, I think that is a lack of COM:AGF. Users may occasionally err on copyright, but most do not deliberately upload copyvios. Most Flickr users are not Marco Verch; creating a list that requires human review would help get the good photos on Commons and keep the bad ones out. Abzeronow (talk) 17:41, 18 February 2019 (UTC)
Flickr abuse is a view held by many. Commons:Village_pump/Proposals/Archive/2018/08#Restrict_usage_of_Flickr2Commons.--BevinKacon (talk) 20:09, 18 February 2019 (UTC)
I see I didn't mention it here, but Marco Verch is an interesting example. I've seen several photos from Marco Verch pass the human license review, because the reviewers assumed that he was on the blacklist for some copyvio/DW/whatever. They didn't know Marco Verch is a bastard. Having separate lists would also help with that after the current blacklist would be sorted out. - Alexis Jazz ping plz 20:42, 18 February 2019 (UTC)
That discussion is more about how some users unknowingly upload bad files through a useful tool. Yes, that tool doesn't detect dupes in batch uploads (but it will in individual uploads). That is still a small % of all Flickr uploads, and the vast majority of Flickr users act in good faith. Some of the files unsuitable for Commons also have to do with lack of FOP, a matter that could be helped by the kind of list that Alexis Jazz proposes. It would also help keep the real bad apples (Marco Verch) on the blacklist and make that meaningful. Abzeronow (talk) 05:18, 19 February 2019 (UTC)
There may be options short of requiring review. Maybe just collect such accounts in categories (or be able to run a report), and at some point do some spot checks to see if there is a substantial amount of problem images coming through. Or maybe there would be a way to get a special warning message on the Flickr import interfaces to ask uploaders to double-check that the images appear to originate at that Flickr account, as there have been problems in the past, and see if that helps the rate any. Carl Lindberg (talk) 05:34, 19 February 2019 (UTC)
@Clindberg: I don't disagree, but I believe the license reviewing process for these can be streamlined to the point those reviews will take quite little effort. But either way, categorizing and/or adding a warning to the Flickr import interface will require a separate list for such accounts which is what this proposal attempts to realize. - Alexis Jazz ping plz 11:24, 19 February 2019 (UTC)
@Natuur12, Clindberg, BevinKacon: uploading images manually is a pain (especially if you want to upload more than two or three of them), users who perform manual uploads often make mistakes (copypaste the wrong source link, not upload the full size image, enter the wrong license) and when such an image does make it onto Commons, a license reviewer will sometimes erroneously give it a review because they don't know why anyone is on the blacklist or decline it "because blacklist". I think it's quite essential to differentiate between accounts from which we want absolutely no content and those from which we can accept some content. As well as provide an easy-to-see reason why an account is on a list for license reviewers. So what exactly is it that you oppose, and how could that be resolved? To limit overriding the blacklist to, say, autopatrolled users, is more complicated from a technical point of view. (but perhaps not impossible) Or do you oppose the very idea of having two lists instead of lumping everything together on a single list? - Alexis Jazz ping plz 20:18, 20 February 2019 (UTC)
No, I don't oppose a second list; it gives us some options other than a total blacklist which could be very useful. The proposal though was requiring a human review for this second list; I was saying there could be ways to use the second list without creating any additional human reviews, at least for now. But yes, if it turns out there are still problems with uploads from these users, especially after giving uploaders a special warning for them, we could either merge with the black list or simply remove the automated Flickr review and mark them as still needing human review, in the existing queue. We could change how we deal with this list over time. Carl Lindberg (talk) 23:30, 20 February 2019 (UTC)

Add rights from the autopatrollers user group to the rollbackers user group

We currently have 643 rollbackers. When you remove the autopatrollers, patrollers and license reviewers from that list (who are all autopatrolled), 25 oddballs (3.9%) remain. Among those are Rodhullandemu, who should have this right removed this moment; JeffGBot, which is a bot (so also autopatrolled); and NNW, who now uses a new account, NordNordWest (which is autopatrolled). The 22 users (3.4%) that remain haven't contributed in years (or have made a few dozen edits at most), so were never really considered for autopatrol.

Basically, if you're a rollbacker, you're autopatrolled. So why not merge rights? For clarity: this proposal is supposed to make the "autopatrolled" user group redundant for rollbackers.
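For anyone who wants to check the percentages quoted above, the arithmetic can be reproduced with a few lines of Python. The counts are the ones given in the proposal; the `share` helper and the constant names are just illustrative.

```python
def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


ROLLBACKERS = 643       # all current rollbackers
NOT_AUTOPATROLLED = 25  # left after removing autopatrollers, patrollers, license reviewers
SPECIAL_CASES = 3       # Rodhullandemu, JeffGBot and NNW

remaining = NOT_AUTOPATROLLED - SPECIAL_CASES

print(share(NOT_AUTOPATROLLED, ROLLBACKERS))  # 3.9
print(remaining, share(remaining, ROLLBACKERS))  # 22 3.4
```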

Add rights from the autopatrollers user group to the rollbackers user group: votes

  • Support (as proposer) - Alexis Jazz ping plz 00:59, 22 February 2019 (UTC)
  • Support, logical.   — Jeff G. please ping or talk to me 01:16, 22 February 2019 (UTC)
  • Support. Considering Special:ListGroupRights, I support adding "autopatrol" and "upload_by_url" rights to the "rollbackers" user group. 4nn1l2 (talk) 07:00, 22 February 2019 (UTC)
  • Support, as rollbacker is an "advanced right" anyhow. Of course I believe that what gets rollbacked should also be checked, but the user right isn't just given away willy-nilly. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 07:41, 22 February 2019 (UTC)
  • Support per nom. --B dash (talk) 09:40, 22 February 2019 (UTC)
  • Support BTW, I think that file movers should also be granted autopatrol. BTW2, looking at Special:ListGroupRights I can see that stewards (!!!), oversighters and bureaucrats do not have this right! So why do admins? --jdx Re: 08:24, 18 March 2019 (UTC)
    Oversighters and 'crats are inherently part of the admin group so no need to be redundant there. You really won't see a functionary or a 'crat that isn't already a sysop. That steward group is the local steward group and there are no members. The local group is completely separate from the actual m:Stewards. --Majora (talk) 04:15, 21 March 2019 (UTC)