Commons talk:Structured data

From Wikimedia Commons, the free media repository

Notability discussion on Wikidata

People following the structured data project may be interested in this discussion that has kicked off on Wikidata:

d:Wikidata:Project_chat#Creating_new_items_for_humans_based_on_Commons_categories

A bot job started by User:Mike Peel has been stopped, that was creating Wikidata items for people with Commons categories that cannot be matched to Wikidata, on the grounds that such people may not necessarily be notable.

SDC of course would require such items to exist, for it to be possible to make statements about them. Items for every person or thing that currently has a Commons category would seem a bare minimum -- some visions for SDC envisage going much further, for example creating an individual Wikidata item for every single separate museum object that we currently have an image of.

Whatever the outcome, this is something we desperately need more clarity on, looking forwards; not least to plan around, in the event that such items on Wikidata would not exist. Do any of the SDC team have any thoughts, eg @SandraF (WMF): ? Jheald (talk) 12:49, 7 January 2019 (UTC)

Jheald thanks for the link. --Jarekt (talk) 15:04, 7 January 2019 (UTC)
The decisionmaking around this topic is fully up to the community... As a staff member, I want to make a point of not wanting to impose an opinion on this at all. With my volunteer hat on, I have no strong opinions either. If we create Wikidata items for everything, we must be able to properly maintain that huge mass of items too... I think less notable heritage objects can be modeled purely based on more generic statements (represents vase, with features blue paint / flowers/fishes... / designed by x / with inventory number nnn / in collection y) on Commons, and we can also decide to model less notable people in a similar, more generic way there. But I will happily follow the broader community's wishes if there is consensus about creating Wikidata items for everything. SandraF (WMF) (talk) 17:37, 7 January 2019 (UTC)
@SandraF (WMF): I think less notable heritage objects can be modeled purely based on more generic statements [i.e. without their own items], and we can also decide to model less notable people in a similar, more generic way
If you do believe this, I would like to see a fully worked-up example, to establish (i) how information about the underlying object, and its nature, creator, copyright status, licensing, history etc, would be kept distinct from information about its depiction/photograph; (ii) how this is possible when it is not possible to have qualifiers on qualifiers -- something the current Commons:Structured_data/Properties_table shows up as a major unresolved difficulty; (iii) how this would play alongside images where description in terms of wikidata items would be possible -- how great would the difficulties be that we would get into, if we would be trying to operate two quite different data models at the same time?
Rather than just you saying that you think this can be done, if having to go down this road is even slightly conceivable as an outcome, I would like to see some hard modelling to show how it definitely can be done; and what the consequences would be. Because to date I'm not sure that the data designs so far presented would cut it. Jheald (talk) 18:23, 7 January 2019 (UTC)
Yup. Whether one or the other solution is satisfactory is up to the community to reach consensus about! Deployment is around the corner, so the community can try this quite soon. Seeing the technology in front of one's eyes will certainly clarify things and cause more people to have strong opinions about this. SandraF (WMF) (talk) 09:03, 8 January 2019 (UTC)

Multilingual file captions coming this week

Hi all, following up on last month's announcement...

Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature to add short, translatable descriptions to files. Here are some links you might want to follow before the release, if you haven't already:

  1. Read over the help page for using captions. I wrote the page on mediawiki.org because captions are available for any MediaWiki user; feel free to host/modify a copy of the page here on Commons.
  2. Test out using captions on Beta Commons.
  3. Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.

Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta.

Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. I'll add a reminder to this post once I confirm exactly what day captions will be turned on for Commons. Keegan (WMF) (talk) 01:06, 8 January 2019 (UTC)

  • Apart from the (cumbersome and totally useless) language selection drop-down, which seems to have been already in use elsewhere, I cannot see anything new. So, descriptions can and should be added to Commons files — how’s that any different than previous practice? -- Tuválkin 14:54, 8 January 2019 (UTC)
  • Could you please explain more on how you find using the translation feature cumbersome and useless? Do you find it easier or more difficult to add a language to a description template? You are certainly welcome to not use captions if you do not find the feature useful for your work, but if there's a way that it can be improved we'd like to hear about it. Additionally, if you could provide a link to this tool that seems to already exist elsewhere here, I'd appreciate it because I haven't seen it and I'd like to take a look. Keegan (WMF) (talk) 18:16, 8 January 2019 (UTC)
  • What I said is cumbersome and totally useless is the language selection drop-down, which seems to be the exact same element that shows up as a generic language-selection tool when one uses a WMF project while not logged in. I think it is cumbersome because it is made up mostly of empty space, and the way languages are sorted (geographically, and ignoring the browser’s options on preferred languages?) makes it hard to find a language, not to mention the unintuitive way it scrolls and gains selection focus. I guess that you co-opted this pre-existing element (which is of course good practice), but it is an essential part of the whole captions feature. For me, the ideal language selector is a single, easily scrollable list of languages, properly sorted (the collation of which would be interesting to discuss, in terms of internationalized user expectations), whence to pick one out (one or several; Ctrl-click does work on some devices). That much for cumbersome. It is also useless because when a user is logged in there’s no need to present a complete list of languages. Even the most formidable polyglot will have to pick from a dozen or two; only in the unlikely situation that one would be contributing in a language one’s not versed in would such a general selection tool be needed, and for that a "more languages" button seems better than what we have now.
  • I certainly do find it simplest to click to edit the file page’s wikicode and add {{ab|Something here.}} (or {{ab|1= Something here.}}, if I’m feeling chatty) next to where it says |Description = }} — way simpler than going through UI hoops, but I understand that’s not what you’re after, especially since what I find simplest is already tried and tested and working for many years. But even if wikitext needs to be not offered, there’s Visual Editor apparently working in many projects; adding captions to be injected in a page seems to be the most basic of its functions.
What I asked is what this new feature amounts to. We’ll have a pencil icon that brings up an already existing language-selection tool, and thence we proceed to a rich text entering box/screen whose workings are not new either? Is it the pencil icon that is new, to be shown in file pages and interspersed in the Upload Wizard? -- Tuválkin 22:36, 8 January 2019 (UTC)
From a technical standpoint, captions are like labels on Wikidata. They will be searchable through the API, making it easy to find/filter/pull captions from files as metadata. There are a lot of possibilities of what this can be used for, from filling in infoboxes, building lists for translation of important files needing a caption localized for a project or campaign, searching for and finding captions, etc. So in comparison to description templates, while they potentially contain similar data to a caption, their function and reuse purposes are very different. Keegan (WMF) (talk) 19:57, 9 January 2019 (UTC)
  • @Keegan (WMF): Understood, thanks. Maybe this field can/could be populated with the contents of the |Description = field of {{Information}}, {{Artwork}}, and other such templates. -- Tuválkin 01:43, 10 January 2019 (UTC)
  • If I understood it correctly, Tuvalkin meant an automatic action. The ideas behind structured data, semantic web and LOD are great, changes are good, but when I think of the few thousand files of mine on Commons I would really love some bot. Especially since descriptions are well structured in templates and are mostly caption-sized. Nova (talk) 20:57, 10 January 2019 (UTC)
  • I think a real bot would create too much trash, but a tool like VisualFileChange would be great. --GPSLeo (talk) 21:03, 10 January 2019 (UTC)
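To make the suggestion in this thread concrete, here is a minimal sketch of how a semi-automatic tool might propose caption candidates from an existing |Description = field. It is not an existing tool or anything the Structured Data team has announced: the function names, the regular expressions and the decision to skip over-long descriptions rather than truncate them are all assumptions made for illustration. The 255-character limit and the "no wikitext" rule are the ones quoted elsewhere on this page. Nested templates inside a description are deliberately left unhandled.

```python
import re

# Limit quoted in the discussion on this page; treated here as an assumption.
MAX_CAPTION_LENGTH = 255

# Hypothetical patterns: isolate the |Description = value, then find
# language templates such as {{en|text}} or {{en|1=text}} inside it.
DESCRIPTION_FIELD = re.compile(
    r"\|\s*description\s*=(.*?)(?=\n\s*\|\s*\w+\s*=|\Z)",
    re.IGNORECASE | re.DOTALL,
)
LANGUAGE_TEMPLATE = re.compile(
    r"\{\{\s*([a-z]{2,3}(?:-[a-z]+)?)\s*\|\s*(?:1\s*=\s*)?([^{}]*?)\s*\}\}"
)
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def strip_markup(text):
    """Reduce simple wikitext to plain text, since captions allow no markup."""
    text = WIKILINK.sub(r"\1", text)   # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)  # drop ''italic'' and '''bold''' quotes
    return " ".join(text.split())      # collapse whitespace and newlines


def caption_candidates(wikitext):
    """Return {language code: plain text} for descriptions short enough to be captions."""
    match = DESCRIPTION_FIELD.search(wikitext)
    if match is None:
        return {}
    candidates = {}
    for lang, raw in LANGUAGE_TEMPLATE.findall(match.group(1)):
        text = strip_markup(raw)
        # Over-long descriptions are skipped, not truncated, so that a human
        # can write a proper short caption instead.
        if text and len(text) <= MAX_CAPTION_LENGTH:
            candidates[lang] = text
    return candidates
```

A tool along the lines of VisualFileChange could show these candidates to the uploader for confirmation instead of saving them blindly, which addresses the "too much trash" concern. Whether such copying is acceptable also depends on the licensing question discussed further down this page.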

Captions are live

Captions can now be added to files on Commons. There's a bug with abusefilter sending errors to new accounts adding captions; the bug is being investigated and fixed right now. IRC office hours will be in a little over one hour from now, and I look forward to seeing you there if you can attend. Keegan (WMF) (talk) 16:50, 10 January 2019 (UTC)

  • Is there a way to disable the box, or make it much less invasive? It's very annoying that it pushes the actual captions some half page down. Nemo 18:13, 10 January 2019 (UTC)
    • @Nemo bis: Make this edit to your user css and it'll disable the captions. If @Keegan (WMF): or someone could add an ID to the css surround for it then we could attach some extra css tags to it to show/hide it, which would be better. Thanks. Mike Peel (talk) 19:36, 10 January 2019 (UTC)
      • @Mike Peel: I'll make a Phabricator ticket later today to look into that. Keegan (WMF) (talk) 20:00, 10 January 2019 (UTC)
      • Thanks for the css tip, looking forward the show/hide option. Nova (talk) 19:56, 10 January 2019 (UTC)
        • Update: try this to also disable the "structured data" header. Thanks. Mike Peel (talk) 20:28, 10 January 2019 (UTC)
          • Works better now, thanks. Nova (talk) 20:41, 10 January 2019 (UTC)
    • It's easy indeed to hide the entire thing, but I'd just like it to still be there somewhere and not take one third of my screen or so. I suspect someone assumed that people don't care about existing descriptions being pushed out of the screen, or that nobody speaks more than 2 languages. Nemo 16:07, 11 January 2019 (UTC)
  • Hi, @Keegan (WMF): I see that a file ID, called "entity", is added the first time a caption is created. Question: are the IDs created only when we add "structured data" for the first time, or will IDs be created automatically for each existing file? Christian Ferrer (talk) 18:44, 10 January 2019 (UTC)
    @Christian Ferrer: Hey, good spot. This is something we fixed in development yesterday, and will not be displayed from next week (the other issue is that it says "label" rather than "caption", which will also be fixed in the same change). Jdforrester (WMF) (talk) 18:49, 10 January 2019 (UTC)
    There are also some other things that I noticed. At first I thought there was no rollback anymore (if I remember well we already talked about that), but no, the rollback works well. The second thing is: if you create a caption, then an ID is created, ok. If you revert, then the ID is also removed. It's confirmed by the exact same number of bytes being added to and removed from the file.
    Now if you delete all the captions without having reverted, then the captions are indeed removed, but the ID stays. It's also confirmed by the number of bytes. This is not really a question, just a thing that I noticed. Regards, Christian Ferrer (talk) 19:05, 10 January 2019 (UTC)
    @Christian Ferrer: Yeah, it's technically the marker for the entity ID. It's not relevant to users (they can't use it and they can't change it; it's just the reference for the database), so we won't be showing it (if you really need to know it, it appears on the action=info page). The "byte size" change is also not very helpful or accurate as that's a measure for the database which depends on the JSON serialisation of the entity model, but removing that from history pages would probably be disruptive for power users so we don't plan to do that right now. Jdforrester (WMF) (talk) 19:18, 10 January 2019 (UTC)
  • Nice feature. It would probably be useful to write local file caption guidelines within Commons, in order to tell people which style is expected in the caption. mw:Help:File captions is a rather technical manual (no markup, how to undo, and so on), but things like capitalization, punctuation, preferable caption length and so on should probably also be advised to users… —MisterSynergy (talk) 19:44, 10 January 2019 (UTC)
  • @Jdforrester (WMF): Note that the "Captions" features are available in the file redirect pages. What will be the impact if captions are added there? Christian Ferrer (talk) 21:04, 10 January 2019 (UTC)
    @Christian Ferrer: Another good spot. Captions on redirect pages aren't useful, and we will disable them, but it won't break anything. We've filed a Phabricator task to do this. Jdforrester (WMF) (talk) 21:07, 10 January 2019 (UTC)
  • Just whoa. You guys let this thing go live like this? With a layout that will immediately antagonize the exact kind of contributors who would be the most enthusiastic and productive about captions? A layout that hogs whitespace (does it even look tighter in monobook, respecting its default margins and paddings?), a layout that puts this thing above all else on the page (above "Summary", srsly?), under an H1 heading (whiskey tango foxtrot, aren’t you guys all about structure?!), with some weird horizontally divided box which will mystify both oldschool HTML 1.0 veterans and swipe-swipe whippersnappers (click on the pencil to edit the caption text under it across a line?; why not click the caption itself?, or put a proper button next to it!)…? After you’ve been working on this captions thing since May last year, at least? Good grief, you’re supposed to be the code gods that are going to dig a ditch between yourselves and the computer illiterate masses, burying all the power users in it. Turns out part of that dire prediction isn’t true after all, but sadly that’s the part about code gods, for this gizmo seems utterly ungodly. And therefore I’m gonna sprinkle some CSS holy water on my ”skin” and forget I ever saw this thing live in production looking like this. -- Tuválkin 02:09, 11 January 2019 (UTC)
  • And nobody thought of turning it off for file redirects oscar mike golf. -- Tuválkin 02:12, 11 January 2019 (UTC)
  • This is really bad. Please turn it live only when at least Template:Artwork is correctly handled, either by using the wikidata element or the description field with language template. The feeling now is that all the hard work that was done to describe files is going to be lost. Léna (talk) 13:08, 11 January 2019 (UTC)
  • Agree with some of the comments above. I thought the team had accepted and undertaken that structured data needed to be on a different tab to the regular file information, after this was flagged by multiple respondents in the Statements consultation (September-October 2018).
As a result, in the "What's new" section of the "Statements 2" consultation (November 2018), User:Keegan (WMF) wrote:
The tabs for Wikitext content and metadata (respectively called 'File information' and 'Structured data' for the purposes of this discussion) are now true tabs instead of anchor links, which should reduce/eliminate the occurrence of super long pages.
Such tabbing is necessary, and should be implemented ASAP. The Structured Data is (or, we hope, will be) very important for machines. But it is important it should not get in the way of the templated information for humans. Jheald (talk) 14:27, 11 January 2019 (UTC)
Jheald's last argument hits home. Nova (talk) 16:51, 11 January 2019 (UTC)
In the statements consultation I was referring to the decision to put statements behind tabs. Captions were never planned to be hidden from users, but most of the rest of SDC will be behind a tab. I think the planned new box that gathers "use this file" and attribution generation is probably going in the whitespace that already exists and is unused to the right of files (as seen in the statements mockups), but as far as I know for now that's the only other visible thing. Keegan (WMF) (talk) 18:06, 11 January 2019 (UTC)
I do see the problem in how the mockups are presented, though, by not showing the "File information" tab first. A side-by-side comparison would have shown captions on the "main" file page, instead of them simply being absent from the statements mockup. I'll make sure to not repeat that mistake in the next feature design consultations. Keegan (WMF) (talk) 18:25, 11 January 2019 (UTC)

Accounts on Beta Commons

Trying to create an account on Beta Commons: is it possible that the error message "The passwords you entered do not match" arises when the actual problem is something else? I really doubt that four times in a row I couldn't match my password correctly, but four times in a row I got this same error. - Jmabel ! talk 09:59, 9 January 2019 (UTC)

@Jmabel: It's most likely that you're trying to use your SUL account. The Beta Cluster does not operate at the high levels of security that we have for production, hence the message above the login form:
This site (Beta Commons) allows WMF staff and community volunteers to test MediaWiki in a production like environment.
Do NOT use your normal password, or any password you use anywhere else online.
Did you already have a Beta Cluster account? If not, did you create one?
Jdforrester (WMF) (talk) 15:38, 9 January 2019 (UTC)
Again, as I wrote above, I was trying to create an account. It asks me to enter an account name, an email address, and enter a password twice. I got this error message after doing so, four times.
Is it a problem that I used the same name as my account here? It shouldn't know whether the accounts have the same name. - Jmabel ! talk 18:24, 9 January 2019 (UTC)
@Jmabel: Oh, sorry. No, it's not going to know about you having an account on this (other) system. I just created a test account there and it worked fine. Jdforrester (WMF) (talk) 18:43, 9 January 2019 (UTC)
Jmabel, it used to be the case that MediaWiki on some non-production servers (like wikitechwiki) was not able to deal with passwords containing certain Unicode characters. Have you tried using an ASCII password, as silly as this might sound? Nemo 19:15, 10 January 2019 (UTC)
Well, at this point the feature is live, so there's no point to my looking at the Beta. - Jmabel ! talk 00:43, 11 January 2019 (UTC)

Accessing the captions via lua and pywikibot

@Keegan (WMF): (and others): are there ways to access the captions using Lua and pywikibot, or are they human-accessible only at the moment? Thanks. Mike Peel (talk) 16:45, 11 January 2019 (UTC)

Humans-only for now, or read-only via API. This will change, I do not know when at the moment. Keegan (WMF) (talk) 18:43, 11 January 2019 (UTC)
OK, thank you. If you can let me know when it is available then I can see if it can be integrated into the wikidata infobox to supplement the captions from Wikidata (and/or sync those over to here). Thanks. Mike Peel (talk) 19:54, 11 January 2019 (UTC)
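As an illustration of the "read-only via API" route mentioned above, here is a minimal Python sketch using only the standard library. It rests on assumptions that are not confirmed in this thread: that captions are exposed as labels through the Wikibase `wbgetentities` action on Commons, and that a file's entity ID is the letter "M" followed by the file page's page ID (the entity ID is said above to appear on the action=info page). The function names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_caption_query(page_id):
    """Return the API URL asking for the labels (captions) of one file page.

    Assumption: the entity ID is "M" + page ID.
    """
    params = {
        "action": "wbgetentities",
        "ids": f"M{page_id}",
        "props": "labels",
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_captions(response):
    """Map language code -> caption text from a decoded wbgetentities reply."""
    captions = {}
    for entity in response.get("entities", {}).values():
        labels = entity.get("labels")
        # An entity without captions may carry an empty list instead of an object.
        if not isinstance(labels, dict):
            continue
        for lang, label in labels.items():
            captions[lang] = label["value"]
    return captions


def fetch_captions(page_id, user_agent="caption-sketch/0.1 (hypothetical tool)"):
    """Fetch and decode the captions of one file; performs a network request."""
    request = Request(build_caption_query(page_id), headers={"User-Agent": user_agent})
    with urlopen(request) as reply:
        return extract_captions(json.load(reply))
```

Parsing is kept separate from fetching so the result handling can be checked offline against a saved reply. Writing captions, and access from Lua or pywikibot, are not covered by this sketch.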

Copyright status of structured "items"

Are "Captions" and other SDC "items" released under the CC BY-SA 3.0 like the rest of Wikimedia Commons or under the CC0 license like Wikidata? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 23:42, 11 January 2019 (UTC)

My guess is that free text captions are CC BY-SA 3.0, like the rest of the page. However most of the properties will be de-facto ineligible for copyright so for all practical reasons CC0. --Jarekt (talk) 03:43, 12 January 2019 (UTC)
There's a phabricator ticket that's been asking this question since 2017, but with no meaningful input yet.

With the captions that are live right now, it's something that needs to be clarified urgently. @Keegan (WMF): ?
Since the whole page is licensed CC BY-SA 3.0, and there is nothing to indicate anything different for the captions, I think that means that any caption being added by a user at the moment has to be considered to be CC BY-SA 3.0. I think the contributor would be entitled to assume that that is the license under which they have made the caption string available, given that there is nothing else anywhere indicating anything other than this. This needs to be addressed quickly if the ultimate intention is to release the data CC0, because otherwise there will be a considerable set of CC BY-SA 3.0 captions building up, which would have to be cleared for the more permissive release.
A relevant question, if SDC is intended to be CC0 (and at the moment we have had no clear indication either way), is what restrictions this would place on data being harvested from existing CC BY-SA 3.0 file pages. Even if reporting that the creator of a painting was Leonardo da Vinci may be an uncopyrightable fact, extracting such information at scale may fall subject to database rights that might be only available BY-SA. Other information, eg saying that a painting was "probably painted c.1530" or was considered to be by "a follower of Raphael", may reflect real intellectual choices that could attract copyright in their own right, particularly if a substantial number were taken. This is a question that needs clarification, before substantial data transfer starts from Commons templates. Jheald (talk) 11:38, 12 January 2019 (UTC)
Jheald, I disagree with legal theory that metadata, listing basic facts about someone or something can produce its own copyrights. You are not making artistic choices here and merely reporting information you found in the references. Otherwise you are doing something wrong. That is why it is OK to copy such information from Wikipedias or Commons to Wikidata. However, I agree that this should be clarified sooner than later. My vote would be to store most or all of Structured data under CC0 license so it is compatible with Wikidata. @Keegan (WMF):, I think this is important, especially since Commons is a project which is obsessed with getting copyrights right. We might need to discuss it as a project, but I also think WMF lawyers should look into it as well, especially if we start reusing the data and combining it with wikidata. --Jarekt (talk) 04:01, 13 January 2019 (UTC)
It would probably be wise to simply add the text "by publishing this you agree that you release this caption with the CC0 license" but it might confuse people into thinking that this applies to all texts or something. Maybe one of the developers should open a proposal at "Commons:Village pump/Proposals" and ask for community feedback in how to clarify this without being "too intrusive". But a simple indication that all Structured Data on Wikimedia Commons "Items" are CC0 would suffice in the beginning of the process. Let's not forget that this is (legally) important for the re-users outside of Wikimedia websites as they're the people this whole system is built for. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:52, 13 January 2019 (UTC)
I'm seeing where we are on all this. Keegan (WMF) (talk) 19:56, 14 January 2019 (UTC)

┌─────────────────────────────────┘
@Donald Trung, Jheald, Jarekt: We are planning to ask users to release items on Structured Data on Commons under CC0, and we are working with the legal team to add the right license-related language. Full text pages on Wikimedia Commons may still be under CC BY-SA.

“A relevant question, if SDC is intended to be CC0 (and at the moment we have had no clear indication either way), is what restrictions this would place on data being harvested from existing CC BY-SA 3.0 file pages.”

The text on Commons is CC BY-SA currently, to the extent it's protected by copyright. However, most captions are not likely to be copyrightable, since they are short and factual. Copyright protects creative works of expression, and not the underlying ideas. It's possible that there will be some descriptions that are so idiosyncratic as to get copyright protection, and Commons has a couple of options if that comes up: 1) argue that they are not copyrightable, 2) just remove those captions and write a better, CC0 caption. Either choice would be guided by how the Commons community would like to settle it on Commons:File captions. Keegan (WMF) (talk) 22:08, 15 January 2019 (UTC)

@Keegan (WMF): "Short and factual" doesn't get you a copyright waiver. To be without copyright there has to be essentially no choice in the text that was written. That's simply not true for a caption, and certainly not true for 40,000,000 of them. Jheald (talk) 23:06, 15 January 2019 (UTC)
Hullo @Jheald: the WMF legal team says that Copyright law protects creative expression, and not ideas or concepts. Short phrases that can only be written in a limited number of ways are not protected. The caption field has a few limitations: it can only have 255 characters, it does not allow Wikitext, and it should factually describe the image. The vast majority of captions will not be a sufficiently creative work of authorship to be copyrightable. A few references from the team: Stanford Law School published a guide on how this question has been handled by U.S. Courts, and U.S. Copyright Office Circular 33 explains what level of creativity is required for protection. In cases where a description is so idiosyncratic as to require compliance with CC BY-SA, it should be removed from a CC0 caption field. Copyright law can be unsatisfyingly unclear, so if you need more help clarifying what kind of creativity should be considered, please contact the legal team and they may be able to help by writing guidance in Wikilegal.
— Preceding unsigned comment added by Abittaker (WMF) (talk • contribs) 01:12, 16 January 2019 (UTC)
@Abittaker (WMF): This is Commons. You're not just dealing with U.S. copyright here, you're dealing with copyright for the whole world. And I dispute the claim that just because the caption is limited to 255 characters and needs to factually describe the image, that that means there are only a limited small number of ways it could be written. On the contrary, there may be any number of aspects of the image that the caption-writer may choose to foreground, and any number of ways to present them. The choice of one rather than any of the others is the writer's expression, and that is what copyright protects. It's no good trying to wish this away: there is an issue here which needs to be faced. Jheald (talk) 01:22, 16 January 2019 (UTC)
It may be interesting to note that the Indian Supreme Court recently affirmed that legal headnotes were protected by copyright. [1]. That's despite headnotes, on the face of it, being more formulaic and more derivative than image captions.
Similarly, this 2006 paper [2] suggests that (p.187) "other than the headnotes, private publishers probably do not have copyright in the court decisions they are publishing" (emphasis added).
In Canada the copyright status of headnotes was affirmed in the 2004 case CCH Canadian Ltd v Law Society of Upper Canada. Jheald (talk) 02:08, 16 January 2019 (UTC)

┌─────────────────────────────────┘
We do not have to figure out the legality of copying captions at this point. The CC0 aspect of SDC should be advertised and copyright aspects taken into account before any bot migration of captions or other free text descriptions. However, other SDC data should be OK as those are non-copyrightable facts. Also I do not agree that we need to consider the laws of all jurisdictions when discussing copyrights of SDC. There are many jurisdictions and some have some odd laws (See here for example). However, when discussing laws related to Commons text, the only law we need to consider is US law, as that is where the servers reside. --Jarekt (talk) 03:58, 16 January 2019 (UTC)

  • By the way, the text on the bottom of each Commons page "This text is available under the Creative Commons Attribution-ShareAlike Licence; additional terms may apply" should become something similar to d:Wikidata:Copyright: "All structured data from the [SDC] namespace is available under the Creative Commons CC0 License; text in the other namespaces is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. " --Jarekt (talk) 04:14, 16 January 2019 (UTC)

I would strongly recommend keeping CC-BY-SA as the licence for structured data on Commons. We are talking about the potential of harvesting 50 million sentences' worth of caption from existing files, which are already licensed in CC-BY-SA (or something compatible). Any change in copyright terms that prevent existing image descriptions from being converted into structured data will defeat the point of adopting structured data. Deryck Chan (talk) 17:31, 20 January 2019 (UTC)

Well, that’s debatable :) There was a similar discussion with Lexemes on Wikidata. In the end, they went for CC-Zero, in the full knowledge that this would prevent mass-copy from the Wiktionaries. Jean-Fred (talk) 18:13, 21 January 2019 (UTC)
There are strong arguments for both. For "keeping CC-BY-SA" there is backwards compatibility, which allows for a mass-import of community generated organisation that will require the least amount of effort to help set up the structured data programme; however, those in favour of CC-0 can point out that this would make the database fully exportable by third parties who wish to utilise Wikimedia Commons. In the end the issue comes down to what we want structured data on Wikimedia Commons to be: should it only be used for internal organisation, or are there external incentives that will benefit the generation of free knowledge? Maybe this would have to be discussed by the community, but I'm sure that we can trust the Wikimedia Foundation (WMF) and Wikimedia Deutschland (WMDE) to do what's in everyone's best interests. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 15:49, 22 January 2019 (UTC)

“However, most captions are not likely to be copyrightable, since they are short and factual.”

“the WMF legal team says that Copyright law protects creative expression, and not ideas or concepts.”

I bet neither one of you has been talking to WMF legal about this. Let me introduce myself. Alexis Jazz, one of Commons' not-actually-a-lawyer legal specialists*. You would generally be right if you were talking about one single caption, or one tweet. Or any string of 5 words from a Harry Potter book. So by that logic, I can copy 5 words from a Harry Potter book, print them in my own book, and sell this brilliant creation and make boatloads of money.

But what if I wanted to make more money?

Why, I just copy another 5 words! And I put them after those first not-quite-eligible-for-copyright 5 words. Because 10 words are much more interesting, and really, they aren't 10 words. No no no no! They are 5 words, and only 5, but there happen to be two strings of 5 words that follow each other. But still only 5 words really! But I guess a book with only 10 words won't make me a millionaire either. So let's copy another 5. And another. And another. Until I've copied the whole book. Or all of someone's Tweets. No copyright, right? After all, it's all just a string of letters and the alphabet isn't protected by copyright so we're in the clear.

That's what you are doing.

*Not legally a legal specialist. Offer void in Nebraska.

There are severe issues with this and I have raised this on VPP. - Alexis Jazz ping plz 00:04, 11 February 2019 (UTC)

To add a category just after a caption[edit]

This morning, each time I tried to add a category after adding (and saving) a caption, I got this message: File:Screenshot Editing File Ophiozonella nivea (YPM IZ 007648 EC) jpg - Wikimedia Commons.png. Christian Ferrer (talk) 07:49, 12 January 2019 (UTC)

Hi, Yes, I already reported that: phab:T213462. Regards, Yann (talk) 08:49, 12 January 2019 (UTC)

Look and appearance of captions[edit]

Hi, it would be great if the file pages could keep a visual coherence.

1/ The title "Structured data" in a file page should be the same size as, and not bigger than, the other headers such as "Summary", "Licensing", etc.
2/ The caption box on my screen (1920*1200) is very slightly narrower than {{Information}} and the license template. It would be great if all boxes and templates were the same width.

Regards, Christian Ferrer (talk) 12:59, 12 January 2019 (UTC)

@Christian Ferrer: Hey there,
The reason the MediaInfo section is under an H1 is because it's its own page component, at the same "level" as the wikitext block, which also has an H1 (the page's title). Right now the design is in flux, and I agree that it's a little confusing. In the future the design is going to change; the most recent design feedback session about this would mean that the H1 wouldn't appear, but instead the parts of the page would be split with tabs. That discussion is now closed, but I'd be interested to hear from you if that proposal would work for you.
You are right, the text in the Information template is 5% smaller than in the rest of the page – it's set to 95% (=13.3pt) of the general page content size (=14pt) in the template by using the class toccolours. I don't know why this was done, but it's been this way for a very long time, so I imagine a community discussion would be needed before changing the template.
Jdforrester (WMF) (talk) 21:26, 12 January 2019 (UTC)
I've no strong opinion about potential tabs, and I have not really thought about that for now. I just think that if there are headers in a file page (or in a specific tab), then all main headers should be at the same level.
The size of the text was not my concern; I was talking about the size (width) of the caption box compared to the width of {{Information}}. The caption box seems a little narrower.
After a night's sleep I woke up with the certainty that you should limit the display to one language at a time. I have 3 lines, and this is really boring (and some users have more...), although one line is fully acceptable. Furthermore, I don't plan to write any captions in Spanish and I don't want to hide this caption box, so the result is that it comes to mind to remove es-2 from my babel just to avoid those 3 boring lines... When I looked at a file page while logged out, I found the 1-line box much, much better...
Now that we have Commons:File captions, and in order to give information to visitors and editors, maybe a link to that page should be given in the caption box, if not in the default display then in the editing mode.

Christian Ferrer (talk) 05:26, 13 January 2019 (UTC)

How to search SD?[edit]

When and how will WP search support the search in SD? My naive approach using incaption:value was not successful. I noticed that the search takes the captions into account and finds them [3], but there should be a way to search specifically for captions.

BTW, searching for the help text "Add a one-line explanation of what this file represents" should not find any matches: [4] --Herzi Pinki (talk) 17:17, 12 January 2019 (UTC)

@Herzi Pinki: That's because you're using the wrong search engine; it's in this one. Jdforrester (WMF) (talk) 21:14, 12 January 2019 (UTC)

filed a bug report for the BTW. --Herzi Pinki (talk) 21:53, 12 January 2019 (UTC)

You did not get me. I did not want to search the code. MediaWiki Search should support searching captions like titles. best --Herzi Pinki (talk) 22:02, 12 January 2019 (UTC)

Search will be supported, it's not turned on yet. Keegan (WMF) (talk) 19:38, 14 January 2019 (UTC)

Wikitext[edit]

Just curious, but why can't descriptive information from Wikitext "Descriptions" and "Categories" be harvested to create structured data? Licenses could be harvested right? Then why can't a bot harvest vital user-generated information from both native descriptions and organization? It just seems like a major handicap that existing Wikitext on over 50.000.000 (fifty million) media files can't be utilised. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:14, 13 January 2019 (UTC)

Sure, technically, they could be harvested. If we, the Wikimedia Commons community, want to do that, then we could have bots do that. We could hardly do that though before captions were available. (I guess there could have been a soft launch with "captions available but invisible and only editable by bots", but personally I don’t think this was necessary − Wikidata also launched pretty much empty, and the community went on to make bots that seeded it).
Also, I don’t think that this is the job of WMF to do it − fairly sure that had the WMF done something like that, there would have been outrage that they’re touching the content. Jean-Fred (talk) 12:25, 13 January 2019 (UTC)
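As a rough illustration of the harvesting discussed above, a bot might first pull the English description out of a file page's wikitext. The sketch below is Python and assumes the common `{{en|1=...}}` / `{{en|...}}` language-template convention; the function name `extract_en_description` is hypothetical, and a regex like this cannot cope with templates nested inside the description, so a real bot would probably want a proper wikitext parser.

```python
import re

# Matches {{en|1=text}} or {{en|text}}. Deliberately simple: it stops at the
# first "}}", so templates nested inside the description are not handled.
EN_DESCRIPTION = re.compile(
    r"\{\{\s*en\s*\|\s*(?:1\s*=\s*)?(.*?)\}\}",
    re.IGNORECASE | re.DOTALL,
)


def extract_en_description(wikitext):
    """Return the English description text, or None if none is found."""
    match = EN_DESCRIPTION.search(wikitext)
    if match is None:
        return None
    return match.group(1).strip()


# Hypothetical sample wikitext, for illustration only.
sample = "{{Information\n|description={{en|1=A blue vase with fish.}}\n|source={{own}}\n}}"
print(extract_en_description(sample))
```

Whether the extracted text should then be copied anywhere is a separate community question; this only shows that the extraction step itself is mechanically simple for the plain cases.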

Information template problem[edit]

Hi, some of yesterday's uploads to which I added a caption in Polish (via the UploadWizard form) seem to have a broken Information template appearance, as you can see in this file page. Deleting the captions didn't help. Nova (talk) 10:57, 13 January 2019 (UTC)

Sorry, my mistake, noticed and fixed by Multichill [5], thanks. Nova (talk) 14:14, 13 January 2019 (UTC)

Hiding captions[edit]

Let me get straight to the point:

How do I hide the captions from my view? Any CSS code I can put on my common.css to remove that element?

Thank you. — regards, Revi 10:59, 13 January 2019 (UTC)

Found out the answer myself by looking at the above topic. For those who may be looking for this:
/** Initially posted by Mike Peel */
/** Just hiding captions */
.filepage-mediainfo-entitytermsview { display:none;}
/** Hiding "Structured Data" header */
.mw-slot-header {display:none;}
— regards, Revi 11:04, 13 January 2019 (UTC)
I thought original idea of Structured Data on Commons SDoC hereinafter (or a prototype of it I saw) was putting the SDoC section below the GlobalUsage or something like that. Is it me who has a wrong memory or has something changed thus creating this what the hell design? — regards, Revi 11:12, 13 January 2019 (UTC)
@-revi: There are two gadgets available in your preferences now - one that hides it completely as per the css, and one that collapses it by default but lets you expand it again if you want. Thanks. Mike Peel (talk) 11:48, 13 January 2019 (UTC)
Since I prefer code on my local page rather than gadgets, thanks for letting me know, but I will keep the status quo. BTW, if possible, I want to force them to be H2 and move them elsewhere (let's say, above the Metadata section, as in the prototype I recall). Is it possible with CSS/JS? (Not asking you to do that but just wondering).
I have no interest in using it as currently is (it just sucks) but with the modification to put them away from the top of the page, it would be usable. — regards, Revi 18:20, 13 January 2019 (UTC)
@-revi: You can shrink it using e.g. h1.mw-slot-header { font-size:1.5em;}. As far as I can tell, moving it to a different part of the page is something the WMF would have to do. Thanks. Mike Peel (talk) 19:29, 13 January 2019 (UTC)
I think you can move things around using JavaScript (but because personal JS is loaded at the end, it will "jump around") − see for example MediaWiki:Gadget-CategoryAboveAll.js which moves the category box at the top. Jean-Fred (talk) 21:17, 13 January 2019 (UTC)
@-revi: captions certainly has some design issues that showed up in production that were not visible in testing. The team is working to identify the problems and push the fixes, I've made an update section that you can keep an eye on. Keegan (WMF) (talk) 17:57, 14 January 2019 (UTC)
It would be very useful to make a list of what was missed in testing, so that the problems we had here are less likely to be replicated in the future. - Jmabel ! talk 22:32, 14 January 2019 (UTC)
Oh of course, lists are being compiled and things will be shared. A majority of the issues following the release are bugs/design flaws that can only be surfaced from release into production; in most all software development there are limits to testing environments trying to replicate a live environment, with highly customized configurations, skins, javascript, css rules, abuse filters, etc. that come with all of the wikis.
All these excuses aside, a new testing environment that will be better at finding things has been in the works for the past few weeks, and should be up and live in time for depicts and other statements testing in a couple of weeks or so. In theory, it will be very helpful in reducing bugs in production. Keegan (WMF) (talk) 00:01, 15 January 2019 (UTC)

UploadWizard and captions[edit]

Hi, I just wanted to follow the new idea and do my best with the new uploads - there are few issues from this exercise:

  • the current explanation in the form for the Caption is not enough - how is it different from the Description? What IS most important, and for whom? I am looking for a link to a page with a good set of examples;
  • the section Caption is translated into Polish as "Podpis", which is a bit confusing, as it suggests more a "Signature";
  • repetitiveness - first Title, then Caption, then Description, all of them containing the same subset of information. In my case, with the Description containing mostly one or two sentences (close to 255 characters), the Caption could be the same (but two separate fields suggest that it shouldn't?). Now I have to fill in 5 fields if, as usual, two languages are included. Some kind of auto-generated, pre-filled suggestion from the Description would be of much help;
  • Caption is marked as Optional, but put above the required Description;
  • There is no info that wiki markup should not be used in Captions and, as far as I've checked, it is not validated whether it has been used, so the caption is published with the markup and then shown uninterpreted on the file page.

This rather discourages me from providing Captions at upload in the current state. Nova (talk) 12:35, 13 January 2019 (UTC)

Captions should probably be with the "other information" if anything. Nemo 15:23, 13 January 2019 (UTC)
Thank you for the feedback, it's very helpful as the team looks at design changes. Keegan (WMF) (talk) 17:53, 14 January 2019 (UTC)
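The point above about unvalidated wiki markup could in principle be checked mechanically. The following Python sketch is only an illustration, not what UploadWizard does: `find_caption_problems` is a hypothetical helper, the patterns cover just a few common wikitext constructs, and the 255-character limit is an assumption based on the figure mentioned in this thread.

```python
import re

MAX_CAPTION_LENGTH = 255  # assumed limit, based on the figure quoted in this thread

# A few common wikitext constructs; illustrative, not exhaustive.
MARKUP_PATTERNS = {
    "link": re.compile(r"\[\[.*?\]\]|\[https?://\S+"),
    "template": re.compile(r"\{\{.*?\}\}"),
    "bold/italic": re.compile(r"'{2,}"),
    "html tag": re.compile(r"</?[a-zA-Z][^>]*>"),
}


def find_caption_problems(caption):
    """Return a list of human-readable problems found in a caption string."""
    problems = []
    if len(caption) > MAX_CAPTION_LENGTH:
        problems.append("too long")
    for name, pattern in MARKUP_PATTERNS.items():
        if pattern.search(caption):
            problems.append("contains " + name)
    return problems


print(find_caption_problems("A [[vase]] in ''blue''"))
```

A form could use a check like this to warn the uploader before saving, rather than silently publishing the raw markup.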

Captions updates to come[edit]

The development team is putting together the plan for changes needed for captions, there are a few bugs and some design issues that showed up when captions went live on Commons (and thank you to all who have pointed them out onwiki and/or participated on Phabricator). I'll have a list to post later this week, along with information about how soon we can expect to see the changes. I'll be making the post here, with a note on the Village Pump. Keegan (WMF) (talk) 17:51, 14 January 2019 (UTC)

Can better sitelinks help with structuring data?[edit]

I've opened a proposal at Wikidata at "Wikidata:Wikidata:Requests for comment/Proposal to create a separate section for "Commonswiki" links" in relation to linking to galleries and categories on Wikimedia Commons, I also left a field there open for other ways how Wikidata could help Wikimedia Commons with its structure, will the structured data for Wikimedia Commons project be able to utilise such links or does this project exclusively work with the files and not the existing community-made infrastructure? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 15:59, 15 January 2019 (UTC)

Hashtags (#)[edit]

A lot of image-sharing websites use hashtags (#) to clarify what is depicted in an image, is the concept of "depicts" going to be like this? Because I think that adding hashtags could very easily be a good search 🔎 tool, let's say you want to find an image with both "#Cats" and "#Birds" then a specific search could look for images where both of these hashtags are used. Sure vandals could wrongfully tag images with "#Hot sex" and "#Nude female human" and actual images that depict hot sex and nudes could just as well be vandalised with "#Children". But from what I can tell "depicts" will be vulnerable to the same levels of vandalism. Hashtags could be listed below the categories on the bottom of an image, and being placed in a category could automatically add certain hashtags to an image, these "automatic hashtags" are then non-editable in the same way maintenance categories associated with certain license templates are. Is this idea viable or close to how "depicts" will work?

Because this is already how a lot of successful media-sharing websites do it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:26, 15 January 2019 (UTC)

Hashtags are not being implemented. Here's the last round of designs of what depicts will look like; it's very much about structuring data. Can someone add incorrect or malicious entries into structured data? Yes, but in no different way than the rest of the wiki functions. Structured data is fully integrated with recent changes and the revision deletion extension that controls page deletion, revision deletion, and suppression (aka oversight), as well as abuse filter. The development team plans on having depicts statements up for testing in a new test environment by the end of the month, so you'll be able to see it in action. Keegan (WMF) (talk) 22:17, 15 January 2019 (UTC)

Captions Vs. Descriptions[edit]

Alright, I really do not want to sound daft or anything, and I've already read an explanation, but I genuinely can't tell the exact advantage of "captions" over "descriptions", I get that "descriptions" are in WikiCode and can't be used with the infrastructure of the new "Structured Data", but other than character limitations I don't exactly understand how "captions" help with structured data, do they allow for data-mining ⛏ to automatically create "depicts" based on them? How exactly do they organise files in ways that "descriptions" don't? When I search for a file the file descriptions are also searched, so how do file captions improve upon this? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:40, 16 January 2019 (UTC)

Also, descriptions can be multilingual as well, the way adding different languages is done using the MediaWiki Upload Wizard is identical, how are the languages of file captions better organised than those of "descriptions"? I've seen the new upcoming designs and they all look great, but I still don't see where the advantages of "Captions" come to play. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:44, 16 January 2019 (UTC)

I think the next set of releases for structured data, depicts support followed by support for other properties, will help illustrate how captions fits in with the coming features, particularly on the back end of the software. Search with captions and statements, for example, will be an entirely different experience and it's very hard to illustrate that right now. The next question might be why wasn't captions released with this other stuff, then? The answer to that is the underlying technology behind captions that will power the rest of structured data - namely Federated Wikibase and multi-content revisions - is brand new and had to be integrated into Commons first, before we can finish development and release of the next feature set. Keegan (WMF) (talk) 22:35, 16 January 2019 (UTC)
Ah, well patience is a virtue then, but seeing how we now have 3 (three) fields for descriptive information ("the title", "Descriptions", and "Captions") I can understand the backlash. But thanks for explaining, although I still think that removing the character limit from "captions" (or at least synchronizing it with "descriptions'" 10.000 character limit) and then having a bot automatically copy all current "descriptions" to captions would've been preferable.
By the way, shall "depicts" and other planned structured data fields also be integrated with the MediaWiki Upload Wizard 🧙‍♂️? Or shall it not pass? And if so, is there concept art of this available? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:42, 18 January 2019 (UTC)
More will be integrated into the UploadWizard that will provide context for captions, though I do not have details or designs to share right now. More will be available in the next couple of weeks as the team moves past captions fixes and into depicts testing. Keegan (WMF) (talk) 21:05, 18 January 2019 (UTC)

Captions updates[edit]

Here's the list of work being done this development cycle to work on captions.

Done[edit]

These tasks have been completed and are either live on Commons now, or scheduled for deployment next week.

Doing[edit]

Works in progress, the fix being live on Commons is yet to be determined.

To do, but it's complicated[edit]

The underlying problem is going to take additional work.

Needs community fix[edit]

Captions introduced a bug into a Community-maintained gadget-space, unfortunately. The development team may be able to advise volunteers on fixes where appropriate, but the team is unable to fix this themselves.

Needs community discussion[edit]

Two tasks can be implemented by the development team, but they require Commons community consensus to implement.

I'll post updates if/when I receive them. Keegan (WMF) (talk) 20:23, 18 January 2019 (UTC)

Can I switch this on and off?[edit]

The table with an input possibility for structured data (image one-liners) now appears above the image description. Is there a way to switch this on and off, or to relocate the table to a lower position on the file page? Elly (talk) 20:00, 19 January 2019 (UTC)

"Depicts" (local Vs. Wikidata-based solutions)[edit]

Hello 👋🏻 y'all,

What are "Depicts" going to be? Are they going to be local items or will they be hosted on Wikidata? My idea 💡 would be for them to be locally hosted on Wikimedia Commons in a new namespace such as "Depict:Confucius" which could then link to the Wikidata page about "Confucius", one advantage this could have is that they wouldn't be called into question as much regarding their notability as Wikimedia Commons basically has no notability standards. If they're locally hosted they could be locally linked and connected with Wikidata, at "Wikidata:Wikidata:Requests for comment/Proposal to create a separate section for "Commonswiki" links" I requested Wikimedia Commons categories and galleries being linked at the same time to a Wikidata "item" and "Depicts" could then also be added on to that request, this would also enable translations from other Wikimedia websites to be seamlessly imported and puts minimal effort on the Community to maintain it. Thoughts 💭? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 16:49, 21 January 2019 (UTC)

Image added here for (visual) reference.

What I suggest for Wikidata would be like this (stole the design from Christian Ferrer):

In other projects
Wikispecies
Wikimedia Commons
Gallery
Category
Files depict this topic

And then information could be automatically imported, most English speakers don't know that "Pikachu" is "ピカチュウ" in Japanese, or that "Poké Ball" is "モンスターボール" (Monster Ball) in Japanese, but keeping it local would also allow non-notable (according to Wikidata standards) depicts to be hosted here such as Great Balls (スーパーボール, Super Balls), Ultra Balls (ハイパーボール, Hyper Balls), Master Balls (マスターボール), Safari Balls (サファリボール), Level Balls (レベルボール), Lure Balls (ルアーボール), Moon Balls (ムーンボール), Etc.. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 17:05, 21 January 2019 (UTC)

The basic idea of notability in Wikidata is that an item is notable if any Wikiproject needs it. So if there is anything with many pictures of it, it is okay to create a Wikidata item for it. If there is only one picture of it, there is no need to create an item for grouping the images. --GPSLeo (talk) 17:38, 21 January 2019 (UTC)
Now as an example from the Dutch-language Wikipedia let’s look at “w:nl:Bristol (winkelketen)” (Link 🔗 / Mobile 📱), which is related to the Wikimedia Commons category “Bristol (shop)”, its deletion log reads “Een pagina met deze titel is eerder aangemaakt en daarna verwijderd of hernoemd, zie hieronder: 10 jan 2008 10:41 MoiraMoira (overleg | bijdragen) heeft de pagina Bristol (winkelketen) verwijderd (cyberpestactie van werknemer belastingdienst) (bedanken)” which translates as that it was deleted because of “Cyberbullying by an employee of the Taxation Service”, according to their official website there are 210 Bristol stores in the Benelux which is much more than some other similar stores covered on the Dutch-language Wikipedia (nlwiki), on the English-language we have the “salted” article “w:en:Ulefone”, which is related to the Wikimedia Commons category “Ulefone”, which has these reasons for deletion:
However, simple Ecosia searches for “Bristol (winkelketen)” and/or “UleFone” show a large number of independent sources covering these subjects. Notability standards on Wikipedias are corrupt and can be easily manipulated by any admin by “salting” the pages, and any user interested in writing about a “salted” subject will be discouraged from writing about it. Furthermore, I’ve seen countless good neutral articles deleted because they were “promotional” or “spam”. But let's not delve too deep into the issues of other wikis and how local admins influence what does and does not get covered, as we have a similar situation with the interpretation of “Commons:Scope” here. The issue is that a Wikidata-centric approach will have to work with the standards of other Wikimedia websites, and although all other Wikimedia websites combined cover 54,055,172 content pages (as of 14:28 22 D. 01 M. 2019 A.), some subjects will simply never be covered on the other Wikimedia websites. This can be because of a lot of factors other than just notability: admin interference in content creation, overzealous “spam-hunting”, etc. People could just not have an interest in a subject, or all the information is “buried/hidden away” in old newspaper clippings and just not online, or it’s an obscure subject that only existed during the 1970s in Botswana, etc. There are many scenarios where something could be the subject of multiple photographs without it ever being covered on Wikidata. I don't believe that Wikidata could ever be the sum of all knowledge (not under Deletionism and Exclusionism, anyway), and sure, the current category system solves this, but aren’t structured data “items” supposed to be independent from the MediaWiki category and organisational system?
Having an independent system for organising the content on the project (Wikimedia Commons) that works well together with Wikidata but is flexible enough to organise things outside of the scope of Wikidata and other Wikimedia websites would be a better option than outsourcing everything to Wikidata, which limits the ability to actually structure the data on Wikimedia Commons.
You simply can't walk through a major Dutch or Belgian city without seeing a Bristol shop, why should this subject not be noted when it's depicted while for example "w:nl:Scapino (winkelketen)" should. I'm not arguing that structured data shouldn't work with Wikidata, the point is that if something gets deleted off of Wikidata that is extensively covered here then we shouldn't let Wikidata "unstructure" the data here. Also Wikidata's notability doesn't cover any Wikimedia project, it specifically excludes Wikimedia Commons because Wikimedia Commons itself has no notability standards. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 14:03, 22 January 2019 (UTC)

Just use Wikidata if the data is needed across many entries, or write it out each time in the file data if it's not. If that's an issue on Wikidata, then that needs to be fixed on Wikidata. We don't need a second Wikidata here (or if you count structured data for files as a wikidata, then we don't need a third Wikidata). Commons is the media repository, Wikidata is the data repository. Thanks. Mike Peel (talk) 21:23, 22 January 2019 (UTC)

Categories aren't media either, they just list what's depicted in the media which is why some files could be added to a large amount of categories while others only to a few, wasn't structured data promised to not be simply a collection of links to Wikidata per "Comment The first two logos were originally created for WikiProjects that directly sought to make Commons and Wikidata work together. But I am not sure that such a strong emphasis on Wikidata is necessarily appropriate for the structured data project. Wikidata can divide people; in particular, Structured Data is not a Wikidata "take over" of Commons; and in any case, much of the structured data will not be stored on Wikidata. So I would suggest a logo which does not explicitly reference Wikidata in this way. Instead, I am rather taken with the idea of the Structured Data bee. I would suggest putting the Structured Data Bee in the middle of the Commons logo, with the arrows reversed, so that the Bee is feeding Commons, which in turn is feeding the world. Jheald (talk) 23:01, 30 October 2017 (UTC)" from the page "Commons talk:Structured data/Get involved/Community focus group#How do we feel about a project logo?". Don't get me wrong, a better integration with Wikidata is vital for cross-wiki compatibility, but there is a reason why Wikidata has consistently worked for every other Wikimedia website and not for Wikimedia Commons, local problems might sometimes need local solutions. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:54, 22 January 2019 (UTC)
You realise that Wikimedia Commons is now one of the main users of Wikidata, right? Thanks. Mike Peel (talk) 22:23, 22 January 2019 (UTC)
Yes, for topics which are already on other Wikimedia websites. There is a reason why I can link "Category:Elephone" but can't link "Category:Oukitel" or "Category:Ziengs Schoenen" to Wikidata: Wikimedia Commons simply covers more subjects due to lower notability standards. Only allowing users to tag what is depicted if it already exists on Wikidata does a major disservice to potential re-users who are looking for "less notable subjects". --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 22:29, 22 January 2019 (UTC)
Really? I just linked it to Ziengs (Q60887513). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:39, 23 January 2019 (UTC)

Question, Why?[edit]

Sorry but you must be kidding. Why can't a bot copy the English description into your structured data field? Going back 10 years and changing these by hand is going to introduce errors that would be eliminated by a bot copy.

  • "Summary" contains the field "Description English:" followed by a caption.
  • "Structured data: Captions: English" is now a blank on all of my ~1500 past uploads.

You're going to have to give me a better rationale for taking the time to do this than I have seen so far. -SusanLesch (talk) 20:37, 22 January 2019 (UTC)

@SusanLesch: captions comes with an API that lets the community build bots and tools to automatically or semi-automatically copy information into captions. As captions are brand new, and there are more features coming to structured data in the coming weeks and months, these tools haven't been written yet, but they can hopefully be expected in the near future. You don't need to fill in captions by hand if you choose not to; time is a limited resource. If you don't want to see captions on the file page until they can be filled in, or at least want to collapse the box, gadgets are available for this. Keegan (WMF) (talk) 22:01, 22 January 2019 (UTC)
Thank you, Keegan (WMF). I don't mind the clutter as long as it has a purpose. -SusanLesch (talk) 22:06, 22 January 2019 (UTC)
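For anyone curious what such a bot edit might look like: captions are stored as labels on the file's MediaInfo entity, so, as far as I know, a caption can be written through the Wikibase `wbsetlabel` API module using an entity ID of `M` followed by the file's page ID. The Python sketch below only builds the request parameters and sends nothing; `build_caption_request` is a hypothetical helper, and looking up the page ID, logging in and fetching a CSRF token are assumed to be handled elsewhere.

```python
def build_caption_request(page_id, language, caption, csrf_token):
    """Build POST parameters for setting one caption; nothing is sent here."""
    if page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return {
        "action": "wbsetlabel",
        "id": "M" + str(page_id),  # MediaInfo ID derived from the file's page ID
        "language": language,
        "value": caption,
        "token": csrf_token,       # placeholder; a real CSRF token is required
        "format": "json",
    }


# Hypothetical values, for illustration only.
params = build_caption_request(12345, "en", "A blue vase with fish", "TOKEN")
print(params["id"])
```

Keeping the request-building step separate from the network call makes it easy to review what a bot would write before it is allowed to run.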

It is worth observing that due to copyright issues, such as database rights, if the Caption is used as a CC0 database without mirroring the original image page license, then mass bot copying partial descriptions should be blocked or reverted. Without this we may well see later take down requests, as I have experienced with my own use of description texts during upload projects. Anyone thinking of writing a bot task, must have an associated community discussion and case review. -- (talk) 12:49, 23 January 2019 (UTC)

The solution is simple: add a standard message text that states "this file caption has been imported from the file description and falls under the Creative Commons Attribution-ShareAlike License; additional terms may apply." and don't use this tag for captions added using other methods or for captions added after a certain date. This should solve the copyright issues for re-users. But from what I've read, Mike Peel wants to create an RfC about this so the copyright could be taken into account. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:55, 23 January 2019 (UTC)
That really does not look like a solution, if captions are intended to be used as a database for reuse in other places. Neither does it solve issues such as texts taken from sources with other license restrictions like OGL or even text extracts from non-free/unclear sources where the image was taken as public domain. Once the caption becomes dislocated from the rest of the image page text, these complexities become active risks.
I am also deeply concerned at seeing the much repeated myth that "facts" cannot be copyrighted. As an example "facts" like artist or creation date on historic artifacts are frequently subject to dispute and varied expert opinions and judgements. Mass processing this sort of text as if they can have no creativity or subjectivity is misleading and is likely to represent a copyright risk. The law in this area is highly varied between countries, please stop talking about copyright as if only the opinions of a few USA pundit lawyers matters. -- (talk) 13:02, 23 January 2019 (UTC)
I agree with Fæ: if the captions are CC0 then you cannot have some messages stating otherwise. --Jarekt (talk) 14:34, 23 January 2019 (UTC)

The problem is that this feature is useless, as it duplicates an existing feature (namely, description tags). This will only serve to confuse readers as to why there are two different areas with potentially different captions. Either have one or the other—not both. — pythoncoder (talk | contribs) 13:18, 5 February 2019 (UTC)

Copyright modeling on Wikidata[edit]

Starting point

As I look into the future for our SDC feature releases, we have copyright and licensing statement support releasing in the next few months or so. It's probably a good time for interested Commons community members to start looking into how Wikidata is setting up copyright modeling and ontology. Jarekt has started a help page for Copyright on Wikidata, and I encourage community members to have a look over and participate on the talk page if anything comes up that needs discussion. The development team doesn't hold a preference as to how modeling is implemented, but they would like to see as much participation from Commons with Wikidata as possible before a final structure is released. Keegan (WMF) (talk) 20:58, 22 January 2019 (UTC)

On that note, d:Help:Copyrights at the moment covers Public Domain cases, but I do not think we have figured out how to model copyrighted items, like files with {{Own}} and a CC-BY license. We need more discussion on how best to do that. But let's keep the discussion in one place at d:Help talk:Copyrights. --Jarekt (talk) 21:16, 22 January 2019 (UTC)
Took me a bit of digging, but look what I found. I think it can serve as a good starting point to improve on. I'll share it on the copyrights page too. Multichill (talk) 12:15, 26 January 2019 (UTC)

How to edit a caption

Very elementary question. I wanted to edit the caption of one of my pictures but don't see how. There is no Wikidata item in the left-hand column. COM:Captions goes to Commons:Timed Text which is irrelevant. Help doesn't help me. So, where's the beginner's help? Jim.henderson (talk) 01:40, 25 January 2019 (UTC)

Good question, I don't see any way to edit existing captions at present. --ghouston (talk) 01:54, 25 January 2019 (UTC)
Don't y'all get this screen?
I can edit file captions, which is odd because as a mobile-only editor I'm usually excluded from editing functions. Do y'all see this screen or not? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 07:11, 25 January 2019 (UTC)
Odd, a file like File:Kai Xi Tong Bao (開禧通寶) - IRON 2 Cash, Rev. Tong & 1. (S863) - Scott Semans.jpg has a caption that can be edited, but a file like File:Joseph_von_Fraunhofer,_engraving_by_Christian_Gottlob_Scherff.jpg has a caption that can't be edited. --ghouston (talk) 09:13, 25 January 2019 (UTC)
Very odd indeed, I experience the exact same issue, now I'm curious what the differences are between these files, note that I added the caption on the Chinese cash coin using the MediaWiki Upload Wizard. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 09:29, 25 January 2019 (UTC)
@Jim.henderson, Ghouston, Donald Trung: Sorry about this, there was a caching glitch caused by the roll-out of the software. It will automatically fix itself over time, or when the file's page is purged, or edited. Jdforrester (WMF) (talk) 16:02, 25 January 2019 (UTC)
I have encountered a similar problem on file pages for the past 2-3 days: the H2 header of the description section is not displayed. To edit the description I need to edit the whole page. The same happens on old and on new pages (File 1). When I add a caption and reload the page, the section's heading appears (File 2). I've played around trying to make the header appear without adding a caption, but failed. I have the "collapse caption" gadget on, but switching it off doesn't help. Nova (talk) 19:18, 26 January 2019 (UTC)
Works now, thanks. Nova (talk) 12:25, 27 January 2019 (UTC)

Are file captions also downloaded when someone downloads a file?

Are there also plans to let users download file captions as part of a file when they download it? Kind of like how Microsoft's Windows (Live) Photo Gallery allows users to add whole stories to pictures describing their content which is then saved as a part of the metadata. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:37, 26 January 2019 (UTC)

Just to clarify: you would expect the captions to be downloaded as EXIF tag or a similar metadata format? Jean-Fred (talk) 10:58, 26 January 2019 (UTC)
This would alter the SHA1 checksum for the file, a bad idea as that is the primary means of testing if uploads are identical duplicates. -- (talk) 11:26, 26 January 2019 (UTC)
Presumably, the SHA1 used for that use case is calculated server-side, so before the on-the-fly addition of any metadata. Also, the current state is that license and author are already added to downloaded thumbs. Jean-Fred (talk) 11:50, 26 January 2019 (UTC)
You may be conflating things. Thumbs display stuff on transclusion, which is not the same as downloading. You can download any thumb size you wish, but no extra metadata is added to it during the rendering process. Calculating SHA1 is the same wherever it is done. I use both the Commons API to return SHA1 values for hosted files, and I use standard Python hashlib to do the same to local files before an upload or after an upload. Other websites, like DVIDS, automatically edit EXIF data when they upload photographs to their website. It is an amazingly destructive thing to do, as it is then impossible to do any automatic, computationally cheap comparison to images which may be identical on other sites. In the DVIDS example, this means that Commons is plagued by duplicates of the same image from DVIDS, Flickr and specific military alternate sites where the EXIF is different in every location.
Obviously, this is something that can be done, but there are big bad implications that mean it really should not be done. -- (talk) 12:19, 26 January 2019 (UTC)
“No extra metadata is added during the rendering process” → That certainly used to be the case (see eg phab:T44368) but indeed, does not appear to be current anymore − maybe something deprecated in the move to Thumbor (selected EXIF fields are still preserved in Thumbs though).
I am aware of how file hashing works, and I appreciate the challenges you are facing when operating with third-party websites. I still fail to see how, were captions to be included in file downloads, this would interfere with the upload process. My understanding is that you avoid uploading duplicates by retrieving the SHA1 from the API (so before anything would be added), and compare it to your local files (which by definition, would not have captions added to them) − I don’t really understand the use case for downloading a file from Commons after the upload and then calculating its hash locally. Jean-Fred (talk) 13:14, 26 January 2019 (UTC)
I feel odd now, it seems so obviously bad I thought it was intuitive. If we tamper with EXIF data based on volunteer added metadata, then there will be no easy way for other people to compare, say IA or Flickr files against the versions of the same files hosted on Commons. Even worse, people will download the file in 2019, and share it elsewhere, then in 2020, some well meaning volunteer will upload the same file from a collation site like Europeana, and because the caption has been altered over time, the API will fail to recognize it is a duplicate.
The comparison of SHA1 values is the most fundamentally easy way to avoid duplicates. If we screw around with that simply because people quite like playing around with adding junk to EXIF data in vague non-standard ways, then pop, Commons pollutes its own future with random duplicates. -- (talk) 13:24, 26 January 2019 (UTC)
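
Illustrative note on the hashing argument in this thread: a minimal Python sketch of why embedding caption text in a file changes its SHA1 digest. The byte strings are stand-ins rather than real image data, and the helper name `sha1_hex` is hypothetical. No network call is made here; on the server side the Commons API can return the stored digest through `prop=imageinfo` with `iiprop=sha1`.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


# Stand-in bytes for an uploaded file (assumption: not real image data).
original = b"\xff\xd8 image payload \xff\xd9"

# The same payload with a caption embedded as extra metadata bytes.
with_caption = original + b"caption: a silver penny"

# A byte-identical copy has the same digest, so it can be matched as a duplicate.
assert sha1_hex(original) == sha1_hex(bytes(original))

# Embedding any metadata changes the digest, so the same comparison no longer matches.
assert sha1_hex(original) != sha1_hex(with_caption)
```

Whether this matters in practice depends on the point both sides raise above: whether comparisons are made against the digest stored server-side or against copies that have already been downloaded and re-shared.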

(Over)depictization

Example

I'm a bit lost on what the current proposal is on how to use depicts. I found this feedback request with its talk page, but that's been inactive since October last year. Keegan: any pointers? I called this topic (Over)depictization because it relates to the topic of Over-categorization. Back in 2011 I wrote this proposal for next generation categories:

  • Multiple languages - We're using Wikidata items so that's covered
  • Enrich relations - On Wikidata we can link items together with all sorts of properties so that's covered
  • Efficient intersections/searching - This topic is about this

Let's take the painting on the right as an example. It depicts Vincent van Gogh, but it also depicts a human, a man, a painter, etc. If you look at the category you'll find interesting categories like Category:Three-quarter view portrait paintings of men, facing left and Category:Men facing left in art. I think the first step is to break it up into atomic (as in, not intersected) items like "Three-quarter view portrait"(?), "men" and "facing left". These could be added to the item. But how to deal with implied properties? If it depicts Vincent van Gogh, it depicts a human and it depicts a man. If I search for "man", this image should be included somewhere. So we have two ends to that scale:

  • Manually add implied depicts. We get complete overdepictization. This is a lot of user work. Imageinfo items will contain a pile of depicts statements. It does make search easier (from a technical perspective).
  • Remove any implied depicts. We get no overdepictization. This saves the user a lot of work and it's the same concept as pushing files down the category tree. The number of depicts statements is limited. The big downside is the same as with the current category tree: it makes it harder to find things using search. The search engine should be improved to also include implied depicts statements. This isn't easy.

So the key to success here is the search engine. I have already seen some chatter about implicit depicts and how difficult it is. The subclass tree on Wikidata will never be perfect. We can (and will) improve it, but we have to work with an imperfect model because we live in an imperfect world. Let's take that as a given (premise). Trying to make it perfect is like fighting windmills. So we should be able to adapt the search engine to use this imperfect data. We might notice that for certain domains going 8 levels up in the subclass tree is just fine, but for other domains it produces complete nonsense. We can't expect the WMF engineers to do this tweaking. From a technical perspective the search engine should be updated to include implicit depicts, and what is included (or not) should be configurable here on Commons. That way the Commons community can tweak and improve the search results. I would imagine a user interface where you see the depicts properties and, slightly greyed out, the implicit depicts properties. I wonder what the plans are from the development team. Multichill (talk) 13:03, 26 January 2019 (UTC)

  • Yeah. I mentioned in a previous discussion that in principle an image of a human should also be tagged with things like Homo sapiens, primate, mammal, animal, and organism, because there's no sense of inheritance in the proposed depicts setup. Of course it's unlikely that somebody tagging manually would remember to add all these extra topics, but a bot could easily do it. It gets a bit silly, but why would you stop somebody adding such topics, when they are correct and logically required? And then I suppose somebody will notice that searching for "animals at Guantanamo Bay" comes up with a result of human prisoners there and will make a big fuss about it. Ambiguity of language, what can you do? The other problem is that sometimes you have a large number of similar photos, perhaps all taken on the same day, all with the same topic. Currently you can group them in a category and categorise the whole lot at once, but with depicts, every individual image will be a search result of its own, potentially swamping any other images of the same topic. --ghouston (talk) 21:49, 26 January 2019 (UTC)
  • The point in this case isn't about whether humans are animals or not, but where that relationship should be stored. If the search engine was traversing subclasses, then a search for animals would include humans (if the Wikidata subclassing was set up properly). With no traversal, "animal" needs to be added to each file individually. With the existing category system, we have Category:People which is ultimately a subcategory of Category:Animalia, but the system is quite erratic, since we also have Category:Animals with a whole category tree that apparently means "non-human animals", but it's nowhere stated as such. --ghouston (talk) 23:42, 26 January 2019 (UTC)
I do not think we should tag a painting of van Gogh with tags like "Homo sapiens, primate, mammal, animal, and organism", by hand or by bot. Too many depicts statements dilute their usefulness and we should only use the most specific term. If that hampers the current search engine, then we should concentrate on improving it. --Jarekt (talk) 18:05, 28 January 2019 (UTC)
I hope very much.... otherwise the result of all this would have been the creation of a tag namespace... which is not necessarily bad but it was probably easier to get there... with the creation of such a namespace... :( Christian Ferrer (talk) 19:22, 28 January 2019 (UTC)
We could add only the most relevant property, e.g., Vincent van Gogh and not Painter. But then that image won't turn up in any search for painters. We could add Carduelis carduelis to a picture of a goldfinch, but then it won't show up in a search for birds. --ghouston (talk) 21:56, 28 January 2019 (UTC)

One way to deal with this is to just introduce more properties than depicted to describe item content, such as "painting format" to indicate if a painting is rectangular or oval or cathedral shaped, and whether the orientation is portrait/format. Another one could be "portrait style" for specifically portrait paintings, where you can indicate bust, half-length, three-quarter-length, seated, etc. Nothing says we need to translate categories into the depicted statement. This is not a 1-1 relationship by any means. Jane023 (talk) 07:49, 30 January 2019 (UTC)

  • The relevant thing to me is that if we introduce "shadow tags" in the search cache, or whatever we want to call them, they need to be tied to specific primary tags on the image. So that if somebody tags that an image depicts Van Gogh, with qualifiers that he is depicted in a particular part of the image, or in a particular pose, or wearing a particular hat, those qualifiers also need to be discoverable for a search run on the more generic topic.
We also need to be able to search for a painting that depicts 4 humans, without getting confused as to how many humans there are in the picture if some but not others can be named. Similarly for a painting with 4 animals, if some are identified as lion and lioness but others are not. Jheald (talk) 18:32, 30 January 2019 (UTC)
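
Illustrative note: one way to read the "shadow tags tied to primary tags" idea is sketched below, under stated assumptions. The `PARENT` table is hypothetical hand-written data, and `Depicts`, `ShadowTag`, `shadow_tags` and `count_depicted` are made-up names, not part of any existing software. Each generic tag keeps a reference to the primary statement it was derived from, so qualifiers stay reachable and a count of "animal" counts depicted things rather than tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical item -> more generic item lookup; real data would come from Wikidata.
PARENT: Dict[str, str] = {
    "lion": "animal",
    "lioness": "animal",
    "Vincent van Gogh": "human",
}


@dataclass
class Depicts:
    """One primary depicts statement with its qualifiers."""
    item: str
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class ShadowTag:
    """A generic tag derived from, and pointing back to, one primary statement."""
    item: str
    source: Depicts


def shadow_tags(statements: List[Depicts]) -> List[ShadowTag]:
    """Derive one generic tag per primary statement that has a known parent."""
    tags: List[ShadowTag] = []
    for statement in statements:
        parent = PARENT.get(statement.item)
        if parent is not None:
            tags.append(ShadowTag(parent, statement))
    return tags


def count_depicted(statements: List[Depicts], item: str) -> int:
    """Count primary statements that either are the item or imply it."""
    direct = sum(1 for s in statements if s.item == item)
    implied = sum(1 for t in shadow_tags(statements) if t.item == item)
    return direct + implied


# A painting with four animals, two identified and two not.
painting = [
    Depicts("lion", {"pose": "lying"}),
    Depicts("lioness"),
    Depicts("animal"),
    Depicts("animal"),
]
print(count_depicted(painting, "animal"))
```

Because every shadow tag carries its `source`, a search on the generic topic can still surface the qualifiers of the specific statement, for example the pose recorded for the lion.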
  • Does the depicts search engine optimisation (SEO) only concern Wikimedia search engines or also third party search engines (such as Ecosia, Microsoft Bing, Google, Baidu, Etc.)? Because if the former is true, then we can always embed the file depicts in their subdepicts and be able to filter out the specific subdepicts. Let's say if you search "animal", all humans, cockatoos, and herrings get depicted, but then you can filter out what animals you do and don't want to have. Individual files in this case would then only need the most specific depict. Let's say that Vincent van Gogh is a "Male Dutchman", which is a sub-category of "Dutchman" -> "Human" -> "Homo Sapiens Sapiens" -> "Homo" -> "Hominini" -> "Homininae" -> "Hominidae" -> "Simiiformes" -> "Haplorhini" -> "Primates" -> "Mammals" -> "Animals" -> "Filozoa" -> "Holozoa" -> "Opisthokonta" -> "Abazoa" -> "Unikonta" -> "Organism". Listing each and every one of these would constitute overdepictisation, but if only "Dutchman" or "Male Dutchman" (or more specific "Male Dutch painter from the 19th century") is listed, then a search for "Organism" would include this image as well as a search for "Mammals", as the depict-trees could just embed parent depict categories into child depicts. In fact I would probably state that for notable individuals such as Vincent van Gogh a "Vincent van Gogh" depict would be sufficient for listing all of his attributes such as "Male human", "Painter", "Person from the 19th century", "People born in Zundert", "People who died in Auvers-sur-Oise", Etc. And a search for each of these masterdepicts would return these subdepicts, but not vice versa. This would optimise both specific and unspecific searches. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:52, 3 February 2019 (UTC)

@Multichill, Christian Ferrer, Ghouston, Jane023, Donald Trung: sorry for the delay in response, I was offline last week. Search is going to be crucial, particularly granular search. The process of getting there is going to be granular as well, the release plan is going to take a "crawl, walk, run" approach and full-feature search is definitely running. Very tentatively for depicts, we're looking at releasing in the order of:

Add/view/edit depicts on file pages
Add/view/edit depicts in UploadWizard
Search depicts statements
Depicts qualifiers
Filter search results
Depicts of depicts
Depicts and annotations

So yes we are going to get to the place that's being suggested, the software will potentially largely be able to handle whatever the community throws at it to serve back to the user as appropriate. It's going to take steps, though, and ones that will be repeated to add other statement support after depicts support, to handle the use case that Jane023 suggested. Releases for depicts support are starting this month, we're sorting out when testing for the first stage will be available. I'll be advertising that everywhere as soon as I know. Keegan (WMF) (talk) 18:13, 4 February 2019 (UTC)

Structured data on Commons presentation reports

Heya,

I have created two reports from the presentations of this project to Wikimedia Commons contributors. Feel free to proofread them, move them, etc.:

Juandev (talk) 16:12, 28 January 2019 (UTC)

How would structured licensing look like for screenshots?

Hi! I recently took a look at how structured licensing statements will look in the future. Commons:Screenshots is somewhat clear on what statements I should add to a file. One example is a file I uploaded, File:Paris wikipedia app android.jpg. Which licenses would show to the user? If I added Template:Free_screenshot/en, would Wikidata show that information? Thanks! Tetizeraz. Send me a ✉️ ! 22:50, 3 February 2019 (UTC)

Caption - confusing behaviour

When I saw the caption segment for the first time on Wikimedia Commons, I was confused by its behaviour. I would click into the field where "Add a one-line explanation of what this file represents" is written and expect to start editing the caption in that language. It took me some time to find the pencil icon and figure out how it works. Would it be possible to open the input box just by clicking on the description (probably not, I guess), or move the pencil icon down to that line, or do something else with the design to make it less confusing? Juandev (talk) 16:54, 4 February 2019 (UTC)

Our designer may take another look at the pencil icon placement once the work around depicts and some other things finishes, it might not be ideal. It might take time to get back to it, though. Keegan (WMF) (talk) 18:20, 4 February 2019 (UTC)

Can file depicts help train artificial intelligence?

Soon structured data on Wikimedia Commons will manifest itself with “Depicts”, which allows people to categorise images with tags that link to Wikidata pages. These “Depicts” are designed to list every subject depicted in the image and remind me a bit of how Google Photos works. Google Photos is an online service which can categorise every image in its database by which subjects are depicted in it, with mixed success. Because it relies heavily on bots that group images together into categories and is able to search 🔎 images based on what's depicted in them, Google Photos is actually quite advanced in how it automatically organises videos and images. Now Google Photos as a service can be improved by humans via the website "Crowdsource.Google.com" 👩‍💻, where human beings can help detect certain subjects and give feedback on whether they are or aren't in a photograph by going through categories. Now file depicts can potentially help software programmes like those in Google Photos to recognise certain subjects, as we Wikimedia Commons volunteers will add tags as to what subjects are depicted in a photograph.

Now if, let's say, a company trying to develop image recognition software wants to utilise the data generated through file depicts, could they realistically utilise this data to help train their software? Of course this is a question for years in the future, as most files won't have any depicts during "the transitional period"/"the introductory period", but as a long-term goal, would they actually be useful for such purposes? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 02:13, 5 February 2019 (UTC)

  • I would think that (in theory) it should be roughly exactly as useful as categories. Whichever approach does a better job will be that much more useful to AI. - Jmabel ! talk 03:26, 5 February 2019 (UTC)

CC0 licensing mockups

Here are the designs for notifying contributors of licensing for captions/the structured data portions of Commons. The text has been signed off by the Wikimedia Foundation's legal department, and the function is heavily influenced by how Wikidata serves a licensing popout.

File page

The file page works like Wikidata: a user is served a popup informing them of structured data licensing when entering the edit mode, and accepting the information will permanently dismiss the notice.

FilePage CC0 message.png

UploadWizard

Since the UploadWizard doesn't have an edit mode, the licensing for captions and structured data will be shown in the Release Rights step.

Discussion (mockups)

Please let the team know of any questions or concerns you might have about these designs, so that we can attempt to address them. These notifications are going to go live here on Commons once they are coded up and ready to go. Keegan (WMF) (talk) 20:54, 5 February 2019 (UTC)

For UW, the text says "By clicking 'publish' you agree to …", but the button you have to click is actually labelled "next". That's a bit confusing if you ask me. --El Grafo (talk) 10:19, 6 February 2019 (UTC)
+1, furthermore the captions are not mandatory... and you can upload a file without caption.... so it is really confusing, at this place and in this form. IMO, in the UploadWizard, the text should be at the next step and just below the caption field. Christian Ferrer (talk) 12:28, 6 February 2019 (UTC)
Thank you both, very helpful. Other observations are still appreciated. Keegan (WMF) (talk) 18:11, 6 February 2019 (UTC)
I don't find the boxes confusing, but I must concede that they would have to be as clear as possible for everyone. However I oppose placing the licensing information anywhere later in the MediaWiki Upload Wizard as it will clutter up the file information fields, the second page of the MediaWiki Upload Wizard is literally called the licensing page and the above screenshots make the most sense. Alternatively at the bottom of every page on Wikimedia Commons there is a text which reads "This page was last edited on 6 February 2019, at 12:17. Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy.", this text could also be expanded to include Structured Data for Wikimedia Commons. Anyhow I dislike the idea of adding the license on the third (2) page of the MediaWiki Upload Wizard, even placing it under "information" would be confusing for some people or might even be seen as "deceiving". --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:13, 6 February 2019 (UTC)
Sorry, do we actually want the captions to need to be CC0 ? Did the community decide this?
The consequence of this is that we won't be able to create captions directly from descriptions. Is this the right call? Jheald (talk) 18:48, 6 February 2019 (UTC)
@Jheald: RIsler (WMF) may speak to the product decision on captions licensing being CC0; my understanding is that CC0 is a necessity for the feature as part of the ecosystem of the structured data project. As to your concerns about how some simple descriptions can or may be copied into captions, I believe @Mike Peel: is planning a Village Pump/Proposal on the topic, as how to handle statements of fact will be a community content policy decision. The best I can do is repeat the opinion of the Wikimedia Foundation Legal Department, which you have previously stated your disagreement with:

“The text on Commons is CC BY-SA currently, to the extent it's protected by copyright. However, most captions are not likely to be copyrightable, since they are short and factual. Copyright protects creative works of expression, and not the underlying ideas. It's possible that there will be some descriptions that are so idiosyncratic to get Copyright protection, and Commons has a couple of options if that comes up: 1) argue that they are not copyrightable, 2) just remove those captions and write a better, CC0 caption. Either choice would be guided by how the Commons community would like to settle it on Commons:File captions.”

Keegan (WMF) (talk) 19:20, 6 February 2019 (UTC)
To add to what Keegan wrote above, from a product ecosystem standpoint we do want SDC's license to be compatible with Wikidata to enable smooth cross-project usage and avoid a lot of headaches (ex: in the future perhaps people will want to import Wikidata labels into image captions in cases where there are depicted objects, people, etc.) RIsler (WMF) (talk) 20:08, 6 February 2019 (UTC)
As I explained above, you are grossly misinterpreting what legal said. - Alexis Jazz ping plz 19:08, 11 February 2019 (UTC)
To confirm for the Wikimedia Legal team, I have been and I still am working with the team on this project. Text can be licensed CC BY-SA to the extent it's protected by copyright, and copyright protects creative works of expression, and not the underlying ideas. Names, titles, slogans, and other short phrases are not protected by copyright if they do not contain a sufficient amount of creativity. Unprotectable phrases can be used under CC0 even if uploaded under CC BY-SA. Stephen LaPorte (WMF) (talk) 01:06, 13 February 2019 (UTC)
@Slaporte (WMF): Thank you for coming back to us. But these are not reusable names, titles, slogans, phrases we are talking about, they are descriptions written specifically to describe the images, reflecting choices about what wording to use, what aspects of the image to include or not to include, and often also knowledge, skill, and judgment about the object depicted. (For instance, User:Pigsonthewing's coin example diff).
Secondly, does your answer represent a worldwide perspective? It's not so long ago (perhaps a decade) since the governing standard in English law was still Walter v Lane, a very low standard of originality indeed; and in Australia Telstra Corporation Ltd v Desktop Marketing Systems Pty Ltd. Have you considered the law outside the United States?
Thirdly, there's the question of accumulation: even if copyright law allowed the stray reuse of one caption to be de minimis, the taking of 1000 descriptions, or ten thousand, or a million, is an appropriation of a substantial body of work. Have you considered this?
Fourthly there's the question of database right. On the face of it, European contributors could justifiably argue that the totality of their descriptions represent a database in which they have invested considerable time and effort, of which appropriation for Commons wikibase represents an unauthorised taking, because Commons wikibase does not impose the "share alike" condition.
Does WMF legal have views on these points? Jheald (talk) 10:36, 13 February 2019 (UTC)
The example in question is "A silver hammered penny of Edward the Confessor, minted in Southwark between 1042 and 1044. Moneyer: Wulfwine." As I said in the previous discussion; I remain to be convinced that that is copyrightable. It does seem to meet the test "short phrases [that] do not contain a sufficient amount of creativity" (note that "research" is not "creativity"). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:24, 13 February 2019 (UTC)
Please avoid turning a passing comment into a "test". This is highly misleading and WMF Legal have not given any protection against damages for volunteers. EU law, UK law, USA law is very clear, it is entirely possible for "short phrases" to be copyrighted and it is very, very, clear that database rights do not vanish simply because you cut up metadata and paste it piecemeal into one sentence long fields on a website with a copyfraudulent CC0 claim. -- (talk) 11:58, 14 February 2019 (UTC)
@Slaporte (WMF): Could WMF Legal please separately publish a statement of immunity for volunteers that copy text into the CC0 captions field from original sources with both/either moral rights and database rights in force? Both of these copyright concerns apply to our imported files on Commons and volunteers (as I see it) are already risking claims of damages. This should be fine and dandy if WMF Legal are the ones who become liable, because you gave us (unpaid volunteers) legal advice which we can easily understand and reference. If you are not prepared to do this, please make a definitive statement so that volunteers have no doubt about the financial personal risk they take should they choose to copy, cut and paste or automatically populate text into the captions field without first assuring themselves that their source has a CC0 release. As you are well aware, under EU and UK law that applies to many of us writing in this discussion, database rights apply to systematic mass copying of even simple data, these rights are not obviated by hosting Wikimedia web pages in the USA. Thanks in advance. -- (talk) 10:59, 13 February 2019 (UTC)
I also found the CC0 boxes in the above images confusing. It should be clear that the CC0 license relates to the SDC, so it should be somewhere near the SDC. I agree that SDC needs to be released under the same license as Wikidata, otherwise it will be hard to use them together. I do not think that is something we want to be proposing and debating. We need to make clear to users that SDC is CC0, and the community should be deciding about bot tasks for filling SDC based on the file descriptions. Such bot tasks need to be evaluated based on whether the data copied is eligible for copyright. Other than free-text descriptions / captions, most image metadata (source, date, author, copyright, camera type and setup, etc.) are not eligible. --Jarekt (talk) 20:57, 6 February 2019 (UTC)
For years we have been arguing that paraphrasing and taking data only from copyrighted sources is not an infringement of copyright. Big parts of Wikipedia are based on cannibalizing 3rd party descriptions; image descriptions too are often small snippets of text taken from somewhere else. Now, as our texts and our descriptions are going to be cannibalized by SD, Wikidata and similar, there is no reason to complain. The way to go with CC0 is more than logical from the WMF POV. The importance of community contributions is diminished along with the vanishing community. best --Herzi Pinki (talk) 18:38, 7 February 2019 (UTC)
The file description page popup is confusing, in that the first paragraph uses "you" and "your" (2nd person), and the second switches to "I" (1st person). Please be consistent.   — Jeff G. please ping or talk to me 13:13, 11 February 2019 (UTC)
  • This is all well and good, but doesn't actually address the issue that all captions added up until the point that this is effectively implemented aren't actually licensed under CC0. GMGtalk 13:23, 11 February 2019 (UTC)
  • @GreenMeansGo: the advice from WMF Legal above tells us that if a caption is non-creative, i.e. a simple description, it can be re-licensed as CC0, as CC-BY was never a valid copyright claim in the first place. This is similar in concept to Template:PD-ineligible. I'm working on getting WMF Legal to reiterate this here on the wiki. Keegan (WMF) (talk) 17:43, 11 February 2019 (UTC)
  • Yes, but that is not a rationale that can be applied en masse based solely on character count. If a tweet can be copyrighted (e.g., [6] [7]) at originally about half the length, then our 255 character captions certainly can be also. And there's really no way to tell which are substantially creative and which are not without manually reviewing them all, that is of course, unless you have an active CC0 dedication. GMGtalk 17:56, 11 February 2019 (UTC)
All of these screenshots show too much clutter and confusion. New users will definitely be confused and think they have to release their uploads as CC0. Captions in the UploadWizard should be disabled by default. Users could enable captions in their preferences and when they do, you confront them with the license accept window. For users who don't do that, they will have to enter captions after uploading or wait for someone else to enter captions for their uploads. As this is after the upload, there will be much less confusion over what is being licensed. - Alexis Jazz ping plz 19:04, 11 February 2019 (UTC)
"Your contribution" is nowhere near clear enough. The text needs to explain precisely what the user has to release under CC0. The heading is slightly helpful, but anyone who's dealt with legal documents knows that headings are non-binding. --bjh21 (talk) 20:22, 11 February 2019 (UTC)
Hello everyone. Thanks for your input in this discussion. With WMF legal's help, we've begun implementing the CC0 structured data notices based on the feedback here. Today, you'll find a new popover explaining the terms when you attempt to add a caption on the File Page (clicking "Accept" will make the popover go away for subsequent edits), and there is a new notice about CC0 and captions on the Describe step of UploadWizard. Updates to the legal text in the site footer are being translated now and we expect them to appear on the site next week. Thanks for your help and understanding. RIsler (WMF) (talk) 01:50, 13 February 2019 (UTC)
@RIsler (WMF): this is good. But what will be done about the captions that were entered before the popover? - Alexis Jazz ping plz 18:11, 13 February 2019 (UTC)

Copyright notice at the bottom of all pages

As I posted at Village Pump/Copyright[8], the text at the bottom of pages on Commons is also being modified to clarify licensing obligations[9]. Keegan (WMF) (talk) 18:31, 7 February 2019 (UTC)

Y'all forgot the GNU Free Documentation License (which is still giving Alexis Jazz nightmares), as every edit you make in desktop mode has the message "By saving changes, you agree to the Terms of Use, and you irrevocably agree to release your contribution under the Creative Commons Attribution-ShareAlike 3.0 license and the GFDL. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license." (note 📝 that this isn't visible on mobile). I am not saying that the GFDL should apply to structured data, but it's currently not mentioned on the bottom of any pages which seems odd as every edit saved in "desktop mode" mentions it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:01, 12 February 2019 (UTC)
@Donald Trung: right, GFDL. Gotta put another bullet in that. - Alexis Jazz ping plz 14:32, 12 February 2019 (UTC)

Status update

The "Latest updates" section is considerably outdated, so are some other sites. Could someone please update it? -- Michael F. Schönitzer 13:47, 6 February 2019 (UTC)

I'm going to take a look at some updates to Latest, I've been waiting to see what information about depicts I can put in as soon as I know more about testing and release. Some of the older outdated pages, like Development, are likely to remain stale. Development plans have both moved too quickly and changed too much to keep that page updated in any meaningful way. I'll see what I can find. Keegan (WMF) (talk) 17:32, 6 February 2019 (UTC)

Do we want to bot-copy descriptions to captions?

I've posted a proposal about this at Commons:Village_pump/Proposals#Do_we_want_to_bot-copy_descriptions_to_captions? - comments appreciated! Thanks. Mike Peel (talk) 21:45, 8 February 2019 (UTC)

Mobile glitch? (repeated file captions)

Is anyone else seeing file captions double on "the Mobile 📱 version" of Wikimedia Commons? I noticed this recently but I don't see it using "Desktop view". --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 17:24, 10 February 2019 (UTC)

I notice the same behavior as you. Please note that the problem seems to exist only for the English caption, not for the other languages. --Dodeeric (talk) 18:04, 10 February 2019 (UTC)
Correction / clarification to my previous answer: the caption which is shown twice is the one in the same language as the interface. If your interface is in English, it is the English caption which is shown twice; if the interface is in French, it is the French caption which is shown twice, etc. --Dodeeric (talk) 08:04, 11 February 2019 (UTC)
That's some rather odd behaviour, we should probably file a Phabricator ticket for this. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:09, 11 February 2019 (UTC)
Hello. We are aware of these issues and are working on them, and you can track progress on this Phabricator ticket. RIsler (WMF) (talk) 20:29, 11 February 2019 (UTC)
Thanks, that saves us all some time. Keep up the great work developing these features. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 07:52, 12 February 2019 (UTC)

Change caption to title and add a separate description property

Given the discussion about bot-filling the captions with the description, I think it would be better to have a title in the structured data, which could be filled from the current titles of the images, and a description, potentially filled from the current descriptions. --GPSLeo (talk) 20:51, 10 February 2019 (UTC)

@GPSLeo: On the surface this is actually a good idea as titles like "File:Maerten van heemskerk, tauromachia nel colosseo in rovina, 1552, 01 (cropped).jpg" convey quite a bit of information, but then you have files like "File:100 0916.JPG" which are very useful images, but just less than informatively named. I can remember somewhere in the Structured Data on Wikimedia Commons discussions there being a few advocates for replacing image titles with Wikidata'esque Q numbers, but this was immediately shot down. But file titles, file descriptions, and file captions are all just fields where information is placed explaining the content of the file, theoretically they could all be interchangeable, but in practice some users develop a greater affinity for one over the other and their contributions reflect this accordingly. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:31, 11 February 2019 (UTC)

Proposal to "Fix the captions licensing"

For anyone interested please see the ongoing discussion "Commons:Village pump/Proposals#Fix the captions licensing" started by Alexis Jazz regarding the legal copyright © status of file captions. Anyone is free to give their two (2) Euro-cents on the matter. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:49, 11 February 2019 (UTC)

Bizarre. Does the editing interface for the captions not reuse Wikidata's, which even includes a popup to warn of the separate license? Someone must have completely neglected to study our past experience. Nemo 11:04, 11 February 2019 (UTC)
Clearly not. It appears to have been rolled out across the project on a policy of seeing how many things it would break, rather than thinking through how best to test it, before having an impact on 51 million web pages.
How lovely it would be if questions like "what", "why", "when" were asked and supported by a community process, before system wide significant changes were made. Somehow the fact that it is far more expensive to resolve these questions downstream rather than in advance of making changes keeps on getting missed, despite the experience of the last 70 years of software development. -- (talk) 11:39, 11 February 2019 (UTC)
I respectfully disagree that the method the Wikimedia Foundation used was faulty; new features should be released without Community consent because "the community" (those in power or those who benefit from the status quo) will then do anything within their power to oppose any changes. Imagine if the blocking tools were up for debate, then globally locking accounts would've still been the only option instead of local blocks and the fact that the most prolific photographer on Wikimedia Commons is a user blocked on the English-language Wikipedia for threatening Filipino (Pinoy) editors with the wrath of his Latin American dwarfs and curses while here he is a well-behaved and thoughtful individual whose donations grace thousands upon thousands of Wikimedia pages shows that a block on one Wikimedia website shouldn't automatically be imported to another. Or if the MediaWiki Upload Wizard wasn't created then most users would've been forced to use the old Upload Form which is very unfriendly to novice users and requires you to have an intimate knowledge of copyright © tags beforehand. It’s also good that sysops aren’t interface sysops anymore because that could've caused some serious damage. I’m not saying that the Wikimedia Foundation are perfect, they’re not and several things like unsolicited global bans actively harm the project especially when they affect high profile importers like Russavia, but for this technical addition first requesting community consensus would halt progress. It’s better to fix an imperfect system than to disregard the system without giving it a chance. This is a community in which second chances don't exist and while positive contributions are forgotten the very next day something negative will never be forgotten, no matter how minor.
I do agree that opting out should be an option as not all users use the MediaWiki Upload Wizard, but for users who just started editing Wikimedia Commons file captions is all they know and while “an old generation” of editors will slowly be replaced by a new one, all they know would be that data on Wikimedia Commons is structured. Wikidata also faced a lot of (especially local) resistance but as time progresses less and less of that remains, this has been a positive development. I don’t say that we should give the Wikimedia Foundation carte blanche, but we should allow them to learn rather than just shoot down every idea. The file captions are probably the least structured of all planned Structured Data on Wikimedia Commons and any negative feedback we give them now on these is something they can take into the development of the rest of the features, they would have to begin somewhere and starting with file captions is what they chose. I just hope that they won't break the MediaWiki Upload Wizard (for mobile) with the coming features, but we should at least give them a chance. A lot of user feedback has already been implemented, let’s just hope that they’ll keep doing that. We don't just edit Wikimedia Commons for ourselves, we edit it for the re-users and the Structured Data on Wikimedia Commons package will best enhance their experience. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 13:58, 11 February 2019 (UTC)
Yeah, but it was stupid...
I do have to agree with the fact that the launch of file captions was sloppily done, the Wikimedia Foundation should've never assumed that file captions could not be copyrighted and should have just added the Creative Commons 0 (Zero) license there since day one, but at least now they won't repeat this mistake with any other Structured Data on Wikimedia Commons feature. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 14:24, 11 February 2019 (UTC)

Wikidata RfC related to the inclusion of Wikimedia Commons categories

Everyone is invited to give their opinion on whether or not Wikidata should allow for the inclusion of items with only a Wikimedia Commons category. This request for comment was started to help the Structured Data on Wikimedia Commons programme and can be found on Wikidata at "Wikidata:Wikidata:Requests for comment/Allow for Wikidata items to be created that only link to a single Wikimedia Commons category (Wikidata notability discussion)". Note that this request for comment is on Wikidata, so following that link will take you off Wikimedia Commons (as if this website is a wikidrug). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:23, 12 February 2019 (UTC)

Retroactive copyright? Or is the Elephant still towering over us?

Today we finally got something we should've gotten since day 1 (one), see the above attachments, but from what I can tell this doesn't retroactively affect older file captions, sure WMF Legal assured us that they suspect that these fall under the "COM:TOO" but a fair amount of editors have expressed their doubts and requested clarification. Now here's the issue, older file captions aren't affected by this notice (Mobile 📱) as Jeff G. specifically said "Any caption I in my individual capacity have created on Wikimedia Commons falls (irrevocably) under the site's Creative Commons Attribution-ShareAlike License, version 3.0. —Jeff G. please ping or talk to me 14:31, 12 February 2019 (UTC)" however his contributions aren't alone in this, all other file captions (except for mine) still potentially fall under this. In order to not repeat this mistake, could the template please be immediately updated to reflect any other subdivision of Structured Data on Wikimedia Commons as they roll out? The moment we get file depicts this template should be updated to reflect their licenses, we tend to be very careful when it concerns copyright © issues so please let's plan ahead before the next feature launch. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:05, 13 February 2019 (UTC)

There is no good solution for all prior captions. They should be deleted as a precaution. Commons is not here to systematically commit copyfraud. -- (talk) 11:16, 13 February 2019 (UTC)
@: deletion is a very heavy-handed step, I think that a notice on some files that their captions might be Creative Commons Attribution-ShareAlike License, version 3.0 is probably sufficient. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:21, 13 February 2019 (UTC)
It's not realistic to believe that in a year, 5 years or 20 years time, someone will not be mass processing and copying all captions elsewhere into other products, including reselling the data, on the presumption that all the text is CC0 and no attribution is needed.
Simply sticking a weird template on the image page, will not even be visible to the normal API query. -- (talk) 11:34, 13 February 2019 (UTC)
Basically what Fae said. CC0 is not compatible with CCBYSA 3.0, and the only way to attack that is by invoking the threshold of originality. But TOO depends on creativity and not simplicity, where simplicity is only a proximate measure of creativity. TOO also varies wildly across jurisdictions and... basically I don't see any way of resolving the issue outside of replacing them entirely or winding up in some version of a giant cluster fuck. Well, that or just admitting that "we honestly don't really give a crap about whether they're properly licensed or not, we'll do what we want regardless and give us a takedown notice or shut up about it". That's the easiest solution, but is wholly antithetical to the entire rest of the project. GMGtalk 13:28, 13 February 2019 (UTC)
sure WMF Legal assured us that they suspect that these fall under the "COM:TOO"
@Donald Trung: where did legal say that? Afaik that was just some gross misinterpretation by the devs who don't understand legal. These older captions have to be either deleted (too bad), manually checked for TOO (who's gonna do that? not me I tell you) or the authors need to be asked for permission, which given that many were probably copypasted.. no. - Alexis Jazz ping plz 20:42, 14 February 2019 (UTC)
@Alexis Jazz:, you are right, the legal department of the Wikimedia Foundation didn't say that, the developers on this page stated that legal told them something and they just went with it, but as Jheald said above there has been a Phabricator ticket requesting a clarification of this since 2017 which got largely ignored. The worst part is that this has been addressed years before file captions were launched. I still think that a mass-message and a page for opting in to the Creative Commons 0 (Zero) license would be better than deleting. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:07, 14 February 2019 (UTC)
I just noticed above that Slaporte (WMF) actually works with the devs. But his reply makes me worry a bit.. Either Stephen doesn't understand the legal ramifications of copying many descriptions as opposed to just one or he has never read any actual description/caption. I don't know which would worry me more. - Alexis Jazz ping plz 22:41, 14 February 2019 (UTC)
@Alexis Jazz: your example above about taking sentences from Harry Potter books covers a creative work and has nothing to do with this. We are discussing short, non-creative works that factually describe something. Short, non-creative, factual statements have never been protected by copyright, and those kinds of descriptions are what should be discussed for captions. When a description is a short, factual, non-creative statement, it's eligible for CC0 as such a thing was ineligible for CCBY in the first place. Conceptually, this is no different than Template:PD-ineligible. If a caption is potentially creative, it can be deleted or re-written. Keegan (WMF) (talk) 22:07, 15 February 2019 (UTC)
"The rough practical test [is] that what is worth copying is prima facie worth protecting" (Peterson J, 1916) [10] "The word original does not in this connection mean that the work must be the expression of original or inventive thought... the Act does not require that the expression must be in an original or novel form, but that the work must not be copied from another work" (ibid.) This was in the wake of Walter v Lane (1900), which held that shorthand writers were entitled to prevent their record of a speech from being copied, even though it was drawn directly from what the speaker had said. Compare also the Australian case Telstra Corporation Ltd v Desktop Marketing Systems Pty Ltd (2001) which found that Telstra was entitled to copyright in the contents of a telephone directory -- you don't get a lot more factual or non-creative than that. Both English and Australian law have drawn back a bit from this in the last ten years, but to say "short, non-creative, factual statements have never been protected by copyright" is simply not true. The all-protecting approach of Peterson and Walter v Lane was dominant in most of the English-speaking world for the whole of the 20th century. Even today, the German Copyright Arbitration Board has suggested that takings of just seven words should be enough to trigger Germany's ancillary copyright for press publishers [11] -- even with "individual words and smallest text excerpts" specifically excluded. When Fae says below that "descriptions, captions and even titles are creative works", that is exactly correct. Use of titles is generally fair, because of the need to be able to refer to works. But the same can't be said of wholesale taking of descriptions and captions.
I put up 4 questions for WMF Legal in the wake of User:Slaporte (WMF)'s contribution above. [12]. Unless and until we have a satisfactory response on those questions, I don't think we will have guidance we can treat as either sufficiently considered, or reliable. Jheald (talk) 23:26, 15 February 2019 (UTC)
Perhaps also worth noting that the lead EU case on this at the moment is Infopaq (2009) which found that "storing an extract of a protected work comprising 11 words and printing out that extract, is such as to come within the concept of reproduction in part within the meaning of Article 2 of Directive 2001/29 ..., if the elements thus reproduced are the expression of the intellectual creation of their author; it is for the national court to make this determination." Jheald (talk) 00:17, 16 February 2019 (UTC)
Descriptions, captions and even titles are creative works. Perhaps you could encourage WMF legal to be definitive? If only pure uncopyrightable data were allowed in captions your personal views would carry more weight, but that is not what the WMF rolled out. -- (talk) 22:17, 15 February 2019 (UTC)
Hi . "Perhaps you could encourage WMF legal to be definitive? If only pure uncopyrightable data were allowed in captions your personal views would carry more weight". This has occurred. I understand the statement from WMF Legal is contrary to your position, but this is what it is. It's not my personal view, as you've said previously. Keegan (WMF) (talk) 22:36, 15 February 2019 (UTC)
@Keegan (WMF): I would really, really want to see Slaporte (WMF) elaborate on this, because I think he's sorely mistaken here. Could you release some simplistic data, even in large amounts, as w:Template:PD-USonly? Perhaps. I look forward to your proposal to start accepting PD-USonly here, because it's not even a valid license for content on Commons. But could you release it as CC0? No, absolutely not. Because you don't own the rights. COM:TOO in the UK for example is far lower, some countries have database rights. PD-USonly doesn't equal CC0. You should know that. And file descriptions aren't always "short, non-creative, factual statements". I laid out exactly what you should do on Commons talk:File captions#Best practices?. File captions may be re-introduced at some later stage, after some thorough discussion. Shoving broken features down our throat does not create goodwill. Working with us to create features that are useful will make everyone happy. My comment on Commons:Village pump#Proposal to remove captions feature holds true: "I don't think every new feature requires having a vote, but I do think the captions feature should be disabled for now. At the very least, there should be an option (without hacks) to disable it. The issue with huge boxes appearing needs to be fixed/made collapsible. phab:T213571 needs to be fixed. A clear plan on how this works together with descriptions needs to be presented. Realistic use cases need to be presented."
Your comment on Commons talk:File captions puzzles me: "Captions are but a single feature of the entire suite, and one that had to be introduced first for a plethora of reasons". No. They didn't. Captions didn't have to be the first feature, shouldn't have been the first feature, and in the current form, should never even be a feature. Don't just plow on from here. The community goodwill you have has been severely damaged. - Alexis Jazz ping plz 23:27, 15 February 2019 (UTC)

SQL to find captions

I'm trying to find a quick way to identify/report files in specific collections with captions. I raised the slow Pywikibot API method I have here, and suspect that a specific line of SQL could be massively faster, but after checking out the public database I have no idea where captions are supposed to be. Am I missing something, or have the database changes not been replicated yet to commons_p? -- (talk) 15:52, 13 February 2019 (UTC)

I was trying to do the same yesterday (fetching captions in SQL), and ended up in the same place − could not find anything in the database tables. Tentatively poking @Addshore:. Jean-Fred (talk) 09:53, 14 February 2019 (UTC)
Captions (and all other content, like statements) are content, not metadata, so you can't pull it from MySQL (as we don't store it there, but in ExternalStore, like with wikitext). Once there's a WDQS instance for Commons you could use that, but that's not available yet, sorry. See T141602. Jdforrester (WMF) (talk) 19:15, 14 February 2019 (UTC)
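For small batches, captions can still be read through the Wikibase API rather than SQL. The following is only a minimal sketch, assuming captions are exposed as the labels of each file's MediaInfo entity (entity ID "M" followed by the page ID) through action=wbgetentities. The helper names and the sample response shape used to exercise them are illustrative and not taken from this discussion.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Commons API endpoint and the "M<page id>" entity ID scheme.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query(page_ids):
    """Build wbgetentities parameters for a batch of file page IDs."""
    ids = "|".join(f"M{pid}" for pid in page_ids)
    return {"action": "wbgetentities", "ids": ids, "props": "labels", "format": "json"}


def extract_captions(response):
    """Return {entity_id: {language: caption}} for entities that have captions."""
    captions = {}
    for entity_id, entity in response.get("entities", {}).items():
        # "labels" may be absent or empty for files without captions.
        labels = entity.get("labels") or {}
        if labels:
            captions[entity_id] = {lang: label["value"] for lang, label in labels.items()}
    return captions


def fetch_captions(page_ids):
    """Fetch captions over HTTP (hypothetical helper; performs a network request)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_query(page_ids))
    with urllib.request.urlopen(url) as resp:
        return extract_captions(json.load(resp))
```

This does not remove the per-file cost of going through the API; it only shows where the caption text would be found in a response.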
Wouldn’t it be possible to (additionally) save captions in the wb_terms table, just as it is done with labels, descriptions, and aliases at Wikidata? From my experience with WDQS, large-scale string operations such as regex searches are much more efficient in SQL than in SPARQL. —MisterSynergy (talk) 20:04, 14 February 2019 (UTC)
@MisterSynergy: The wb_terms table is deprecated and not written to in modern installations because they melt the database servers. Wikidata.org needs to migrate off the table soon, but is held back by lots of community-written hacks relying on it instead of using the scalable SPARQL servers, sadly. Jdforrester (WMF) (talk) 20:22, 14 February 2019 (UTC)
Oh really? At mw:Wikibase/Schema, it is not marked as deprecated. WDQS has that 1 minute timeout which makes it pretty useless for lots of string operations. At wikidata:Wikidata:Request a query we regularly have requests related to terms which do not work with SPARQL, but they do with SQL (using the Quarry tool or the toolforge console). —MisterSynergy (talk) 20:34, 14 February 2019 (UTC)
@MisterSynergy: Didn't know about that page, now fixed. Jdforrester (WMF) (talk) 21:03, 14 February 2019 (UTC)
@Jdforrester (WMF): Thanks for the answer. If the content is not in the tables, are the 'log actions' about captions somewhere? I imagine they have to be in order to populate history tab and contributions. For the use case I have I care less about the content: it’s more “Get a list of files which had a caption edited between date X and Y”, or “Get a list of all users who edited at least X captions”.
Thanks, Jean-Fred (talk) 23:10, 14 February 2019 (UTC)
@Jean-Frédéric: It's very messy, I'm afraid – Wikibase is designed to be fast to render and edit, and search is meant to take place on the SPARQL side. This query:
SELECT rev_id, rev_text_id, rev_user, rev_user_text, comment_text
FROM (
  SELECT *
  FROM revision
  LEFT JOIN revision_comment_temp ON rev_id = revcomment_rev
  WHERE rev_page = 174
) AS foo
LEFT JOIN comment ON comment_id = revcomment_comment_id
WHERE comment_text LIKE "/* wbsetlabel-%"
ORDER BY rev_id DESC
… gets you every revision that adjusted a caption (and with a code-side regex, what language, and whether it was an add/alter/remove), but it's (a) slow, (b) doesn't scale to multiple files without slowing down the database a bunch, and (c) could be gamed by users manually writing random edit summaries starting "/* wbsetlabel-" just to be difficult. That probably doesn't help. Jdforrester (WMF) (talk) 23:55, 14 February 2019 (UTC)
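
The "code-side regex" step mentioned above could look roughly like the sketch below. It assumes the automatic edit summaries have the form "/* wbsetlabel-<action>:<n>|<language> */" with actions add, set and remove; only the "/* wbsetlabel-" prefix is confirmed in this thread, so the rest of the pattern is an assumption. The function names and the (user, comment) row layout are hypothetical.

```python
import re
from collections import Counter

# Assumed summary shape: "/* wbsetlabel-add:1|en */ caption text".
SUMMARY_RE = re.compile(r"^/\* wbsetlabel-(add|set|remove):\d+\|([^\s|]+) \*/")


def parse_caption_summary(comment_text):
    """Return (action, language) for a caption edit summary, else None."""
    match = SUMMARY_RE.match(comment_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def caption_edits_per_user(rows):
    """Count caption edits per user from (user_name, comment_text) rows."""
    counts = Counter()
    for user, comment in rows:
        if parse_caption_summary(comment) is not None:
            counts[user] += 1
    return counts
```

Fed with the rev_user_text and comment_text columns of the query result, this would give a rough "users who edited at least X captions" list. It inherits caveat (c): a hand-written summary that imitates the prefix and format would be counted too.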

Relevance of captions for internal / external search?

Wondering how the content of captions will influence the ranking when searching for images, either through WP internal search or through third party external search (e.g. Google). Can we use / misuse captions for SEO? Is there a policy about this? best --Herzi Pinki (talk) 00:46, 15 February 2019 (UTC)

Captions and Hot Cat

I have encountered a possible problem. Can someone reproduce it?

  1. Go to preferences and switch on Hot Cat
  2. Go to any file
  3. Add a caption in two languages and save it
  4. Then add more categories via the Hot Cat tool and save

You'll be warned that you are editing an old revision, but I think this is not true; it looks like a wrong detection by Hot Cat. Juandev (talk) 20:07, 15 February 2019 (UTC)

Same message if editing just one category. Juandev (talk) 20:31, 15 February 2019 (UTC)

This has already been reported. Christian Ferrer (talk) 23:40, 15 February 2019 (UTC)