Commons talk:Structured data

Multilingual captions beta testing

The Structured Data on Commons team has begun beta testing of the first feature, multilingual file captions, and all community members are invited to test it out. Captions are based on designs discussed with the community[1][2] and the team is looking forward to hearing about testing. If all goes well during testing, captions will be turned on for Commons around the second week of January 2019.

Multilingual captions are plain text fields that provide brief, easily translatable details of a file in a way that is easy to create, edit, and curate. Captions are added during the upload process using the UploadWizard, or they can be added directly on any file page on Commons. Adding captions in multiple languages is a simple process that requires only a few steps.

The details:

  • There is a help page available on how to use multilingual file captions.
  • Testing will take place on Beta Commons. If you don’t yet have an account set up there, you’ll need one.
  • Beta Commons is a testbed, and not configured exactly like the real Commons site, so expect to see some discrepancies with user interface (UI) elements like search.
  • Structured Data introduces the potential for many important page changes to happen at once, which could flood the recent changes list. Because of this, Enhanced Recent Changes is enabled as it currently is at Commons, but with some UI changes.
  • Feedback and commentary on the file caption functionality are welcome and encouraged on the discussion page for this post.
  • Some testing has already taken place and the team are aware of some issues. A list of known issues can be seen below.
  • If you discover a bug/issue that is not covered in the known issues, please file a ticket on Phabricator and tag it with the “Multimedia” tag. Use this link to file a new task already tagged with "Multimedia" and "SDC General".

Known issues:

Thanks!

-- Keegan (WMF) (talk), for the Structured Data on Commons Team 20:40, 17 December 2018 (UTC)

Notability discussion on Wikidata

People following the structured data project may be interested in this discussion that has kicked off on Wikidata:

d:Wikidata:Project_chat#Creating_new_items_for_humans_based_on_Commons_categories

A bot job started by User:Mike Peel, which was creating Wikidata items for people with Commons categories that cannot be matched to Wikidata, has been stopped on the grounds that such people may not necessarily be notable.

SDC of course would require such items to exist, for it to be possible to make statements about them. Items for every person or thing that currently has a Commons category would seem a bare minimum -- some visions for SDC envisage going much further, for example creating an individual Wikidata item for every single separate museum object that we currently have an image of.

Whatever the outcome, this is something we desperately need more clarity on, looking forwards; not least to plan around, in the event that such items on Wikidata would not exist. Do any of the SDC team have any thoughts, eg @SandraF (WMF): ? Jheald (talk) 12:49, 7 January 2019 (UTC)

Jheald thanks for the link. --Jarekt (talk) 15:04, 7 January 2019 (UTC)
The decision-making around this topic is fully up to the community... As a staff member, I want to make a point of not wanting to impose an opinion on this at all. With my volunteer hat on, I have no strong opinions either. If we create Wikidata items for everything, we must be able to properly maintain that huge mass of items too... I think less notable heritage objects can be modeled purely based on more generic statements (represents vase, with features blue paint / flowers / fishes... / designed by x / with inventory number nnn / in collection y) on Commons, and we can also decide to model less notable people in a similar, more generic way there. But I will happily follow the broader community's wishes if there is consensus about creating Wikidata items for everything. SandraF (WMF) (talk) 17:37, 7 January 2019 (UTC)
@SandraF (WMF): I think less notable heritage objects can be modeled purely based on more generic statements [i.e. without their own items], and we can also decide to model less notable people in a similar, more generic way
If you do believe this, I would like to see a fully worked-up example, to establish (i) how information about the underlying object, and its nature, creator, copyright status, licensing, history etc, would be kept distinct from information about its depiction/photograph; (ii) how this is possible when it is not possible to have qualifiers on qualifiers -- something the current Commons:Structured_data/Properties_table shows up as a major unresolved difficulty; (iii) how this would play alongside images where description in terms of wikidata items would be possible -- how great would the difficulties be that we would get into, if we would be trying to operate two quite different data models at the same time?
Rather than just you saying that you think this can be done, if having to go down this road is even slightly conceivable as an outcome, I would like to see some hard modelling to show how it definitely can be done; and what the consequences would be. Because to date I'm not sure that the data designs so far presented would cut it. Jheald (talk) 18:23, 7 January 2019 (UTC)
Yup. Whether one or the other solution is satisfactory is up to the community to reach consensus about! Deployment is around the corner, so the community can try this quite soon. Seeing the technology in front of one's eyes will certainly clarify things and cause more people to have strong opinions about this. SandraF (WMF) (talk) 09:03, 8 January 2019 (UTC)

Multilingual file captions coming this week

Hi all, following up on last month's announcement...

Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature to add short, translatable descriptions to files. Here are some links you might want to follow before the release, if you haven't already:

  1. Read over the help page for using captions - I wrote the page on mediawiki.org because captions are available for any MediaWiki user; feel free to host/modify a copy of the page here on Commons.
  2. Test out using captions on Beta Commons.
  3. Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.

Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta.

Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. I'll add a reminder to this post once I confirm exactly what day captions will be turned on for Commons. Keegan (WMF) (talk) 01:06, 8 January 2019 (UTC)

  • Apart from the (cumbersome and totally useless) language selection drop-down, which seems to have been already in use elsewhere, I cannot see anything new. So, descriptions can and should be added to Commons files — how’s that any different than previous practice? -- Tuválkin 14:54, 8 January 2019 (UTC)
  • Could you please explain more on how you find using the translation feature cumbersome and useless? Do you find it easier or more difficult to add a language to a description template? You are certainly welcome to not use captions if you do not find the feature useful for your work, but if there's a way that it can be improved we'd like to hear about it. Additionally, if you could provide a link to this tool that seems to already exist elsewhere here, I'd appreciate it because I haven't seen it and I'd like to take a look. Keegan (WMF) (talk) 18:16, 8 January 2019 (UTC)
  • What I said is cumbersome and totally useless is the language selection drop-down, which seems to be the very same element that shows up as a generic language-selection tool when one uses a WMF project while not logged in. I think it is cumbersome because it is made up mostly of empty space, and the way languages are sorted (geographically, and ignoring the browser’s options on preferred languages?) makes it hard to find a language, not to mention the unintuitive way it scrolls and gains selection focus. I guess that you co-opted this pre-existing element (which is of course good practice), but it is an essential part of the whole captions feature. For me, the ideal language selector is a single, easily scrollable list of languages, properly sorted (the collation of which would be interesting to discuss, in terms of internationalized user expectations), whence to pick one out (one or several; Ctrl-click does work on some devices). That much for cumbersome. It is also useless because when a user is logged in there’s no need to present a complete list of languages. Even the most formidable polyglot will have to pick from a dozen or two; only in the unlikely situation where one would be contributing in a language one is not versed in would such a general selection tool be needed, and for that a "more languages" button seems better than what we have now.
  • I certainly do find it simplest to click to edit the file page’s wikicode and add {{ab|Something here.}} (or {{ab|1= Something here.}}, if I’m feeling chatty) next to where it says |Description = }} — way simpler than going through UI hoops, but I understand that’s not what you’re after, especially since what I find simplest is already tried and tested and working for many years. But even if wikitext needs to be not offered, there’s Visual Editor apparently working in many projects; adding captions to be injected in a page seems to be the most basic of its functions.
What I asked is what this new feature amounts to. We’ll have a pencil icon that brings up an already existing language-selection tool, and thence we proceed to a rich text entry box/screen whose workings are not new either? Is it the pencil icon that is new, to be shown in file pages and interspersed in the upload wizard? -- Tuválkin 22:36, 8 January 2019 (UTC)
From a technical standpoint, captions are like labels on Wikidata. They will be searchable through the API, making it easy to find/filter/pull captions from files as metadata. There are a lot of possibilities of what this can be used for, from filling in infoboxes, building lists for translation of important files needing a caption localized for a project or campaign, searching for and finding captions, etc. So in comparison to description templates, while they potentially contain similar data to a caption, their function and reuse purposes are very different. Keegan (WMF) (talk) 19:57, 9 January 2019 (UTC)
  • @Keegan (WMF): Understood, thanks. Maybe this field can/could be populated with the contents of the |Description = field of {{Information}}, {{Artwork}}, and other such templates. -- Tuválkin 01:43, 10 January 2019 (UTC)
  • If I understood it correctly, Tuvalkin meant an automatic action. The ideas behind structured data, the semantic web and LOD are great, and changes are good, but when I think of the few thousand files I have on Commons I would really love some bot, especially since descriptions are well structured in templates and mostly about the size of captions. Nova (talk) 20:57, 10 January 2019 (UTC)
  • I think a real bot would create too much trash, but a tool like VisualFileChange would be great. --GPSLeo (talk) 21:03, 10 January 2019 (UTC)
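
Keegan's comparison of captions to Wikidata labels suggests a simple data shape: one short string per language code per file. The "lists for translation" use mentioned above could then be sketched roughly as follows. This is only an illustration under stated assumptions: the captions are taken to be already fetched into a plain dictionary, and the file titles, caption texts and the helper `files_missing_caption` are all hypothetical, not part of any MediaWiki API.

```python
from typing import Dict, List

# Hypothetical sample data: file title -> {language code -> caption text}.
# The shape mirrors Wikidata-style labels (one string per language).
CAPTIONS: Dict[str, Dict[str, str]] = {
    "File:Example A.jpg": {"en": "A lighthouse at dusk", "fr": "Un phare au crépuscule"},
    "File:Example B.jpg": {"en": "A tram in Lisbon"},
    "File:Example C.jpg": {},
}


def files_missing_caption(
    captions: Dict[str, Dict[str, str]], source_lang: str, target_lang: str
) -> List[str]:
    """Return titles that have a caption in source_lang but none in target_lang."""
    return sorted(
        title
        for title, by_lang in captions.items()
        if source_lang in by_lang and target_lang not in by_lang
    )


if __name__ == "__main__":
    # Print a worklist of files whose English caption still needs a French one.
    for title in files_missing_caption(CAPTIONS, "en", "fr"):
        print(title)
```

Files with no caption at all are deliberately left out of the worklist, since there is nothing to translate from.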

Captions are live

Captions can now be added to files on Commons. There's a bug with abusefilter sending errors to new accounts adding captions; the bug is being investigated and fixed right now. IRC office hours will be in a little over one hour from now. I look forward to seeing you there if you can attend. Keegan (WMF) (talk) 16:50, 10 January 2019 (UTC)

  • Is there a way to disable the box, or make it much less invasive? It's very annoying that it pushes the actual captions some half page down. Nemo 18:13, 10 January 2019 (UTC)
    • @Nemo bis: Make this edit to your user css and it'll disable the captions. If @Keegan (WMF): or someone could add an ID to the css surround for it then we could attach some extra css tags to it to show/hide it, which would be better. Thanks. Mike Peel (talk) 19:36, 10 January 2019 (UTC)
      • @Mike Peel: I'll make a Phabricator ticket later today to look into that. Keegan (WMF) (talk) 20:00, 10 January 2019 (UTC)
      • Thanks for the css tip, looking forward to the show/hide option. Nova (talk) 19:56, 10 January 2019 (UTC)
        • Update: try this to also disable the "structured data" header. Thanks. Mike Peel (talk) 20:28, 10 January 2019 (UTC)
          • Works better now, thanks. Nova (talk) 20:41, 10 January 2019 (UTC)
    • It's easy indeed to hide the entire thing, but I'd just like it to still be there somewhere and not take one third of my screen or so. I suspect someone assumed that people don't care about existing descriptions being pushed out of the screen, or that nobody speaks more than 2 languages. Nemo 16:07, 11 January 2019 (UTC)
  • Hi, @Keegan (WMF): I see that a file ID, called "entity", is added the first time a caption is created. Question: are the IDs created only when we add "structured data" for the first time, or will IDs be created automatically for each existing file? Christian Ferrer (talk) 18:44, 10 January 2019 (UTC)
    @Christian Ferrer: Hey, good spot. This is something we fixed in development yesterday, and will not be displayed from next week (the other issue is that it says "label" rather than "caption", which will also be fixed in the same change). Jdforrester (WMF) (talk) 18:49, 10 January 2019 (UTC)
    There are also some other things that I noticed. There is no rollback anymore, but if I remember well we already talked about that, er... no, the rollback works well. The second thing is: if you create a caption, then an ID is created, OK. If you revert, then the ID is also removed. This is confirmed by the exact same number of bytes being added to and removed from the file.
    Now, if you delete all the captions without having reverted, then the captions are indeed removed, but the ID stays. This is also confirmed by the number of bytes. This is not really a question, just a thing that I noticed. Regards, Christian Ferrer (talk) 19:05, 10 January 2019 (UTC)
    @Christian Ferrer: Yeah, it's technically the marker for the entity ID. It's not relevant to users (they can't use it and they can't change it; it's just the reference for the database), so we won't be showing it (if you really need to know it, it appears on the action=info page). The "byte size" change is also not very helpful or accurate as that's a measure for the database which depends on the JSON serialisation of the entity model, but removing that from history pages would probably be disruptive for power users so we don't plan to do that right now. Jdforrester (WMF) (talk) 19:18, 10 January 2019 (UTC)
  • Nice feature. It would probably be useful to write local file caption guidelines within Commons, in order to tell people which style is expected in the caption. mw:Help:File captions is a rather technical manual (no markup, how to undo, and so on), but things like capitalization, punctuation, preferable caption length and so on should probably also be advised to users… —MisterSynergy (talk) 19:44, 10 January 2019 (UTC)
  • @Jdforrester (WMF): Note that the "Captions" features are available in the file redirect pages. What will be the impact if captions are added there? Christian Ferrer (talk) 21:04, 10 January 2019 (UTC)
    @Christian Ferrer: Another good spot. Captions on redirect pages aren't useful, and we will disable them, but it won't break anything. We've filed a Phabricator task to do this. Jdforrester (WMF) (talk) 21:07, 10 January 2019 (UTC)
  • Just whoa. You guys let this thing go live like this? With a layout that will immediately antagonize the exact kind of contributors who would be the most enthusiastic and productive about captions? A layout that hogs whitespace (does it even look tighter in monobook, respecting its default margins and paddings?), a layout that puts this thing above all else on the page (above "Summary", srslsy?), under an H1 heading (whiskey tango foxtrot, aren’t you guys all about structure?!), with some weird horizontally divided box which will mystify both oldschool HTML 1.0 veterans and swipe-swipe whippersnappers (click on the pencil to edit the caption text under it across a line?; why not click the caption itself?, or put a proper button next to it!)…? After you’ve been working on this captions thing since May last year, at least? Good grief, you’re supposed to be the code gods that are going to dig a ditch between yourselves and the computer illiterate masses, burying all the power users in it. Turns out part of that dire prediction isn’t true after all, but sadly that’s the part about code gods, for this gizmo seems utterly ungodly. And therefore I’m gonna sprinkle some CSS holy water on my "skin" and forget I ever saw this thing live in production looking like this. -- Tuválkin 02:09, 11 January 2019 (UTC)
  • And nobody thought of turning it off for file redirects oscar mike golf. -- Tuválkin 02:12, 11 January 2019 (UTC)
  • This is really bad. Please turn it live only when at least Template:Artwork is correctly handled, either by using the wikidata element or the description field with language template. The feeling now is that all the hard work that was done to describe files is going to be lost. Léna (talk) 13:08, 11 January 2019 (UTC)
  • Agree with some of the comments above. I thought the team had accepted and undertaken that structured data needed to be on a different tab to the regular file information, after this was flagged by multiple respondents in the Statements consultation (September-October 2018)
As a result, in the "What's new" section of the "Statements 2" consultation (November 2018), User:Keegan (WMF) wrote:
The tabs for Wikitext content and metadata (respectively called 'File information' and 'Structured data' for the purposes of this discussion) are now true tabs instead of anchor links, which should reduce/eliminate the occurrence of super long pages.
Such tabbing is necessary, and should be implemented ASAP. The Structured Data is (or, we hope, will be) very important for machines. But it is important it should not get in the way of the templated information for humans. Jheald (talk) 14:27, 11 January 2019 (UTC)
Jheald's last argument hits home. Nova (talk) 16:51, 11 January 2019 (UTC)
In the statements consultation I was referring to the decision to put statements behind tabs. Captions were never planned to be hidden from users, but most of the rest of SDC will be behind a tab. I think the planned new box that gathers "use this file" and attribution generation is probably going in the whitespace that already exists and is unused to the right of files (as seen in the statements mockups), but as far as I know for now that's the only other visible thing. Keegan (WMF) (talk) 18:06, 11 January 2019 (UTC)
I do see the problem in how the mockups are presented, though, by not showing the "File information" tab first. A side-by-side comparison would have shown captions on the "main" file page, instead of them simply being absent from the statements mockup. I'll make sure to not repeat that mistake in the next feature design consultations. Keegan (WMF) (talk) 18:25, 11 January 2019 (UTC)

Accounts on Beta Commons

Trying to create an account on Beta Commons: is it possible that the error message "The passwords you entered do not match" arises when the actual problem is something else? I really doubt that four times in a row I couldn't match my password correctly, but four times in a row I got this same error. - Jmabel ! talk 09:59, 9 January 2019 (UTC)

@Jmabel: It's most likely that you're trying to use your SUL account. The Beta Cluster does not operate at the high levels of security that we have for production, hence the message above the login form:
This site (Beta Commons) allows WMF staff and community volunteers to test MediaWiki in a production like environment.
Do NOT use your normal password, or any password you use anywhere else online.
Did you already have a Beta Cluster account? If not, did you create one?
Jdforrester (WMF) (talk) 15:38, 9 January 2019 (UTC)
Again, as I wrote above, I was trying to create an account. It asks me to enter an account name, an email address, and enter a password twice. I got this error message after doing so, four times.
Is it a problem that I used the same name as my account here? It shouldn't know whether the accounts have the same name. - Jmabel ! talk 18:24, 9 January 2019 (UTC)
@Jmabel: Oh, sorry. No, it's not going to know about you having an account on this (other) system. I just created a test account there and it worked fine. Jdforrester (WMF) (talk) 18:43, 9 January 2019 (UTC)
Jmabel, it used to be the case that MediaWiki on some non-production servers (like wikitechwiki) was not able to deal with passwords containing certain Unicode characters. Have you tried using an ASCII password, as silly as this might sound? Nemo 19:15, 10 January 2019 (UTC)
Well, at this point the feature is live, so there's no point to my looking at the Beta. - Jmabel ! talk 00:43, 11 January 2019 (UTC)

Accessing the captions via lua and pywikibot

@Keegan (WMF): (and others): are there ways to access the captions using Lua and pywikibot, or are they human-accessible only at the moment? Thanks. Mike Peel (talk) 16:45, 11 January 2019 (UTC)

Humans-only for now, or read-only via API. This will change, I do not know when at the moment. Keegan (WMF) (talk) 18:43, 11 January 2019 (UTC)
OK, thank you. If you can let me know when it is available then I can see if it can be integrated into the wikidata infobox to supplement the captions from Wikidata (and/or sync those over to here). Thanks. Mike Peel (talk) 19:54, 11 January 2019 (UTC)
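
For readers wondering what "read-only via API" might look like in practice, here is a minimal Python sketch. It rests on several assumptions that are not confirmed anywhere in this thread: that captions are exposed as the labels of a MediaInfo entity, that the entity ID is the letter "M" followed by the file page's numeric page ID, and that the Wikibase `wbgetentities` module serves these entities on Commons. The page ID and the sample response are hand-written for illustration, not real API output, and the helper names are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# Commons API endpoint.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def caption_query_url(page_id: int) -> str:
    """Build a read-only query URL for one file's entity.

    Assumption: the entity ID is "M" + page ID and captions are its labels.
    """
    params = {
        "action": "wbgetentities",
        "ids": f"M{page_id}",
        "props": "labels",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_captions(response: dict) -> Dict[str, str]:
    """Flatten a wbgetentities-style response into {language code: caption}."""
    captions: Dict[str, str] = {}
    for entity in response.get("entities", {}).values():
        for lang, label in entity.get("labels", {}).items():
            captions[lang] = label["value"]
    return captions


# Hand-written sample in the assumed response shape (NOT real API output).
SAMPLE_RESPONSE = {
    "entities": {
        "M12345": {
            "type": "mediainfo",
            "id": "M12345",
            "labels": {
                "en": {"language": "en", "value": "A tram in Lisbon"},
                "pt": {"language": "pt", "value": "Um elétrico em Lisboa"},
            },
        }
    }
}

if __name__ == "__main__":
    print(caption_query_url(12345))
    print(extract_captions(SAMPLE_RESPONSE))
```

The sketch only builds the URL and parses a response; fetching it is left to whichever HTTP client or bot framework is in use.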

Copyright status of structured "items"

Are "Captions" and other SDC "items" released under the CC BY-SA 3.0 like the rest of Wikimedia Commons or under the CC0 license like Wikidata? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 23:42, 11 January 2019 (UTC)

My guess is that free text captions are CC BY-SA 3.0, like the rest of the page. However most of the properties will be de-facto ineligible for copyright so for all practical reasons CC0. --Jarekt (talk) 03:43, 12 January 2019 (UTC)
There's a phabricator ticket that's been asking this question since 2017, but with no meaningful input yet.

With the captions that are live right now, it's something that needs to be clarified urgently. @Keegan (WMF): ?
Since the whole page is licensed CC BY-SA 3.0, and there is nothing to indicate anything different for the captions, I think that means that any caption being added by a user at the moment has to be considered to be CC BY-SA 3.0. I think the contributor would be entitled to assume that that is the license under which they have made the caption string available, given that there is nothing else anywhere indicating anything other than this. This needs to be addressed quickly if the ultimate intention is to release the data CC0, because otherwise there will be a considerable set of CC BY-SA 3.0 captions building up, which would have to be cleared for the more permissive release.
A relevant question, if SDC is intended to be CC0 (and at the moment we have had no clear indication either way), is what restrictions this would place on data being harvested from existing CC BY-SA 3.0 file pages. Even if reporting that the creator of a painting was Leonardo da Vinci may be an uncopyrightable fact, extracting such information at scale may fall subject to database rights that might be only available BY-SA. Other information, eg saying that a painting was "probably painted c.1530" or was considered to be by "a follower of Raphael", may reflect real intellectual choices that could attract copyright in their own right, particularly if a substantial number were taken. This is a question that needs clarification, before substantial data transfer starts from Commons templates. Jheald (talk) 11:38, 12 January 2019 (UTC)
Jheald, I disagree with the legal theory that metadata listing basic facts about someone or something can produce its own copyright. You are not making artistic choices here, merely reporting information you found in the references. Otherwise you are doing something wrong. That is why it is OK to copy such information from Wikipedias or Commons to Wikidata. However, I agree that this should be clarified sooner rather than later. My vote would be to store most or all of the structured data under the CC0 license so it is compatible with Wikidata. @Keegan (WMF):, I think this is important, especially since Commons is a project which is obsessed with getting copyrights right. We might need to discuss it as a project, but I also think WMF lawyers should look into it as well, especially if we start reusing the data and combining it with Wikidata. --Jarekt (talk) 04:01, 13 January 2019 (UTC)
It would probably be wise to simply add the text "by publishing this you agree that you release this caption with the CC0 license" but it might confuse people into thinking that this applies to all texts or something. Maybe one of the developers should open a proposal at "Commons:Village pump/Proposals" and ask for community feedback in how to clarify this without being "too intrusive". But a simple indication that all Structured Data on Wikimedia Commons "Items" are CC0 would suffice in the beginning of the process. Let's not forget that this is (legally) important for the re-users outside of Wikimedia websites as they're the people this whole system is built for. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:52, 13 January 2019 (UTC)
I'm seeing where we are on all this. Keegan (WMF) (talk) 19:56, 14 January 2019 (UTC)

┌─────────────────────────────────┘
@Donald Trung, Jheald, Jarekt: We are planning to ask users to release items on Structured Data on Commons under CC0, and we are working with the legal team to add the right license-related language. Full text pages on Wikimedia Commons may still be under CC BY-SA.

“A relevant question, if SDC is intended to be CC0 (and at the moment we have had no clear indication either way), is what restrictions this would place on data being harvested from existing CC BY-SA 3.0 file pages.”

The text on Commons is CC BY-SA currently, to the extent it's protected by copyright. However, most captions are not likely to be copyrightable, since they are short and factual. Copyright protects creative works of expression, and not the underlying ideas. It's possible that there will be some descriptions that are so idiosyncratic as to get copyright protection, and Commons has a couple of options if that comes up: 1) argue that they are not copyrightable, 2) just remove those captions and write a better, CC0 caption. Either choice would be guided by how the Commons community would like to settle it on Commons:File captions. Keegan (WMF) (talk) 22:08, 15 January 2019 (UTC)

@Keegan (WMF): "Short and factual" doesn't get you a copyright waiver. To be without copyright there has to be essentially no choice in the text that was written. That's simply not true for a caption, and certainly not true for 40,000,000 of them. Jheald (talk) 23:06, 15 January 2019 (UTC)
Hullo @Jheald: the WMF legal team says that Copyright law protects creative expression, and not ideas or concepts. Short phrases that can only be written in a limited number of ways are not protected. The caption field has a few limitations: it can only have 255 characters, it does not allow Wikitext, and it should factually describe the image. The vast majority of captions will not be a sufficiently creative work of authorship to be copyrightable. A few references from the team: Stanford Law School published a guide on how this question has been handled by U.S. Courts, and U.S. Copyright Office Circular 33 explains what level of creativity is required for protection. In cases where a description is so idiosyncratic as to require compliance with CC BY-SA, it should be removed from a CC0 caption field. Copyright law can be unsatisfyingly unclear, so if you need more help clarifying what kind of creativity should be considered, please contact the legal team and they may be able to help by writing guidance in Wikilegal.
— Preceding unsigned comment added by Abittaker (WMF) (talk • contribs) 01:12, 16 January 2019 (UTC)
@Abittaker (WMF): This is Commons. You're not just dealing with U.S. copyright here, you're dealing with copyright for the whole world. And I dispute the claim that just because the caption is limited to 255 characters and needs to factually describe the image, that that means there are only a limited small number of ways it could be written. On the contrary, there may be any number of aspects of the image that the caption-writer may choose to foreground, and any number of ways to present them. The choice of one rather than any of the others is the writer's expression, and that is what copyright protects. It's no good trying to wish this away: there is an issue here which needs to be faced. Jheald (talk) 01:22, 16 January 2019 (UTC)
It may be interesting to note that the Indian Supreme Court recently affirmed that legal headnotes were protected by copyright [3]. That's despite headnotes, on the face of it, being more formulaic and more derivative than image captions.
Similarly, this 2006 paper [4] suggests that (p.187) "other than the headnotes, private publishers probably do not have copyright in the court decisions they are publishing" (emphasis added).
In Canada the copyright status of headnotes was affirmed in the 2004 case CCH Canadian Ltd v Law Society of Upper Canada. Jheald (talk) 02:08, 16 January 2019 (UTC)
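
The caption-field limits mentioned by the legal team above (at most 255 characters, no wikitext) can be illustrated with a small Python check. This is a rough sketch only: the markup pattern is a heuristic of my own covering a few common wikitext constructs, the length is counted in Python characters, and the helper `caption_problems` is hypothetical. The actual field may enforce these limits differently.

```python
import re
from typing import List

# Limit stated in this thread; whether the real field counts characters
# or bytes is an assumption here (len() counts Unicode code points).
MAX_CAPTION_LENGTH = 255

# Heuristic (assumption): links, templates, bold/italic quotes, HTML-like tags.
WIKITEXT_PATTERN = re.compile(r"\[\[|\]\]|\{\{|\}\}|'''|''|<[^>]+>")


def caption_problems(caption: str) -> List[str]:
    """Return a list of reasons why a caption would not fit the stated limits."""
    problems: List[str] = []
    if not caption.strip():
        problems.append("empty")
    if len(caption) > MAX_CAPTION_LENGTH:
        problems.append("too long")
    if WIKITEXT_PATTERN.search(caption):
        problems.append("contains wikitext-like markup")
    return problems


if __name__ == "__main__":
    for text in ["A tram in Lisbon", "See [[Lisbon]] for details", "x" * 300]:
        print(repr(text[:30]), caption_problems(text))
```

A check like this says nothing about whether a caption is creative enough to attract copyright; it only covers the mechanical limits.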

┌─────────────────────────────────┘
We do not have to figure out the legality of copying captions at this point. The CC0 aspect of SDC should be advertised, and copyright aspects taken into account, before any bot migration of captions or other free text descriptions. However, other SDC data should be OK, as those are non-copyrightable facts. Also, I do not agree that we need to consider the laws of all jurisdictions when discussing copyrights of SDC. There are many jurisdictions and some have some odd laws (See here for example). However, when discussing laws related to Commons text, the only law we need to consider is US law, as that is where the servers reside. --Jarekt (talk) 03:58, 16 January 2019 (UTC)

  • By the way, the text on the bottom of each Commons page "This text is available under the Creative Commons Attribution-ShareAlike Licence; additional terms may apply" should become something similar to d:Wikidata:Copyright: "All structured data from the [SDC] namespace is available under the Creative Commons CC0 License; text in the other namespaces is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. " --Jarekt (talk) 04:14, 16 January 2019 (UTC)

To add a category just after a caption

This morning, each time I tried to add a category after having added (and saved) a caption, I got this message: File:Screenshot Editing File Ophiozonella nivea (YPM IZ 007648 EC) jpg - Wikimedia Commons.png. Christian Ferrer (talk) 07:49, 12 January 2019 (UTC)

Hi, Yes, I already reported that: phab:T213462. Regards, Yann (talk) 08:49, 12 January 2019 (UTC)

Look and appearance of captions

Hi, it would be great if the file pages could keep visual coherence.

1/ The title "Structured data" on a file page should be the same size as, and not bigger than, the other headers such as "Summary", "Licensing", etc.
2/ The caption box on my screen (1920*1200) is slightly smaller than the {{Information}} template and the license template. It would be great if all boxes and templates were the same size.

Regards, Christian Ferrer (talk) 12:59, 12 January 2019 (UTC)

@Christian Ferrer: Hey there,
The reason the MediaInfo section is under an H1 is because it's its own page component, at the same "level" as the wikitext block, which also has an H1 (the page's title). Right now the design is in flux, and I agree that it's a little confusing. In the future the design is going to change; the most recent design feedback session about this would mean that the H1 wouldn't appear, but instead the parts of the page would be split with tabs. That discussion is now closed, but I'd be interested to hear from you if that proposal would work for you.
You are right, the text in the Information template is 5% smaller than in the rest of the page – it's set to 95% (=13.3pt) of the general page content size (=14pt) in the template by using the class toccolours. I don't know why this was done, but it's been this way for a very long time, so I imagine a community discussion would be needed before changing the template.
Jdforrester (WMF) (talk) 21:26, 12 January 2019 (UTC)
I have no particularly strong opinion about potential tabs, but I have not really thought about that yet. I just think that if there are headers on a file page (or in a specific tab), then all main headers should be at the same level.
The size of the text was not my concern; I was talking about the width of the caption box compared to the width of {{Information}}. The caption box seems a little narrower.
After a night's sleep I woke up certain that you should limit the display to one language at a time. I have 3 lines, and this is really annoying (and some users have more...), although one line is fully acceptable. Furthermore, I don't plan to write any captions in Spanish, and I don't want to hide the caption box, so now I am tempted to remove es-2 from my Babel just to avoid those 3 lines... When I looked at a file page while logged out, I found the one-line box much, much better.
Now that we have Commons:File captions, and in order to inform visitors and editors, maybe a link to that page should be given in the caption box, if not in the default display then at least in editing mode.

Christian Ferrer (talk) 05:26, 13 January 2019 (UTC)

How to search SD?

When and how will WP search support the search in SD? My naive approach using incaption:value was not successful. I noticed that the search takes the captions into account and finds them [5], but there should be a way to search specifically for captions.

BTW, searching for the help text "Add a one-line explanation of what this file represents" should not find any matches: [6] --Herzi Pinki (talk) 17:17, 12 January 2019 (UTC)

@Herzi Pinki: That's because you're using the wrong search engine; it's in this one. Jdforrester (WMF) (talk) 21:14, 12 January 2019 (UTC)

filed a bug report for the BTW. --Herzi Pinki (talk) 21:53, 12 January 2019 (UTC)

You did not get me. I did not want to search the code. MediaWiki Search should support searching captions like titles. best --Herzi Pinki (talk) 22:02, 12 January 2019 (UTC)

Search will be supported, it's not turned on yet. Keegan (WMF) (talk) 19:38, 14 January 2019 (UTC)

Wikitext

Just curious, but why can't descriptive information from Wikitext "Descriptions" and "Categories" be harvested to create structured data? Licenses could be harvested, right? Then why can't a bot harvest vital user-generated information from both native descriptions and organization? It just seems like a major handicap that existing Wikitext on over 50.000.000 (fifty million) media files can't be utilised. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:14, 13 January 2019 (UTC)

Sure, technically, they could be harvested. If we, the Wikimedia Commons community, want to do that, then we could have bots do that. We could hardly do that before captions were available, though. (I guess there could have been a soft launch with "captions available but invisible and only editable by bots", but personally I don’t think this was necessary − Wikidata also launched pretty much empty, and the community went on to make bots that seeded it).
Also, I don’t think that this is the job of WMF to do it − fairly sure that had the WMF done something like that, there would have been outrage that they’re touching the content. Jean-Fred (talk) 12:25, 13 January 2019 (UTC)

Information template problem

Hi, some of yesterday's uploads to which I added a caption in Polish (via the UploadWizard form) seem to have a broken Information template appearance, as you can see on this file page. Deleting the captions didn't help. Nova (talk) 10:57, 13 January 2019 (UTC)

Sorry, my mistake, noticed and fixed by Multichill [7], thanks. Nova (talk) 14:14, 13 January 2019 (UTC)

Hiding captions

Let me get straight to the point:

How do I hide the captions from my view? Is there any CSS code I can put in my common.css to remove that element?

Thank you. — regards, Revi 10:59, 13 January 2019 (UTC)

Found out the answer myself by looking at the above topic. For those who may be looking for this:
/** Initially posted by Mike Peel */
/** Just hiding captions */
.filepage-mediainfo-entitytermsview { display: none; }
/** Hiding "Structured Data" header */
.mw-slot-header { display: none; }
— regards, Revi 11:04, 13 January 2019 (UTC)
I thought the original idea of Structured Data on Commons (SDoC hereinafter), or a prototype of it I saw, was to put the SDoC section below GlobalUsage or something like that. Is my memory wrong, or has something changed, resulting in this what-the-hell design? — regards, Revi 11:12, 13 January 2019 (UTC)
@-revi: There are two gadgets available in your preferences now - one that hides it completely as per the css, and one that collapses it by default but lets you expand it again if you want. Thanks. Mike Peel (talk) 11:48, 13 January 2019 (UTC)
Since I prefer code on my local page rather than gadgets, thanks for letting me know, but I will keep the status quo. BTW, if possible, I want to force the header to be H2 and move the section elsewhere (let's say, above the Metadata section, as in the prototype I recall). Is that possible with CSS/JS? (Not asking you to do that, just wondering.)
I have no interest in using it as it currently is (it just sucks), but with a modification to move it away from the top of the page, it would be usable. — regards, Revi 18:20, 13 January 2019 (UTC)
@-revi: You can shrink it using e.g. h1.mw-slot-header { font-size:1.5em;}. As far as I can tell, moving it to a different part of the page is something the WMF would have to do. Thanks. Mike Peel (talk) 19:29, 13 January 2019 (UTC)
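For anyone wanting to try this in a personal common.css, the two approaches from this thread could be combined roughly as follows. This is only a sketch: the selectors and the 1.5em value are copied from the comments above, they have not been checked against the current skin, and they may stop working if the markup changes.

```css
/* Keep captions visible, but shrink the "Structured data" header. */
/* Selector and size as suggested in this thread; adjust to taste. */
h1.mw-slot-header {
    font-size: 1.5em;
}

/* Alternative from the earlier snippet: hide captions and header entirely.
   Remove the comment markers around the two rules below to use it.
.filepage-mediainfo-entitytermsview { display: none; }
.mw-slot-header { display: none; }
*/
```

Note that CSS alone only changes how the section looks; it does not move it to another place on the page.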
I think you can move things around using JavaScript (but because personal JS is loaded at the end, it will "jump around") − see for example MediaWiki:Gadget-CategoryAboveAll.js which moves the category box at the top. Jean-Fred (talk) 21:17, 13 January 2019 (UTC)
@-revi: captions certainly has some design issues that showed up in production that were not visible in testing. The team is working to identify the problems and push the fixes, I've made an update section that you can keep an eye on. Keegan (WMF) (talk) 17:57, 14 January 2019 (UTC)
It would be very useful to make a list of what was missed in testing, so that the problems we had here are less likely to be replicated in the future. - Jmabel ! talk 22:32, 14 January 2019 (UTC)
Oh of course, lists are being compiled and things will be shared. A majority of the issues following the release are bugs/design flaws that can only be surfaced from release into production; in most all software development there are limits to testing environments trying to replicate a live environment, with highly customized configurations, skins, javascript, css rules, abuse filters, etc. that come with all of the wikis.
All these excuses aside, a new testing environment that will be better at finding things has been in the works for the past few weeks, and should be up and live in time for depicts and other statements testing in a couple of weeks or so. In theory, it will be very helpful in reducing bugs in production. Keegan (WMF) (talk) 00:01, 15 January 2019 (UTC)

UploadWizard and captions

Hi, I just wanted to follow the new idea and do my best with the new uploads. A few issues came up in this exercise:

  • the current explanation of Caption in the form is not enough: how is it different from Description? What is most important, and for whom? I am looking for a link to a page with a good set of examples;
  • the section Caption is translated into Polish as "Podpis", which is a bit confusing, as it suggests "Signature" more than "Caption";
  • repetitiveness: first Title, then Caption, then Description, all of them containing the same subset of information. In my case, with Description containing mostly one or two sentences (close to 255 characters), Caption could be the same (but do two separate fields suggest that it shouldn't be?). Now I have to fill in 5 fields if, as usual, two languages are included. Some kind of auto-generated, pre-filled suggestion from Description would be of much help;
  • Caption is marked as Optional, but is placed above the required Description;
  • there is no information that wiki markup should not be used in Captions and, as far as I've checked, its use is not validated, so the caption is published with the markup, which is then shown uninterpreted on the file page.

In its current state, this rather discourages me from providing Captions at upload. Nova (talk) 12:35, 13 January 2019 (UTC)

Captions should probably be with the "other information" if anything. Nemo 15:23, 13 January 2019 (UTC)
Thank you for the feedback, it's very helpful as the team looks at design changes. Keegan (WMF) (talk) 17:53, 14 January 2019 (UTC)

Captions updates to come

The development team is putting together the plan for changes needed for captions, there are a few bugs and some design issues that showed up when captions went live on Commons (and thank you to all who have pointed them out onwiki and/or participated on Phabricator). I'll have a list to post later this week, along with information about how soon we can expect to see the changes. I'll be making the post here, with a note on the Village Pump. Keegan (WMF) (talk) 17:51, 14 January 2019 (UTC)

Can better sitelinks help with structuring data?

I've opened a proposal at Wikidata at "Wikidata:Wikidata:Requests for comment/Proposal to create a separate section for "Commonswiki" links" in relation to linking to galleries and categories on Wikimedia Commons, I also left a field there open for other ways how Wikidata could help Wikimedia Commons with its structure, will the structured data for Wikimedia Commons project be able to utilise such links or does this project exclusively work with the files and not the existing community-made infrastructure? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 15:59, 15 January 2019 (UTC)

Hashtags (#)

A lot of image-sharing websites use hashtags (#) to clarify what is depicted in an image. Is the concept of "depicts" going to be like this? I think that adding hashtags could very easily be a good search 🔎 tool: let's say you want to find an image with both "#Cats" and "#Birds", then a specific search could look for images where both of these hashtags are used. Sure, vandals could wrongfully tag images with "#Hot sex" and "#Nude female human", and actual images that depict hot sex and nudes could just as well be vandalised with "#Children". But from what I can tell, "depicts" will be vulnerable to the same levels of vandalism. Hashtags could be listed below the categories at the bottom of an image, and being placed in a category could automatically add certain hashtags to an image; these "automatic hashtags" would then be non-editable, in the same way as maintenance categories associated with certain license templates are. Is this idea viable or close to how "depicts" will work?

Because this is already how a lot of successful media-sharing websites do it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:26, 15 January 2019 (UTC)

Hashtags are not being implemented. Here's the last round of designs of what depicts will look like; it's very much about structuring data. Can someone add incorrect or malicious entries into structured data? Yes, but in no different way than the rest of the wiki functions. Structured data is fully integrated with recent changes and the revision deletion extension that controls page deletion, revision deletion, and suppression (aka oversight), as well as abuse filter. The development team plans on having depicts statements up for testing in a new test environment by the end of the month, so you'll be able to see it in action. Keegan (WMF) (talk) 22:17, 15 January 2019 (UTC)

Captions Vs. Descriptions

Alright, I really do not want to sound daft or anything, and I've already read an explanation, but I genuinely can't tell the exact advantage of "captions" over "descriptions". I get that "descriptions" are in WikiCode and can't be used with the infrastructure of the new "Structured Data", but other than character limitations I don't exactly understand how "captions" help with structured data. Do they allow for data-mining ⛏ to automatically create "depicts" based on them? How exactly do they organise files in ways that "descriptions" don't? When I search for a file the file descriptions are also searched, so how do file captions improve upon this? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:40, 16 January 2019 (UTC)

Also, descriptions can be multilingual as well, the way adding different languages is done using the MediaWiki Upload Wizard is identical, how are the languages of file captions better organised than those of "descriptions"? I've seen the new upcoming designs and they all look great, but I still don't see where the advantages of "Captions" come to play. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:44, 16 January 2019 (UTC)

I think the next set of releases for structured data, depicts support followed by support for other properties, will help illustrate how captions fits in with the coming features, particularly on the back end of the software. Search with captions and statements, for example, will be an entirely different experience and it's very hard to illustrate that right now. The next question might be why wasn't captions released with this other stuff, then? The answer to that is the underlying technology behind captions that will power the rest of structured data - namely Federated Wikibase and multi-content revisions - is brand new and had to be integrated into Commons first, before we can finish development and release of the next feature set. Keegan (WMF) (talk) 22:35, 16 January 2019 (UTC)