Commons talk:Structured data

From Wikimedia Commons, the free media repository

It's alive!

Hey folks :)

The Wikidata team has been working on structured data support for Wikimedia Commons for a while now. A lot of work had to be done in the background to make Wikibase (the software powering Wikidata) ready for this. We've made a lot of progress and can now show the very first results. What we have so far is a new entity type (next to item and property) called mediainfo. Mediainfo entities will hold the data about a media file.

We have set up a demo system with the current state and will update it as we make progress. The current state is still extremely basic but I'd rather show you progress very early than when it is all done and can't be changed anymore. We are still quite a bit away from a deployment to Commons.

What works so far:

  • You can upload a file and you will get a link to the associated mediainfo page.
  • You can click the link to the associated mediainfo page. At this point it does not exist yet in the database but you will see an empty "virtual" mediainfo entity. When you add a label or description it is properly created in the database.
  • You can add statements to existing mediainfo entities.
  • You can create and edit the mediainfo entity using the same API as for items.
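For anyone who wants to try the last point from a script, here is a minimal sketch of what such a request could look like. It only builds the parameters for the standard Wikibase `wbeditentity` action and does not contact any server. The entity ID `M123`, the token value, and the helper name `build_label_edit` are assumptions for illustration; the demo system's actual IDs and endpoint are not given here.

```python
import json


def build_label_edit(entity_id, language, label, csrf_token):
    """Build POST parameters for a Wikibase wbeditentity call that sets one label.

    entity_id and csrf_token are placeholders supplied by the caller;
    nothing is sent over the network here.
    """
    # Same JSON shape that is used when editing labels on items.
    data = {"labels": {language: {"language": language, "value": label}}}
    return {
        "action": "wbeditentity",
        "id": entity_id,
        "data": json.dumps(data),
        "token": csrf_token,
        "format": "json",
    }


# Hypothetical mediainfo ID and token, for illustration only.
params = build_label_edit("M123", "en", "Poolbeg Lighthouse", "PLACEHOLDER_TOKEN")
print(params["action"], params["id"])
```

The resulting dictionary would be sent as a POST body to the wiki's `api.php` by whatever HTTP client you prefer, after logging in and fetching a real CSRF token.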

What doesn't work yet:

  • A lot :D
  • You can't create the mediainfo entity by adding a statement. It will fail.

What we will work on next:

  • Make the creation of the mediainfo entity work when adding a statement first. (smallish amount of work)
  • Make it possible to use the items and properties from Wikidata in the statements. (largish amount of work)
  • Integrate the mediainfo entity in the file page directly so you don't have to go to a different page to view the data. (huge amount of work)

You can find:

There is still a lot of work to do but we're making progress towards helping Commons store structured data \o/

Cheers --Lydia Pintscher (WMDE) (talk) 15:59, 28 July 2016 (UTC)

Thanks Lydia, looks great. Maybe be careful to distinguish between the actual position of the object and the camera position; we use different templates for these on Commons.--Ymblanter (talk) 16:16, 28 July 2016 (UTC)
That is actually a good point. We could in the future do this with two properties or use qualifiers. It's either way flexible enough to accommodate that. --Lydia Pintscher (WMDE) (talk) 16:18, 28 July 2016 (UTC)
Thanks to everyone who worked on making this possible. The future of Commons looks awesome. Léna (talk) 16:17, 28 July 2016 (UTC)
Thank you :) --Lydia Pintscher (WMDE) (talk) 16:18, 28 July 2016 (UTC)
Same as Léna − thanks for the work :) Can’t wait to explore the possibilities further! :) Jean-Fred (talk) 16:24, 28 July 2016 (UTC)
+1. The future is here, and it is called mediainfo. Wittylama (talk) 16:25, 28 July 2016 (UTC)
Splendid news, thank you. First impressions are good, and I look forward to using it in production. However, one thing struck me immediately - there's no thumbnail of the image on the data page! I trust that a solution for this is in hand? Andy Mabbett (talk) 16:31, 28 July 2016 (UTC)
It will become one page in the future so there should be no need for a thumbnail then. I'll see how hard it is to add one in the meantime. --Lydia Pintscher (WMDE) (talk) 16:32, 28 July 2016 (UTC)
\o/ Rama (talk) 16:36, 28 July 2016 (UTC)
@Lydia Pintscher (WMDE): Is there community consensus to put this stuff on the file description page directly? --Steinsplitter (talk) 16:49, 28 July 2016 (UTC)
Would you rather have it as a separate page in the future? Please also keep in mind that this is very early. There will be many design and functionality changes still before I consider it ready for Commons. --Lydia Pintscher (WMDE) (talk) 16:52, 28 July 2016 (UTC)
Likely. I posted a link on COM:VP for larger community input regarding this :-).--Steinsplitter (talk) 16:55, 28 July 2016 (UTC)
Personally, I would strongly prefer to have the structured data fields directly on the file page. After all, structured data will most likely replace most of our current Information template. --Sebari (talk) 21:43, 28 July 2016 (UTC)
  • I was slow to understand what is happening here and I had to look at this for a while before I got it. Lydia already explained this, but for anyone who needs the same thing explained again in another way, here is my attempt: Previously, file descriptions on Commons were free text which was stored in templates on Commons. Now to a limited extent, and coming more in the future, there are options to replicate all of the file descriptions in Commons in Wikidata. This means that instead of filling in multiple fields in a template, the same information can be put into Wikidata. When information is in Wikidata a user completes any number of single form fields, rather than working with the file description as one free text form. To see an example, consider this lighthouse photo. Note that there is no published file description in the file, and obviously, Commons needs to have these. But instead of the file description, one can click through to the MediaInfo listing. In that listing, all the information which a Commons file ought to have is there as structured data, which is obviously the best way to store this kind of information. In the future, this MediaInfo data will be published on Commons file pages, and also, there will be more options to enter more MediaInfo data for any file. Blue Rasberry (talk) 17:49, 28 July 2016 (UTC)
If you want information about a painting from Wikidata to show up in the file page of the photo of that painting here then you need to adapt or create a template here. --Lydia Pintscher (WMDE) (talk) 08:26, 29 July 2016 (UTC)
This is great news. Congratulations to you all on this step; I am looking forward to all the possibilities that this line of work opens. :) --LZia (WMF) (talk) 21:27, 28 July 2016 (UTC)
Very good to hear, Lydia, this has the potential to greatly enhance Commons. One question: Will we be able to use Wikidata items directly in statements or will we have to duplicate Wikidata items on Commons? --Sebari (talk) 21:45, 28 July 2016 (UTC)
That's a good question. In the example image, it has a "depicts" field with "Lighthouse", and a new item Q3 was created for "Lighthouse", which doesn't at present link in any way to the Wikidata equivalent entry. More likely in practice you'd want the item to be "Poolbeg Lighthouse" which would be an instance of Lighthouse, so you'd end up building an entire topic hierarchy, matching Commons categories and Wikidata items. --ghouston (talk) 01:11, 29 July 2016 (UTC)
The idea is that it'll use the items from Wikidata. Making that possible is still quite a bit of work but it is what we are going to work on next. I have only created the items on the demo system now to show where it is going. --Lydia Pintscher (WMDE) (talk) 08:26, 29 July 2016 (UTC)
It's also creating an item Q2 for the photographer. Such items will find a match in Wikidata in some cases, but not others, depending on whether the photographers meet Wikidata's notability requirements --ghouston (talk) 01:43, 29 July 2016 (UTC)
Yeah we will create a new data type that lets you link to an item on Wikidata, a Commons user page, a flickr user page and more to cover that. --Lydia Pintscher (WMDE) (talk) 08:26, 29 July 2016 (UTC)
Wow, this is awesome, Thanks, -- Bodhisattwa (talk) 23:13, 28 July 2016 (UTC)
Very good news, and great motivation for all the users who work on making today's wikitext file metadata more structured by using templates for all types of information. This way we will one day be able to transfer this data into the new system. Cheers, --Arnd (talk) 09:09, 29 July 2016 (UTC)
Awesome!! I'm giddy with excitement :D Sending many good vibes to the team for the next steps. Spinster (talk) 19:05, 29 July 2016 (UTC)
Excellent! Can't wait for the next steps :-) Raymond 20:02, 29 July 2016 (UTC)
Belated congratulations to everyone who made this possible, this is going to be real big :) --DarTar (talk) 18:43, 30 July 2016 (UTC)
@Lydia Pintscher (WMDE): So you plan to replace the file description pages completely? What happens with custom licenses, custom user templates, custom information templates etc.? Will the content be stored here on Commons or on Wikidata? --Steinsplitter (talk) 16:13, 31 July 2016 (UTC)
If I understand it correctly, the structured data will be just another section on the file description page, similar to EXIF data. At least I hope that is the case. In that case we could use that data in our trusted, old Information template and we could have a default {{Structured License}} template that uses the license information from the structured data. So an override with a custom template would still be possible. --Sebari (talk) 16:32, 31 July 2016 (UTC)
I don't have all the answers yet. What I do know is: The data will be stored here on Commons. The data should be in the file description page. The existing stuff in the file description page needs to live on at the very least for the (probably long) time of migration. --Lydia Pintscher (WMDE) (talk) 17:10, 31 July 2016 (UTC)
This all sounds great @Lydia Pintscher (WMDE):! There is tons of interest for this, and as always let me know how we can reach out and help. Astinson (WMF) (talk) 14:28, 3 August 2016 (UTC)
Extremely enthusiastic, and curious about how to proceed. In which ways could we take part in the development or experimenting? Best, Susannaanas (talk) 10:25, 4 August 2016 (UTC)
The best thing is to play with what is there already and see if that is going in the right direction and if you can model what you want to model. I'm also always grateful for weird edgecases that we might overlook. --Lydia Pintscher (WMDE) (talk) 13:17, 4 August 2016 (UTC)
License and copyright fields can be tricky. What would you do with an item in the public domain, since public domain isn't a license? Can you specify the license / public domain status / Freedom of Panorama status of an object depicted in an image, as well as a license for (say) a photograph itself? Can you deal with compound license tags like File:A Safe Escort (18850019652).jpg, which uses {{Licensed-PD-Art|PD-old-auto-1923|cc-by-sa-2.0|deathyear=1927|attribution=[ Leonard Bentley]}}? --ghouston (talk) 01:44, 5 August 2016 (UTC)
We have done some modelling based on difficult cases we could find together with Stephen from WMF's legal team and they all seem possible to solve with the means we have in Wikidata. But feel free to do some modeling yourself on the test system. You can create items and properties there as you want. I created a license property but it could be called differently for example. It can also have different values and it can have qualifiers to give additional information. --Lydia Pintscher (WMDE) (talk) 13:39, 5 August 2016 (UTC)
I think it will be possible, but will require the use of several fields in some cases, as well as a template that examines which fields are present and gives the information for each license and how they interact. That template could be quite complex. --ghouston (talk) 03:22, 6 August 2016 (UTC)
Great news, my (late) congratulations! Thanks to Blue Rasberry for the explanation. In general, there are a lot of things to fix with regard to Commons. But it is a very good feeling to see progress in this difficult and sensitive core of the wiki. Ziko (talk) 23:21, 5 August 2016 (UTC)
Neat! --MZMcBride (talk) 23:33, 13 August 2016 (UTC)

T120451: Allow categories in Commons in all languages

Do I understand correctly that this project intends to handle T120451? If so, please state it on the report and clarify that work against this feature request has no intersection with the work on page content language. Nemo 08:24, 5 August 2016 (UTC)

Multilingual categories might be one side-effect of the work we are doing for structured data support for Commons. But I don't yet know how this will all work exactly so can't give more details unfortunately. --Lydia Pintscher (WMDE) (talk) 13:43, 5 August 2016 (UTC)
With multilingual categories, sort keys have to be considered too. --Oursana (talk) 23:57, 13 August 2016 (UTC)

October 2016 Consultation

Please share your thoughts about the project plan shared at Commons:Structured data/Overview. In particular, we are seeking feedback on the questions described in this section of the Overview. Please create subsections on this page. We look forward to your discussion. Astinson (WMF) (talk) 01:02, 26 October 2016 (UTC)

This is NOT a new project proposal, comments here should relate to the October 2016 update from the WMF on possibly expediting the project. — Preceding unsigned comment added by Seddon (WMF) (talk • contribs) 11:43, 26 October 2016 (UTC)


The "Why do we need your comment? How can you help?" section had some questions. I copied them below:

Do you see this expedited roadmap as a worthy undertaking

  • yes --Jarekt (talk) 02:10, 26 October 2016 (UTC)
  • Yes. - PKM (talk) 22:55, 26 October 2016 (UTC)
  • Yes --John Cummings (talk) 12:19, 27 October 2016 (UTC)
  • No. Before enabling new stuff please fix the existing one. --Steinsplitter (talk) 11:23, 27 October 2016 (UTC)
@Steinsplitter:Could you specify what you would like to have fixed? ChristianKl (talk) 08:53, 29 October 2016 (UTC)
Just look at the open bugs at phabricator. --Steinsplitter (talk) 18:30, 31 October 2016 (UTC)
  • Yes. --Denny (talk) 20:02, 27 October 2016 (UTC)
  • Yes. Multichill (talk) 21:35, 27 October 2016 (UTC)
  • Yes. ChristianKl (talk) 08:53, 29 October 2016 (UTC)
  • Yes. --Micru (talk) 23:14, 29 October 2016 (UTC)
  • Yes!! Susanna Ånäs (Susannaanas) (talk) 07:16, 31 October 2016 (UTC)
  • The sooner the better. YES. Spinster (talk) 08:52, 31 October 2016 (UTC)
  • Long overdue, imho. --El Grafo (talk) 13:21, 31 October 2016 (UTC)
  • Yes. I am pleased to see a move to bring Commons into the 21st century. A file-repository never worked well with Wikipedia software, particularly the mess that is categories. Go for it! -- Colin (talk) 13:38, 1 November 2016 (UTC)
  • Yes, I endorse the high level concept. I am not aware of anyone saying how much this would cost, who would take responsibility for management, what minimal promises can be made for any level of investment, or how this compares with other options. If there were a choice of options to support which each had a price tag then I am not sure which among the high-level projects I would prefer. I hope that no one interprets this poll as a community preference in favor of other options. A lot of other things might be done with Wikidata preferentially and at lower cost, and I am not sure who is making decisions or how. I can only guess that this project would be among the most expensive, complicated, and risky directions for development. The risk is going over budget and time for subpar delivery, and I would like to see some solid unqualified successes out of the WMF. Structured data in Commons is something that I dearly want, among other things that I dearly want. Blue Rasberry (talk) 16:43, 1 November 2016 (UTC)
  • Yes, but Steinsplitter made a good point above. There are important issues which have needed fixing for a long time. Off the top of my head: 1. interwikis with galleries and categories to the rest of Wikimedia, 2. using WD for authors (Creator pages, etc.). Yann (talk) 18:15, 1 November 2016 (UTC)
  • Most certainly yes. It's really good to hear that efforts are now being made to look seriously at some of the fundamentals of the Commons software for the first time in a decade. MichaelMaggs (talk) 12:47, 8 November 2016 (UTC)
  • Yes. Make it happen! --Beat Estermann (talk) 13:27, 10 November 2016 (UTC)
  • No. Frankly, the effort seems confused and backwards. I look at and see an instance of a photograph. I'd expect the photograph to have a Q-number and I'd expect the descriptors to hang off that Q-number. I'd expect it to be an instance of a photograph. I'd expect it to have a subject that is the Poolbeg Lighthouse. Instead there's a new type MediaInfo and stuff hangs off of it. For licensing, I expect there to be a generalized author for copyright purposes, but instead there is a photographer. So the photo doesn't have a photographer but the MediaInfo does; the property attaches to the wrong object; the photographer did not take a photo of the MediaInfo. There's no statement that the media is a photograph. There's MediaInfo that can be ripped from the JPEG metadata, but it is less accurate than the data in the JPEG (eg, date taken). The MediaInfo label is Poolbeg Lighthouse, but the object is not the lighthouse but rather a photograph of the lighthouse. It depicts a "lighthouse", but it should be depicting Poolbeg Lighthouse (Q7228600) (an instance of a lighthouse Q39715 which is a tower which is an architectural structure ... so dig deep to find a 3D object and know that panorama is an issue). The MediaInfo approach is a contorted view of a simpler problem. The basic world is simpler. Look at A Connecticut Yankee in King Arthur's Court Q848612. Copyright status falls out of that Q-number. It's an instance of P31 a book that was published P577 in 1889. The author P50 is Mark Twain who died P570 in 1910. I'll use country of origin P495 to infer publication was in US. That gives me PD-old-auto-1923|deathyear=1910. All done outside of the MediaInfo type. See also Moonrise, Hernandez, New Mexico (Q17107995) which has more complicated issues; individual prints of the photograph are significantly different. Far from expediting, I think somebody needs to pull the MediaInfo plug. Glrx (talk) 01:51, 21 November 2016 (UTC)
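The inference described in the comment above (publication date, author's death year, and country of origin leading to a license tag) can be sketched roughly as follows. This is only an illustration of that reasoning, not a real copyright rule engine: the function name `infer_pd_tag`, the flattened dictionary of claims, and the simple "published in the US before 1923" test are assumptions, and the author's death year (which really lives on the author's own item) is folded into the same dictionary for brevity.

```python
US = "Q30"  # Wikidata item for the United States


def infer_pd_tag(claims):
    """Return a Commons-style PD tag string from a few flattened claims, or None.

    Expected keys (property IDs as named in the comment above):
      P577 = publication year, P570 = author's death year, P495 = country of origin.
    """
    published = claims.get("P577")
    death_year = claims.get("P570")
    country = claims.get("P495")
    if published is None or death_year is None:
        return None  # not enough data to infer anything
    # Simplified assumption: US works published before 1923 get this tag.
    if country == US and published < 1923:
        return f"PD-old-auto-1923|deathyear={death_year}"
    return None


# A Connecticut Yankee in King Arthur's Court (Q848612), using the values from the comment.
yankee = {"P577": 1889, "P570": 1910, "P495": US}
print(infer_pd_tag(yankee))  # PD-old-auto-1923|deathyear=1910
```

Real cases (multiple authors, non-US origin, derivative works such as photographs of artworks) would need far more rules than this.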

What roadblocks, risks or challenges do you anticipate with accelerating such a project

  • I do not anticipate that you are ever going to get rid of the text versions of some page descriptions. We had the same problem several times before:
  • Early images did not require an infobox, like {{Information}}, so for years people were adding images without them. We put a lot of effort into creating and maintaining a core set of infoboxes and unifying hundreds of other rarely used infoboxes and description templates into them. We also put a lot of effort into adding infoboxes to files lacking them; however, ~1% of files (in Category:Media missing infobox template) are still missing them. I expect they will still be missing years from now. The reason is that the remaining files mostly require manual processing, and that is a very boring task that nobody is lining up to do. Also, many of those files do not meet current standards of documentation and end up with mostly empty fields, and some are randomly deleted for lack of source or author.
  • We had better luck some years ago with enforcing the rule that all files require a template with a copyright tag. It was a massive job to add license templates to all the files that never had them or lost them.
  • Another perpetually unfinished task is the transfer of files from Wikipedias to Commons. The main issue is the impossibility of automatically converting from one format of wikitext-based description to another, so it has to be done manually while dealing with missing data and frequent deletion of old images for lack of current-day metadata. I expect that we will get 80-90% done but the remaining files will be with us for a long time. --Jarekt (talk) 02:10, 26 October 2016 (UTC)
  • Another challenge we run into is stubborn users who do not like their files moved from Wikipedia to Commons, or do not like the {{Information}} template and will engage in a war with anybody that adds standard infoboxes, or do not like any of the standard license templates and write their own license text. The same users might create successful roadblocks to using Wikibase-type descriptions. --Jarekt (talk) 02:10, 26 October 2016 (UTC)
    • perhaps the stubborn users object to their items which were stable on wikipedia for years, but are then transferred and deleted on commons. we might also talk about the stubborn admins on commons who do not play nice with anyone including wikidata. Slowking4 § Richard Arthur Norton's revenge 19:30, 31 October 2016 (UTC)
      • @Slowking4:, I agree. Sometimes people transfer files from Wikipedia, relying on tools which are very bad at keeping all the image metadata. Then images are deleted from Commons due to insufficient metadata without notifying the photographer, only the user or bot that did the transfer. It is maddening, but unfortunately not uncommon. I fully support users that happen to be "stubborn" about future moves by others. However, the safest route is to move them yourselves. I also agree that being an admin on Commons, or any other project, does not inoculate against being stubborn or difficult. I can show many examples. The best remedy is to nominate people who are not stubborn or difficult for the job. --Jarekt (talk) 16:50, 1 November 2016 (UTC)
  • The final challenge I see is the treatment of files that do not use the {{Information}} template but one of the other infoboxes or templates derived from them.--Jarekt (talk) 02:10, 26 October 2016 (UTC)
This photograph of thinker pondering the Beatles should require information about sculptor and the photographer and list copyright tags for both.
  • Another challenge and opportunity would be capturing the messy details of multiple licenses which apply to different jurisdictions and may be related to multiple co-authors. For example, the photograph to the right should require information about the sculptor and the photographer and list copyright tags for both. Many images require information about copyrights in the country of origin and in the US, and may also include information about copyrights in other countries. All that complexity is not well captured using current templates (see Commons:Multi-license copyright tags for info on copyright templates), but could be captured in a well-designed Wikibase structure. I think we could automatically migrate many of the current files to such a system, but probably not all. I do, however, have high hopes of capturing such complexity with the uploads we contribute years from now. Finally, I hope that being able to capture more details will not add confusion during upload and will not lead to mass purges of old non-complying files that met community standards at the time of upload but might not meet future standards. --Jarekt (talk) 13:07, 26 October 2016 (UTC)
    @Jarekt: This is all really great feedback! Having realistic expectations about the speed of adopting structure, is really important. We are trying to calibrate the proposal not to overpromise the conversion, but are cautiously optimistic. Of course, working to build a reasonable community process for prioritizing and supporting that transition will be really important. Astinson (WMF) (talk) 16:01, 1 November 2016 (UTC)
  • Apart from all the challenges regarding migrating existing files, what about uploading new ones into the new system? Many experienced users prefer alternative upload methods (see Commons:Upload tools) over the default UploadWizard. Please make sure you don't underestimate the disaster breaking those tools would cause! Forcing people to use the UploadWizard will result in angry mobs with torches and pitchforks. I think it's crucial to reach out to the developers of those upload tools as early as possible to give them ample time to adapt. --El Grafo (talk) 14:09, 31 October 2016 (UTC)
    @El Grafo: We agree with you completely: One of the Year 2 priorities, is going to be working with communities on uploading and other important tools, especially those which are already designed to work with structured data in some way (for example, if you haven't engaged with Commons:Pattypan yet, I would recommend giving it a try (it makes the infoboxes much easier)). Wikidata was hugely successful because of the community of volunteer developers, the plan is to apply that learning to this project. Astinson (WMF) (talk) 16:01, 1 November 2016 (UTC)
    @Astinson (WMF): Thanks, that's great to hear! --El Grafo (talk) 14:30, 8 November 2016 (UTC)
Volunteer tools were such a big success for Wikidata, because we could use Wikipedia as data source. The biggest data source were Wikipedia categories and Wikipedia templates. I guess the most important data source for Commons will be the category system in commons. Other possible data sources will be the name of the files and the description of the images. A possible new tool could be an image recognition software for bots like pywikibot to detect if the picture is about people, animals, houses, etc. There is Commons:Bots/Work requests, but it is not very active. --Molarus (talk) 06:59, 4 November 2016 (UTC)
  • not a chance of upload wizard only. the challenge is how do we guide new uploaders to appropriate tools, since they are open source, (changing support) and harder to find. by default upload wizard with small off-ramp, you get a lot of information template to change to artwork. and questions at village pump. need a dashboard for uploaders to select right one; need a tool life-cycle, with on-boarding at WMF for the good ones. Slowking4 § Richard Arthur Norton's revenge 19:40, 31 October 2016 (UTC) — Preceding unsigned comment added by Astinson (WMF) (talk • contribs) 16:19, 01 November 2016 (UTC)
  • Handling the social/community aspects will be critical. Technical changes - especially after such a long period of technical stagnation on Commons - will inevitably annoy some people, and we know that long intemperate discussions on wiki significantly put off volunteers who might given a more positive environment be very happy to help out. That's one reason why as Jarekt mentions above it can be difficult to crowdsource volunteers for new cleanup tasks. In parallel with discussions of technical changes we need to make sure our Commons policies, guidelines and legal rules are up to the task. Commons currently has relatively few formal policies/guidelines, and one consequence is that for many issues the community has not yet worked out what its 'official' consensus view should be. Where rules need to be added or changed early community discussion is going to be really important, with a strong emphasis on encouraging cross-wiki and cross-community collaboration. MichaelMaggs (talk) 13:35, 8 November 2016 (UTC)
  • +1, I'd go so far as to say that this might very well be the most difficult part of the whole project. --El Grafo (talk) 14:30, 8 November 2016 (UTC)
That's why we should start with small steps: automatic handling of institutions and creators seems to be a good idea. Once that works, we can shift to bigger issues. Regards, Yann (talk) 17:41, 8 November 2016 (UTC)
  • Poor design implies disaster. Glrx (talk) 01:55, 21 November 2016 (UTC)

Does the current project accurately represent the role of the communities, especially the Wikimedia Commons and Wikidata communities, in engaging with such a software project?

  • That is a hard question. On Commons many of the most active users find their niches and work there; for example, many hardworking admins work on keeping up with the daily flood of copyrighted images that need to be deleted. Often current self-appointed tasks do not allow an easy switch to new tasks, so it is much harder to crowdsource a new set of tasks. For example, when we finally identified all the files without an infobox by adding them to Category:Media missing infobox template, there was no army of volunteers to fix them. Similarly, when the Wiki Loves Earth competition dumped a few thousand images with bad coordinates over a month, it was hard to find volunteers to fix them. On the other hand, Wikidata grew a large community of volunteers to run the site, doing tasks nobody thought of years ago. I do not expect them to drop their current work and come to Commons and work on our structured data, but I do hope we can grow our own community of volunteers to work on our new tasks. Those could be users that do not like the ways we currently do image descriptions or categories on Commons. --Jarekt (talk) 13:22, 26 October 2016 (UTC)
  • No. I have attempted to identify who the "we" is by reading the overview, but only managed to work out that it was not WMDE, the WMF or Wikimedia Commons volunteers (though it could be that "we" is the WMF and WMDE, just talking about themselves like third parties). I can deduce two names of WMF employees based on the posts being made on-wiki, but it would probably be wrong to presume that's the whole team. In terms of engaging, that's an interesting point, as the example image groups being targeted in "How Commons Content Could Change" includes a lot of uploads from my projects, but nobody has approached me with practical analysis on how my uploads might need to be adapted, for example where I have applied project specific ingestion templates to hundreds of thousands of images, such as {{nypl}}. My assumption would be that engagement will remain passive, consisting of posted invites suggesting volunteers to comment now plans and proposals have been published. Experience shows that volunteers that invest their unpaid time creating suggestions for changes and improvements will be resisted or politely sublimed by the late stages of proposals like this. In summary, I might be interested in helping the transition to structured data, if it was explained in a way I understood rather than way-laid with jargon that does not seem pinned on measurable definitions, and the plan (or timeline) showed there was anything that I could look at from a pragmatic Commons perspective before 2019. From what I've read, I think this is all about Wikidata until then. -- (talk) 11:17, 27 October 2016 (UTC)
    @Fae: The "we" in the document is the WMF and WMDE group scoping this work which includes staff across both organization's engineering units; Seddon and I are providing community engagement support, because the project could include external funding (in the scope of WMF Major Gifts) and is closely related to movement work in the GLAM-Wiki space (in the scope of WMF Programs team).
    Additionally: thanks for the feedback on needing a more pragmatic timeline: at this time, we are offering a high level, because whoever is leading on this project (someone like User:Lydia Pintscher (WMDE) is for Wikidata), will be working directly with the community for both prioritizing features to implement and the order/way in which community activity might change (making sure that available infrastructure changes alongside community consensus). The current focus is on investing in the infrastructure, how the infrastructure will be used by the community will be dependent a lot on those further conversations and is a major component of the timeline -- we don't want to prescriptively propose a means of implementation before the demo is ready for demonstration, and the Commons community has had time to evaluate it and provide feedback. Astinson (WMF) (talk) 13:58, 31 October 2016 (UTC)
  • No. I have never seen a community consensus pursuant to COM:RFC. --Steinsplitter (talk) 11:24, 27 October 2016 (UTC)
  • No. I don't see clear goals or clamor for particular features. Put the data in a database rather than in text is a solution, but what is the problem that is being solved? It cannot be just to move data around. The project does not state goals but rather "benefits". Those benefits are motherhood-and-apple-pie statements rather than concrete goals. "One way to think of structured data: It’s a kind of DNA that explains information in a much more integral way." What does that mean or solve? DNA is instructions for assembling proteins; it tells us nothing about what those proteins do. Bad metaphor. It's a sales pitch in the aether. Lots of author information on Commons is dead wrong; moving it to a database isn't going to fix that. Today I corrected the author for File:Rhamnus frangula - Köhler–s Medizinal-Pflanzen-120.jpg. All of the credits in the Köhler's Medizinal-Pflanzen are wrong: Koehler wrote the book, but others did its artwork. If we knew that Koehler is not an illustrator, then we could deduce that he is not the author of a drawing. We can use a database for checking. But I do not see that as a stated goal. What are the goals? How will those goals be achieved? Glrx (talk) 02:14, 21 November 2016 (UTC)

How would you like to support this project?

  • The GLAMpipe metadata transformation and upload tool would be an ideal tool to support working with this framework. I will commit to developing it.
  • I am interested in working to streamline licensing procedures, and the way media, licenses and metadata are represented in the MediaViewer, file page or shared content.
  • I am also interested in contributing to "templates", metadata subsets that are required for specific types of media. I have worked with the Map template, discussing broadly with media providers and reusers, and I can specifically contribute to that.
  • I would like to see participatory methods of developing these, meaning I as well as many others would like to be engaged in discussion and design. However, as pointed out elsewhere, this is a long anticipated development, and it should not be set on a track that could get it jammed. Therefore, I opt for inclusive, forward-thinking and productive setups. The priority is very high, and should be recognized.
  • When in place, I would like to contribute to developing ways to enrich data: adding location, connecting to additional data, making annotations, recognizing features, both in micro-tasking applications like the Wikidata Game and in regular MediaWiki tools.

--Susanna Ånäs (Susannaanas) (talk) 07:38, 31 October 2016 (UTC) (edited 14:34, 31 October 2016 (UTC))

Categories, tags, and navigation within Commons

Issue with maintaining intersection categories

Structured data is a huge opportunity for Commons that I've been waiting on for years. I would really like this opportunity to be used to rethink what purpose Commons categories serve, and whether they are still needed with structured data (I think they aren't). In my mind, categories have these functions:

  1. They give information. For instance, I know that everything in Category:Paintings by Vincent van Gogh is a painting made by Vincent van Gogh. This function will always be best served by structured data: wikidata:Property:P31 : wikidata:Q3305213 (instance of: painting) and wikidata:Property:P170 : wikidata:Q5582 (creator: Vincent van Gogh). Structured data is better because it is multilingual and more precise at the same time.
  2. They connect (subparts of) Commons to other Wikimedia projects (and share this role with pages/galleries). This connection is both for readers (if you are interested in this article, maybe look at our collections of images about this topic) and editors (improving a Wikidata item about someone? We might have a picture of their tomb). Structured data would help here by giving more meaningful results (compare Category:Paintings by Vincent van Gogh, which has only subcategories and a handful of low-quality files, to the appropriate SPARQL request). Instead of entry points based on manual curation, which can be explicit (pages such as Vincent van Gogh) or implicit (by adding "Category:Vincent van Gogh" to a file), we could have dynamic entry points, defined as SPARQL requests. (They could be updated by bots every day, or dynamically generated for each reader, or any solution that is cost-effective.)
  3. They create paths of navigation within Commons. This is their most overlooked job, and it feels like we, the Wikimedia movement, forget that people might just want to look at pictures of a place without reading an encyclopedic article about it or looking for travel advice. Categories, with their rigid inclusion semantics, don't help. Sure, if I want to see only portrait paintings by Vincent van Gogh, there is Category:Portrait paintings by Vincent van Gogh. But what if I want to see paintings of flowers? Or sunsets? Or any combination of criteria that is not there yet (we already have a LOT of multi-criteria categories such as Category:Portrait paintings by Vincent van Gogh, Saint-Rémy 1889, and yet we are very far from covering everything)? With structured data, we can allow the reader to choose their criteria (paintings only in a given geographic area, or about a given topic), but also open doors of serendipity (see paintings of flowers from other artists).
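To make the third point concrete, here is a minimal sketch of how statements could answer arbitrary combinations of criteria. It assumes a toy in-memory store rather than any real Commons or Wikibase API: P31, P170, P180, Q3305213 and Q5582 are real Wikidata identifiers, while the file names, the second artist and the plain-text "depicts" values are hypothetical placeholders.

```python
# Toy data: P31 (instance of), P170 (creator) and P180 (depicts) are real
# Wikidata properties; Q3305213 (painting) and Q5582 (Vincent van Gogh) are
# the items cited above. File names, the second artist and the plain-text
# "depicts" values are hypothetical placeholders.
PAINTING = "Q3305213"
VAN_GOGH = "Q5582"
OTHER_ARTIST = "Q-other-artist"  # placeholder, not a real item ID

FILES = {
    "File:Example A.jpg": {"P31": {PAINTING}, "P170": {VAN_GOGH}, "P180": {"flowers"}},
    "File:Example B.jpg": {"P31": {PAINTING}, "P170": {VAN_GOGH}, "P180": {"portrait"}},
    "File:Example C.jpg": {"P31": {PAINTING}, "P170": {OTHER_ARTIST}, "P180": {"flowers"}},
}


def find_files(files, **criteria):
    """Return the sorted names of files that carry every requested statement."""
    return sorted(
        name
        for name, statements in files.items()
        if all(value in statements.get(prop, set()) for prop, value in criteria.items())
    )


# Any combination of criteria works without creating a new category for it.
van_gogh_flowers = find_files(FILES, P170=VAN_GOGH, P180="flowers")
all_flowers = find_files(FILES, P31=PAINTING, P180="flowers")
```

Dropping the creator criterion, as in the second call, is the "serendipity" case: the same question widened to other artists.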

Structured data is going to turn Commons into the wonder it deserves to be: let's make sure we give it the full power to amaze us! Léna (talk) 11:00, 26 October 2016 (UTC)

I agree that revamping the category system might be one of the great benefits of structured data. See also my slide with examples of other issues with categories on Commons. --Jarekt (talk) 12:42, 26 October 2016 (UTC)
In theory it's a good idea, but in practice it would depend on how well it was implemented, and what kind of user interface could be created. Categories do at least work; they are fast to browse and fast to update. That SPARQL request locked up my browser for a few minutes, and if I wanted to modify the query I'd have to spend time understanding the query language and Wikidata properties. I suspect the results would also be limited to one image per artwork; there'd be no way to display all matching files in Commons. --ghouston (talk) 00:21, 27 October 2016 (UTC)
The Wikidata notability requirements would also need to be examined. At present, it seems that a Commons category alone isn't sufficient to allow creation of a Wikidata item, and if Commons categories go away, even that wouldn't be available. What happens when you want to group images by a concept that isn't described on any other Wikimedia site, and perhaps doesn't even have reliable external references? --ghouston (talk) 00:26, 27 October 2016 (UTC)
Do you have an example of such a concept? Léna (talk) 06:27, 27 October 2016 (UTC)
@Léna I have tons of such examples. In fact it is part of my workflow as a volunteer interested in 17th-century art. If I can't find the artist (or museum, or genre, or subject) then I create a category for it. Days, months or years later I might go and write an article about the person, thing, concept or whatever, and then I get around to updating the various Wikidata items involved. Sometimes I don't get farther than just Wikidata items and never bother with a Wikipedia article (such as grouping artworks in categories by collector - the collector may have an article and some of the artworks may have articles or items, but I never bother to create items or articles for the collection). Jane023 (talk) 08:53, 27 October 2016 (UTC)
I have the same kind of workflow but I usually create "correct" items and "bad" categories. For instance wikidata:Q27553312 has clear, structured, multilingual information while Category:Mission Gabriel Maget is really poor (I only created it to link it to the item). I find it easier to express information through statements than by finding the right parent categories for the one I just created. Léna (talk) 09:06, 27 October 2016 (UTC)
Exactly - and the point is there is nothing wrong with such workflows. It is perfectly OK for someone to create detailed Commons categories and not bother with Wikidata. The point is that on Wikidata we have a loose definition of notability along the lines of "if it is linked to a notable item directly, then it's OK" and on Commons it's not so clear. Importing Commons categories of artists' artworks for existing items for artists is OK, but importing Commons categories of artists when there is no associated item for the artist is probably not OK. Jane023 (talk) 11:27, 27 October 2016 (UTC)
I think that if you have a Commons category of a person that you can find in VIAF, LOC or another library catalog (find enough info to fill the {{Authority control}} template) then it is notable enough for Wikidata. Article or no article. I think their criteria for notability are much lower than for other projects. --Jarekt (talk) 12:41, 27 October 2016 (UTC)
Examples I can think of: people who don't have Wikipedia articles, perhaps sports people or academics, where it seems worth keeping a photo of them in Commons in case they are needed some day. Random devices such as obscure models of mobile phones where there's no Wikipedia article. I'm not sure if that item mentioned above, wikidata:Q27553312, meets Wikidata:Notability, due to the clause "an item with only a sitelink to a category page in Wikimedia Commons is not allowed on main article items". --ghouston (talk) 23:57, 27 October 2016 (UTC)
  • I've been reading these paragraphs and I understand that it would be a good method to substitute categories and improve cataloging of files. That implies keeping many of the categorizations by creating Qs.
    Let's take Category:Carrer Pasqual Arbós 5, Xirivella. It's a building, not a monument, and lacks any relevance but its very existence; we pictured it because not many buildings of that sort have survived to our days in Xirivella. I cannot reference it in any other form than saying "go there and look". A Q for it will be needed or the information would be lost (or very difficult to find).
    I can think of odder things: Category:Water supply manhole covers in Sueca. We have found that manhole covers are a source of information and we usually photograph them. Using several properties (it's a manhole cover, it's in Sueca, it's related to water supply) can help, but SPARQL (quoting Asaf Bartov) is very difficult to use, so a better, more user-friendly querying interface is required. B25es (talk) 18:32, 27 October 2016 (UTC)
I assume that our categories would not go away, but would remain as a parallel way of keeping track of things. In the old days we organized files using galleries, which were competing with categories. Categories won, but we still have thousands of out-of-date galleries nobody maintains. I think we can do the same with the new system. As for SPARQL, I assume that tools will be written to see all the images that meet some criteria without using SPARQL queries. For example (following my image in the sections above), if you pick the tags paintings, male subject, from France, and portraits, you will get something similar to the content of Category:Portrait paintings of men of France. --Jarekt (talk) 18:59, 27 October 2016 (UTC)
In that case would it be up to the user to think of appropriate tags to restrict their search by? They'd also need some way to find out what relevant tags are available. A tag like "clock" could include a vast range of devices including single-function clocks and all kinds of multi-function devices that happen to include a clock, including practically every computing device. --ghouston (talk) 00:10, 28 October 2016 (UTC)
Exactly what I'm thinking: we would need a system of both suggestions and free navigation (not through SPARQL requests, but through something way more reader-friendly). For instance, once you are in Category:Paintings by Vincent van Gogh, you would have a way to restrict the search, with suggestions such as "in a given museum" (van Gogh museum, Orsay, other), "at a given period" (on a "ruler" from 1878 to 1890), "about some topics" (portraits, landscapes, still life, etc.) or "with given properties" (copies of Millet's works), or to extend the search (drawings by Vincent van Gogh, or paintings by other painters). Thus, the navigation would be defined "top down": the "code" of Category:Paintings by Vincent van Gogh would switch from the bottom-up
[[Category:Vincent van Gogh| Paintings]]
[[Category:Paintings by painter|Gogh, Vincent van]]
[[Category:19th-century paintings from the Netherlands|Gogh, Vincent Van]]
[[Category:Paintings from the Netherlands by painter|Gogh, Vincent Van]]
[[Category:Post-Impressionist paintings|Van Gogh]]
to something like
  1. Down
    1. museums : list - van Gogh museum, Orsay, others
    2. period : ruler - 1878-1890
    3. topics : list - portraits, landscapes, still life, others
    4. filter : inspired by Millet
  2. Up
    1. Works by Vincent van Gogh
  3. Linked
    1. Paintings by other artists : list - Anthon van Rappard, Émile Bernard, Paul Gauguin
So the software would be required to have a language that expresses these kinds of links, and this language should be easy to use through the visual editor. The role of the Commons editor would thus be to express which parts of Commons should be linked with one another. Léna (talk) 09:59, 28 October 2016 (UTC)
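As a rough illustration of what such a "language" might look like, the outline above can be written as plain data that a tool or visual editor could read and write. This is only a sketch: the key names (`down`, `up`, `linked`, `label`, `widget`, `values`), the widget types `toggle` and `link`, and the `suggestions` helper are all invented for the example and do not correspond to any existing MediaWiki feature.

```python
# Hypothetical declarative description of the navigation for
# "Paintings by Vincent van Gogh", mirroring the Down / Up / Linked outline.
NAVIGATION = {
    "down": [
        {"label": "museums", "widget": "list", "values": ["van Gogh museum", "Orsay", "others"]},
        {"label": "period", "widget": "ruler", "values": ["1878", "1890"]},
        {"label": "topics", "widget": "list",
         "values": ["portraits", "landscapes", "still life", "others"]},
        {"label": "filter", "widget": "toggle", "values": ["inspired by Millet"]},
    ],
    "up": [{"label": "Works by Vincent van Gogh", "widget": "link", "values": []}],
    "linked": [
        {"label": "Paintings by other artists", "widget": "list",
         "values": ["Anthon van Rappard", "Émile Bernard", "Paul Gauguin"]},
    ],
}


def suggestions(navigation, direction):
    """Render the entries for one direction as human-readable suggestion lines."""
    lines = []
    for entry in navigation.get(direction, []):
        if entry["values"]:
            values = ", ".join(entry["values"])
            lines.append(f'{entry["label"]} ({entry["widget"]}): {values}')
        else:
            lines.append(entry["label"])
    return lines
```

Calling `suggestions(NAVIGATION, "down")` would list the four ways of narrowing the search, while `"up"` and `"linked"` give the ways of widening it.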
Yes something like that, although I don't understand how "subcategories" would work in the new scheme. Somehow these would need to be derived from the Wikidata relationships. Then if a file in Commons was tagged with "Samsung SGH-D600", for example, would it also be found in a search for mobile phones, or for battery powered devices, or would those tags need to be added to the file explicitly? Locations are also difficult, when selecting London you'd want to include everything with geographic coordinates within its borders, as well as anything tagged with a geographical subregion such as Westminster. --ghouston (talk) 23:14, 29 October 2016 (UTC)
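One possible answer to the "Samsung SGH-D600" question is to expand each tag through its broader concepts at search time, so that nothing has to be added to the file explicitly. The sketch below assumes a hand-written `BROADER` mapping; in practice such links would have to come from Wikidata relationships, and the specific links shown here are illustrative guesses rather than actual Wikidata statements.

```python
# Hypothetical "broader concept" links, standing in for relationships that
# would be derived from Wikidata (subclass of, located in, and so on).
BROADER = {
    "Samsung SGH-D600": ["mobile phone"],
    "mobile phone": ["battery-powered device"],
    "Westminster": ["London"],
}


def expand_tags(tags, broader=BROADER):
    """Return the given tags plus every broader concept reachable from them."""
    seen = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue  # already visited; also protects against cycles
        seen.add(tag)
        stack.extend(broader.get(tag, []))
    return seen


def matches(file_tags, searched_tag, broader=BROADER):
    """True if a file tagged with file_tags should appear in a search for searched_tag."""
    return searched_tag in expand_tags(file_tags, broader)
```

This covers tag hierarchies and named subregions only; matching files by geographic coordinates inside a border would need a separate spatial check.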

Maybe a Commons Category to SPARQL-query translator is needed. I mean, there are hundreds of thousands of categories and it is not possible to write hundreds of thousands of SPARQL queries by hand. We would need a way to store those queries and maybe a way to cache the results. Querying 34 million items is expensive. Maybe we would also need software that proposes Commons Wikidata statements. The software would know which SPARQL queries already exist and could therefore say that your media file is similar to those media files and should have similar statements. For example, the software could create a list of museums that have paintings by Vincent van Gogh and the user could choose from that list, instead of searching for the right Q number of the museum. Second point: that Vincent van Gogh query shows about 1000 pictures. No one wants to look through 1000 pictures. We would need an assistant that asks you questions to reduce the number of pictures. That is another reason why categories are created. Third point: without categories, where should Wikipedia articles link to? Maybe such software decides whether this project fails or succeeds, and maybe the developers should start with that software, not with building Commons Wikidata. --Molarus (talk) 00:04, 28 October 2016 (UTC)

I don't know about Commons Category to SPARQL-query translation, but I'm working on a translation from categories to statements: Commons2Data. You can have limitations on queries (for instance, display only the first 50 results). And your last point is one of my points: we need entry points from Wikimedia to Commons (and, btw, neither categories nor galleries are very good entry points). Léna (talk) 09:37, 28 October 2016 (UTC)
1) Maybe a Commons Category to SPARQL-query translator could use a en:Genetic algorithm. The right SPARQL query is found if the query returns more or less the same pictures as are in the category. 2) I don't think just showing the 1000 pictures step by step is the right answer; I would rather see your filter proposal as a better solution. But those filters have to be created by software from the statements of those 1000 van Gogh pictures. --Molarus (talk) 18:20, 28 October 2016 (UTC)
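The criterion "returns more or less the same pictures as are in the category" can be made measurable. The sketch below is not a genetic algorithm; it only shows one plausible fitness measure such a search could optimize, namely the Jaccard overlap between a query's results and the category's members. The function names and the idea of passing pre-computed result sets are assumptions made for the example.

```python
def jaccard(query_results, category_members):
    """Overlap between two sets of file names: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(query_results), set(category_members)
    if not a and not b:
        return 1.0  # two empty sets are treated as a perfect match
    return len(a & b) / len(a | b)


def best_query(candidates, category_members):
    """Pick the candidate whose results best match the category.

    candidates maps a query label to the set of files that query returned.
    """
    return max(candidates, key=lambda label: jaccard(candidates[label], category_members))
```

For instance, a query returning files {a, b, c} scores 0.5 against a category containing {b, c, d}: two shared files out of four distinct ones.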


Commons is supposed to be a multilingual project. This is supposed to be true for the metadata of files as well as for community discussions. However, and this page is yet another proof, discussions actually happen in English, and expressing oneself in another language means not being understood at best, and being seen as rude at worst. With structured data, there is an opportunity to have some kind of multilingual discussions. Of course, complex issues will still need natural language (and thus English), but some discussions, including votes (QI, FP, and simple deletion requests), could be multilingual by using a list of predefined concepts. Léna (talk) 11:27, 7 November 2016 (UTC)

Semantic annotations

The property depicts (P180) is useful for this project, but it is also very rough. It would be nice to be able to annotate the image, similarly to Commons:Image annotations, but in a structured way instead of free text. One should also probably take a look at the W3C standard Web Annotation Data Model to make sure that open standards are being implemented. Ainali (talk) 19:37, 8 November 2016 (UTC)

Thank you so much for participating!

Hello all! Your feedback and conversation at this stage in developing this potential expedited work on Structured Data on Commons is greatly appreciated.

The feedback both highlighted new challenges and helped us examine the challenges we anticipated, as well as bringing up new ideas and methods for thinking about solutions to those challenges. We are going to make sure that the team working on this project, whether or not we get the additional funding, sees this conversation page; continued feedback or conversation here will greatly strengthen their work and let us know what you are interested in getting from that work. We will make sure to provide an update as soon as we have more substantial information about the project's potential funding source.

In the short term, we recommend also participating in the 2016 Community Wishlist Survey on Meta. The survey helps scope out a number of project ideas, requests, and other technical needs for various Wikimedia communities. Even though the Wishlist and the Community Tech team may or may not be able to expedite work on the core technological infrastructure needed for Structured Commons, identifying specific technical needs related to it as part of the Wishlist will help ensure that more of the developer community understands community needs on Commons.

Pinging folks that participated: @Ainali, Yann, El Grafo, MichaelMaggs, Léna:@Molarus, Jarekt, Bluerasberry, Colin, Slowking4:@Susannaanas, Spinster, ChristianKl, Micru, B25es:@Jane023, John Cummings, Steinsplitter, Fae, Ghouston:@PKM:. Please feel free to ping more folks, if I missed them. Astinson (WMF) (talk) 16:46, 15 November 2016 (UTC)

Open commenting on WMF Seeking Additional Resources for Structured Data on Commons

The Wikimedia Foundation, in cooperation with Wikimedia Deutschland, has a unique opportunity to potentially secure additional resources to expedite development work on Structured Commons. We would like feedback on a 3-year plan that describes accelerated software development if these resources become available. We would like to invite you to participate in a conversation at this page, which provides an overview of that proposed timeline. We look forward to your comments and thoughts.

Joseph Seddon and Alex Stinson 15:24, 26 October 2016 (UTC)

i would note some of the metrics need an update, i.e. commons is not 100%. we need better tools, and team organization. some gamification would help. having a master list of files without metadata would be nice; User:Multichill/Same image without Wikidata is broken; add information tool needs love [1]; we need a convert information to artwork tool or script; some existing user categories Category:Media missing infobox template and Category:Artworks missing infobox template need to be incorporated. Slowking4 § Richard Arthur Norton's revenge 19:19, 27 October 2016 (UTC)
Sounds like a wishlist! I agree though on the gamification tools. It would be really nice to have 2 things to help volunteers interested in paintings and sculpture, which is one set of tools that work either way: the "artwork template fixer-upper" and the "wikidata item fixer-upper". Jane023 (talk) 12:03, 28 October 2016 (UTC)
they asked for it ;-p but also need to recruit ; train ; support a quality circle to improve metadata. (teahouse method) merely building a project page, and asking for comments will not get the edits done. Slowking4 § Richard Arthur Norton's revenge 15:01, 29 October 2016 (UTC)
@Slowking4, Jane023: Yeah, this depends a lot on how the project is implemented, and on facilitating lots of community-focused support on tools. The project plan includes that, and I invite you to keep watching and engaging with this project when/if it moves forward (pending that additional funding). 18:27, 31 October 2016 (UTC) — Preceding unsigned comment added by Astinson (WMF) (talk • contribs) 18:27, 31 October 2016 (UTC)