Commons talk:Structured data/Archive 2018

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Tag namespace? or better "Category all images" namespace

I'm not sure my comment is in the scope of this discussion, but I have had this in my head for a long time. I think we should have a kind of "Category all images" namespace; let me try to explain what I mean.

This "Category all images" namespace would only display (not edit) all the images in a category and in its subcategories. It would work a bit like the FastCCI tool's "All images" option, with the difference that these would not be requests with a waiting time. I am thinking of virtual categories, generated automatically when you add or create a category that is part of our category tree: a kind of automatic, invisible over-categorisation. This namespace could be reached from a category through a tab next to the discussion tab, for example: Category | Discussion | All images. Of course, maintenance and diffusion categories should be excluded.

Take the same example as above: File:Canis latrans (Yosemite, 2009).jpg is in Category:Canis latrans lestes, which is itself in Category:Canis latrans, which is itself in Category:Canis, etc., up to Category:Animalia.

But currently your image is only in Category:Canis latrans lestes. Now imagine that when you add Category:Canis latrans lestes to your file, you also add, automatically and virtually, Category all images:Canis latrans lestes, Category all images:Canis latrans, Category all images:Canis, etc., up to Category all images:Animalia.

The result is that when you are in Category:Canis latrans, you can click the "All images" tab and see all the images that are virtually categorized in Category all images:Canis latrans; in short, all the images within Category:Canis latrans and its subcategories.

Of course I don't know the technical method, but this should be possible if our categories are linked to Wikidata items. Again, I'm sorry if I am off topic, but this has been itching my brain for a long time. Regards, Christian Ferrer ^(talk) 19:50, 16 February 2018 (UTC)
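The mechanism described above amounts to precomputing, for each file, the closure of its categories up the tree. A minimal Python sketch under stated assumptions: the category data is a hypothetical toy dict (not the real Commons category graph or any MediaWiki API), and the names `ancestors`, `build_all_images_index` and `EXCLUDED` are illustrative only. Because the index is built ahead of time, showing "All images" becomes a lookup rather than a request with a waiting time.

```python
from collections import defaultdict

# Toy data (hypothetical, simplified): category -> set of its parent categories.
PARENTS = {
    "Canis latrans lestes": {"Canis latrans"},
    "Canis latrans": {"Canis"},
    "Canis": {"Canidae"},
    "Canidae": {"Animalia"},
    "Animalia": set(),
}

# Hypothetical maintenance/diffusion categories, skipped during traversal.
EXCLUDED = {"Media needing category review"}


def ancestors(category, parents, excluded=EXCLUDED):
    """Return the category itself plus every category above it.

    A visited set keeps this safe even if the category graph contains a loop.
    """
    seen = set()
    stack = [category]
    while stack:
        cat = stack.pop()
        if cat in seen or cat in excluded:
            continue
        seen.add(cat)
        stack.extend(parents.get(cat, ()))
    return seen


def build_all_images_index(file_categories, parents):
    """Map every category to the files in it or in any of its subcategories."""
    index = defaultdict(set)
    for file_name, cats in file_categories.items():
        for cat in cats:
            for anc in ancestors(cat, parents):
                index[anc].add(file_name)
    return index


files = {"File:Canis latrans (Yosemite, 2009).jpg": {"Canis latrans lestes"}}
index = build_all_images_index(files, PARENTS)
print(sorted(index["Canis latrans"]))
```

In a real deployment the index would have to be updated whenever a file or category is recategorized; the sketch only shows the one-off computation.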

Categories are the old system. Once data is properly structured you can just query and search in a normal way instead of having to wade through categories. Have you tried https://query.wikidata.org/ ? Just an example query of paintings depicting Carnivora:

#defaultView:ImageGrid
SELECT ?item ?image WHERE {
  ?item wdt:P31 wd:Q3305213 .
  ?item wdt:P18 ?image .
  ?item wdt:P180/wdt:P171* wd:Q25306 .
}


Of course the new query and search for Commons will probably be much better and easier than this. Multichill (talk) 21:47, 16 February 2018 (UTC)

Thank you. It seems that this kind of query has a brighter future than my laborious DIY. Great project. Christian Ferrer ^(talk) 22:10, 16 February 2018 (UTC)

(meta)data

"Wikimedia Commons holds a lot of (meta)data about the media files it hosts" - "(meta)data" is Q180160? --Fractaler (talk) 09:14, 19 February 2018 (UTC)

Wikibase database physically close to mediawiki database and other ideas

Based on the diagram shown, it seems like a new Wikibase server could live very close to the commonswiki data. This makes a lot of sense, both from the user's point of view ("Commons wiki edits for Commons data") and from a technical and performance perspective ("a new image and its metadata should be edited together, in one transaction"). I know implementation has not yet started, but I would like to leave open the possibility of having the Wikibase server separate from its client and the MediaWiki data, in case the structured data becomes larger than the wiki data. In particular, wikidata.org grew larger than expected, and we should be ready to separate the structured data in case it becomes as successful as Wikidata. I am not saying it should be like that, and it definitely shouldn't start like that, but developing both services so that they can potentially be separated will keep us from running into a wall in the future. Just my $0.02.

Regarding EXIF: nowadays the image table has a bottleneck, with some EXIF content being 1G per row. It would be nice to remove it from the image table, but I also mean this as a warning of how large it could grow when stored in JSON format (vs. one row per property).
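To illustrate the storage trade-off mentioned above, here is a rough Python sketch using the standard-library sqlite3 module. The tables `image_blob` and `image_prop` and the EXIF values are hypothetical and do not mirror the real MediaWiki schema; the point is only that a JSON blob must be read and parsed whole to get one property, while a row-per-property layout can fetch a single value.

```python
import json
import sqlite3

# Hypothetical EXIF values for one file.
exif = {"Make": "ExampleCam", "Model": "Example Model", "ExposureTime": "1/250", "FNumber": "8"}

conn = sqlite3.connect(":memory:")

# Layout A (hypothetical table): all EXIF serialized into one JSON value per image.
conn.execute("CREATE TABLE image_blob (name TEXT PRIMARY KEY, metadata TEXT)")
conn.execute("INSERT INTO image_blob VALUES (?, ?)", ("Example.jpg", json.dumps(exif)))

# Layout B (hypothetical table): one row per (image, property) pair.
conn.execute(
    "CREATE TABLE image_prop (name TEXT, prop TEXT, value TEXT, PRIMARY KEY (name, prop))"
)
conn.executemany(
    "INSERT INTO image_prop VALUES (?, ?, ?)",
    [("Example.jpg", key, value) for key, value in exif.items()],
)

# Layout A: reading one property means fetching and parsing the whole value.
(blob,) = conn.execute(
    "SELECT metadata FROM image_blob WHERE name = ?", ("Example.jpg",)
).fetchone()
make_a = json.loads(blob)["Make"]

# Layout B: only the needed row is fetched, using the primary-key index.
(make_b,) = conn.execute(
    "SELECT value FROM image_prop WHERE name = ? AND prop = ?", ("Example.jpg", "Make")
).fetchone()

print(make_a, make_b, len(blob))
```

The flip side is that the row-per-property layout multiplies the row count by the number of properties, so neither layout is free.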

Please keep privacy concerns in mind when deploying features, I mean those that could be used for harassing people, or for putting "non-notable" (as in non-famous) people in the spotlight. People have expressed concern in the past about personal details of people being written on Wikidata. While better classification and discoverability ("all pictures of X", "all pictures taken by Y") is something we all want, we should also be aware of the ways technology could either invade privacy or be used for harassing people. I am not giving specific cases, because I do not have specific actionables, but "how can people misuse feature X?" should be at the top of the priority list.

--JCrespo (WMF) (talk) 18:37, 17 February 2018 (UTC)

@JCrespo (WMF): thank you for the thoughts, much appreciated! Keegan (WMF) (talk) 20:56, 20 February 2018 (UTC)

Categories, properties, & directed acyclic graphs

I think that in many ways our existing category system is suggestive of what people have found important; on the other hand, (1) the inability to easily intersect categories has led to rather arbitrary decisions as to when "intersection categories" should exist and (2) our folksonomy results in some loops. Presumably Wikibase is better for intersecting on the fly, and would allow us to transform at least a lot of our categorization into orthogonal (truly independent) properties that themselves can then be intersected at whatever level we want. For example, place should be completely independent of date/time and of what is depicted (which is more complicated, not a single property). Categories we have now, like Category:December 2014 in Seattle, could simply be an intersection of a date/time falling in December 2014 and a location falling in Seattle. For each of these independent properties we would want to be as specific as possible. It probably also needs to remain possible to have more than one value for a given property: for example, there are places that are along the boundary of neighborhoods (or even incorporated communities that straddle counties), but each such hierarchy should still be acyclic. Thus, if someone wanted to find images that fit the successively more specific "Seattle in the 2010s", "2012 in Seattle", "2012 in Pike Place Market", "2012 in the Pike Place Market Main Arcade", that should be achievable without any need to set these up in advance. At the same time, things we think people are likely to search for over and over (roughly our present "intersection categories") could each be implemented by a fixed query, and the hierarchy in multiple dimensions could also be made available (e.g. at the presentation layer, "2012 in Seattle" could still be a subcat of "2012 in Washington (state)", of "2012 in the United States by city" (a metacategory: these would require a different, but still perfectly possible, query), etc.).

Does that make sense? If needed, I could certainly take an hour or two to flesh it out a bit, but to be honest my involvement here is not mainly in a technical capacity. That's what I do for work, and I don't want to do large amounts of it on a volunteer basis. - Jmabel ! talk 01:46, 10 March 2018 (UTC)
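The proposal above, orthogonal properties intersected on the fly, can be sketched in Python as follows. This is an illustration under assumptions, not a spec: the place hierarchy, the file names and the `within`/`intersect` helpers are hypothetical, and each image carries a single date. A place may have several parents (as for a location on a boundary) as long as the hierarchy stays acyclic.

```python
from datetime import date

# Hypothetical place hierarchy: place -> set of parent places.
# A set lets a place on a boundary have more than one parent.
PLACE_PARENTS = {
    "Pike Place Market Main Arcade": {"Pike Place Market"},
    "Pike Place Market": {"Seattle"},
    "Seattle": {"Washington (state)"},
    "Spokane": {"Washington (state)"},
    "Washington (state)": {"United States"},
    "United States": set(),
}


def within(place, region):
    """True if `place` is `region` or lies anywhere below it in the hierarchy."""
    stack, seen = [place], set()
    while stack:
        current = stack.pop()
        if current == region:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(PLACE_PARENTS.get(current, ()))
    return False


def intersect(images, region, start, end):
    """Files whose date falls in [start, end] and whose location is within `region`."""
    return [
        img["file"]
        for img in images
        if start <= img["date"] <= end
        and any(within(p, region) for p in img["places"])
    ]


# Hypothetical images, each with one date and a set of locations.
images = [
    {"file": "File:Example A.jpg", "date": date(2012, 6, 1), "places": {"Pike Place Market Main Arcade"}},
    {"file": "File:Example B.jpg", "date": date(2014, 12, 24), "places": {"Seattle"}},
    {"file": "File:Example C.jpg", "date": date(2014, 12, 3), "places": {"Spokane"}},
]

# "2012 in Seattle", computed without any category set up in advance.
print(intersect(images, "Seattle", date(2012, 1, 1), date(2012, 12, 31)))
```

Narrowing to "2012 in Pike Place Market" only changes the `region` argument; no new category is needed.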

As long as homonyms are used when modeling the world (with categories, Wikibase, etc.), there will be problems with loops (how to reduce homonymy in world modeling has been known for a long time). "December 2014 in Seattle" is a homonym: 1) "December 2014 in Seattle (location)", 2) "December 2014 in Seattle (time)". --Fractaler (talk) 18:17, 10 March 2018 (UTC)

Huh? It's an intersection of a location (Seattle) and a time (December 2014), neither of which is ambiguous. A homonym would be something like "models" (people who pose vs. miniatures etc.) - Jmabel ! talk 19:22, 10 March 2018 (UTC)

So, we have two sets: 1) "time", 2) "location". A superset of these sets is "time + location". Are "2014 in Seattle by month", "December 2014 in the United States by city" and "December 2014 in Washington (state)" about "location + time"? Are they about "time + location" (both "time" and "location", two variables)? Only about "time" (one variable)? Only about "location" (one variable)? If traced to the root, is transitivity observed? We have two sets: "December 2014" (subsets: 1) 2014-12-01, 2) 2014-12-02, etc.) and "Seattle" (subsets: 1) Transport in Seattle, 2) Visitor attractions in Seattle, 3) December 2014 in Seattle, etc.). Is the superset "December 2014 + Seattle" about "December 2014"? About "Seattle"? About both "December 2014" and "Seattle"? When two sets intersect, are both of them variables for the supersets (when we go to the root), or just one? And if just one, which one ("December 2014"? "Seattle"?)? --Fractaler (talk) 18:06, 11 March 2018 (UTC)

A Time + Location category is not a superset of a time category & a location category. It is an intersection.
If we are going to use set-theoretical terminology then, to use the example above, there is a set of images (mainly, but not exclusively, photographs) dating from 2014. There is another set of images located in Seattle. Category:2014 in Seattle pertains to the intersection of these sets. The intersection of these sets is not a superset of either of these; it is a subset of both of these. - Jmabel ! talk 23:05, 11 March 2018 (UTC)

In our heads we have a (mental) world model in the form of a taxonomy, a classification where transitivity is observed: the rule subset -> set -> superset, represented by the category tree (Commons, Wikidata, etc.). This model can be implemented by (linked to, illustrated with) media objects (image, sound, etc.), for example as on Commons (a knowledge-base model). An intersection (a partial, incomplete inclusion of a set) does not allow us to observe such transitivity, i.e. directed acyclic graphs. So, we now have 4 supersets for the set "2014 in Seattle": 1) "2014 in the United States by city", 2) "2014 in Washington (state)", 3) "Seattle, Washington in the 2010s", 4) "Seattle by year". --Fractaler (talk) 08:08, 12 March 2018 (UTC)
For 2 & 3, correct. But 1 & 4 are metacategories, strictly used to structure the category hierarchy. 1 & 4 don't describe a set of files/images (only a set of categories), and if we are going to implement this from more orthogonal properties the distinction is important. The supercategories in those directions up the hierarchy are "2014 in the United States" and "[located in] Seattle", respectively. - Jmabel ! talk 16:01, 12 March 2018 (UTC)

Set-theoretical terminology has the term "metacategory"? A set can have a superset. "2014 in Seattle" is a set. --Fractaler (talk) 17:03, 12 March 2018 (UTC)

No, metacategory is a commons term, not a set theory term. In set-theoretic terms, our metacategories are sets of categories, not sets of files/images. Different domain. - Jmabel ! talk 17:09, 12 March 2018 (UTC)

If we use only the set-theoretical terminology, "2014 in Seattle" is a set? "2014 in Seattle" has a superset? --Fractaler (talk) 18:15, 12 March 2018 (UTC)
If we use only set-theoretical terminology, we can't say anything about any concrete, real-world example, because the name of any set falls outside of that terminology.
But if we take the reasonable understanding that a category is, above all, a set, then there are several set-theoretic ways to understand "Category:2014 in Seattle". The naive way is that it is the set of all images -- I'm going to keep this to images for the moment to keep the discussion simple, since that is about 98% of our content -- dating from 2014 (or some more specific date within that year) and depicting a location in Seattle. But because our categories typically contain both images and other categories, it's a little trickier than that: our current implementation is really the union of two disjoint sets: (1) more specific "child" categories (which each may be more specific in terms of location, date, or some other property), and (2) (assuming we are strict about COM:OVERCAT and our hierarchy is acyclic) images that match that naive description and are not in the set of images for any of the (recursive) subcategories of "Category:2014 in Seattle". One thing that distinguishes a metacategory is that no images fall directly in the category.
I don't necessarily think this way of thinking about it is all that useful to anyone other than someone trying to write/negotiate specs or to implement this. I do think we need to decide how much of our current category structure we care to retain if there is a new property-based approach to categorization. Mostly what I'm trying to do here is to sketch out how we could still give the end user any benefit they get from the current categorization scheme even if we go to a more property-based approach. Above all, what I'm arguing is that if we want to give the user pretty much everything they get from the current category scheme (and hopefully more), the possible values for any given property are going to have to constitute a directed acyclic graph.
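The reading of a category given above (direct files plus recursive subcategories, with COM:OVERCAT forbidding a file from sitting both directly in a category and somewhere beneath it) can be made concrete with a small Python sketch. The category data and function names are hypothetical; `has_no_direct_files` checks only the one metacategory feature named above, which is necessary but not sufficient to identify a metacategory.

```python
# Hypothetical categories: each has directly contained files and direct subcategories.
CATEGORIES = {
    "Seattle by year": {"files": set(), "children": {"2014 in Seattle"}},
    "2014 in Seattle": {"files": {"File:X.jpg", "File:Y.jpg"}, "children": {"December 2014 in Seattle"}},
    "December 2014 in Seattle": {"files": {"File:Y.jpg"}, "children": set()},
}


def all_files(cat, seen=None):
    """Files directly in `cat` plus those in its (recursive) subcategories."""
    seen = set() if seen is None else seen
    if cat in seen:  # guard against loops in the category graph
        return set()
    seen.add(cat)
    result = set(CATEGORIES[cat]["files"])
    for child in CATEGORIES[cat]["children"]:
        result |= all_files(child, seen)
    return result


def has_no_direct_files(cat):
    """One distinguishing feature of a metacategory: no files sit directly in it."""
    return not CATEGORIES[cat]["files"]


def overcat_violations(cat):
    """Direct files that are also reachable through a subcategory (COM:OVERCAT)."""
    below = set()
    for child in CATEGORIES[cat]["children"]:
        below |= all_files(child)
    return CATEGORIES[cat]["files"] & below


print(sorted(all_files("Seattle by year")), sorted(overcat_violations("2014 in Seattle")))
```

Here "File:Y.jpg" is flagged because it sits both directly in "2014 in Seattle" and in its subcategory, so the two sets in the union are not yet disjoint.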
- Strictly speaking, the rule would be a little more complicated than that: the directed acyclic graph for any given property would have a single root; the root itself might or might not be a valid property value; similarly, some other values in the graph might not be valid property values, just as a metacategory cannot directly contain images. For example, we could have a property for a person's death date; the graph would look something like:

   Root (not a valid value)
     +-- Alive
     +-- Status unknown (could be dead or alive)
     |     (this might have some descendant statuses, e.g. 'presumed dead')
     +-- Dead (not a valid value)
           +-- Death date unknown
           +-- Death date in a range (not a valid value)
           |     (further hierarchy, however we handle date ranges)
           +-- Death date known (not a valid value)
                 (further hierarchy, however we handle increasingly specific dates)

Haven't thought this through in enormous detail; "Death date in a range" and "Death date known" may or may not be a useful distinction, since it's not clear how precisely we have to know something to call it a single value rather than a range: e.g. if we know a year, but nothing more specific, is that a range or a date? How about if we know a decade? I'm not trying to solve the full modeling problem here, and we will probably want to follow some combination of how we do this in our existing hierarchies and how Wikidata models it (presumably they've already thought a lot of this through, and presumably many, perhaps most, of their solutions should be acceptable ones). - Jmabel ! talk 23:39, 12 March 2018 (UTC)
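The constraints in the rule above (one root per property, no cycles, some nodes that organize the graph but are not themselves valid values) can be checked mechanically. A minimal Python sketch, assuming a hypothetical adjacency-list encoding of the death-status graph with the "further hierarchy" branches omitted; acyclicity is tested with Kahn's topological-sort algorithm.

```python
# Hypothetical value graph for a "death status" property.
# Each node maps to (is_valid_value, list_of_child_nodes).
# The "further hierarchy" of date ranges and specific dates is omitted.
VALUE_GRAPH = {
    "Root": (False, ["Alive", "Status unknown", "Dead"]),
    "Alive": (True, []),
    "Status unknown": (True, ["Presumed dead"]),
    "Presumed dead": (True, []),
    "Dead": (False, ["Death date unknown", "Death date in a range", "Death date known"]),
    "Death date unknown": (True, []),
    "Death date in a range": (False, []),
    "Death date known": (False, []),
}


def roots(graph):
    """Nodes that are nobody's child; a well-formed property graph has exactly one."""
    children = {child for _, kids in graph.values() for child in kids}
    return [node for node in graph if node not in children]


def is_acyclic(graph):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indegree = {node: 0 for node in graph}
    for _, kids in graph.values():
        for child in kids:
            indegree[child] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in graph[node][1]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(graph)


def is_valid_value(graph, value):
    """Only nodes flagged as valid may be stored as the property's value."""
    return value in graph and graph[value][0]


print(roots(VALUE_GRAPH), is_acyclic(VALUE_GRAPH))
```

Under this encoding, "Dead" organizes its children but cannot itself be stored as a value, just as a metacategory holds categories but no images.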

Just a long remark about date. It is a categorizable attribute, in principle, like any other. First, it is possible for an image to illustrate more than one date. One example is tombstones and similar objects with a date inscribed on them: their images illustrate both the objects' ascribed dates and the date when the image was taken. Second, I vehemently disagree that any image categorized under Seattle and produced in December 2014 is eligible for "December 2014 in Seattle" by purely formal logical inference. It can be, for example, a map of Seattle showing conditions as of mid-2014 (albeit drawn in December). Incnis Mrsi (talk) 18:17, 12 March 2018 (UTC)

Certainly. And I don't think we would currently place a map that was simply produced in December 2014 in Seattle in Category:December 2014 in Seattle, except perhaps indirectly if we had a subcat like Category:December 2014 in Seattle ''works''; in a Wikibase approach, that would probably correspond to using Seattle as the value of a different property than the one used for images of Seattle. - Jmabel ! talk 23:39, 12 March 2018 (UTC)

OK, then about the set "Person". For example, version 1.0 (the set "2014 in Seattle" may come later). --Fractaler (talk) 11:52, 13 March 2018 (UTC)

By WP version, the set "2014 in Seattle" has: a superset "2014 in Washington (state)", a superset "2010s in Seattle", a superset "2014 in the United States by city". Commons' model: for the set 2014 in Seattle: superset "2014 in the United States by city", superset "2014 in Washington (state)", superset "Seattle, Washington in the 2010s", superset "Seattle by year":


WD does not have such item. Does the same world have different models of the world? --Fractaler (talk) 07:57, 15 March 2018 (UTC)

"Structured and machine-readable format"

The phrase "structured and machine-readable format" is less clear to me. What would the format be for the images? Audio clips? Video clips? Documents? Other files? George Ho (talk) 05:01, 19 March 2018 (UTC)

Also, is a category tree, for example, a "machine-readable format"? Is a "machine-readable format" not "structured"? And a format of what (file format, content format, world-model format, something else)? --Fractaler (talk) 06:43, 19 March 2018 (UTC)

Media files itself (i.e. images, audios, videos, etc) are not going to be transformed in format, so they remain as they are right now. However, all the metadata around the files itself (creator, timestamp, licenses, what’s depicted, geodata, file descriptions, technical camera information, etc.) would/will benefit a lot from being translated into a modern and performant “structured and machine-readable format”.

Currently this kind of metadata information is mostly managed with category and template techniques. As Fractaler indicates, that is already some kind of a “structured and machine-readable format”, yet it is orders of magnitude less performant than the Wikidata approach – both for file users and for Commons editors. Categories were never invented to organize millions of files in a project such as Wikimedia Commons, thus traversing them is difficult (and not even possible with MediaWiki functionality), and the categorization scheme is utterly complicated and in many regards incomplete and inconsistent. —MisterSynergy (talk) 07:03, 19 March 2018 (UTC)

Okay... MisterSynergy (may I re-ping you?). I'm unsure which metadata you were referring to. Are you talking about transferring data from EXIF or another existing format into the "structured and machine-readable format" (like the software used for Wikidata), or are you referring to translating a whole file description page into a metadata? Either way, can an IP user be allowed to edit a metadata? I'm worried that a metadata would be prone to misinformation and vandalism, especially once a "structured data" project is established. Can the metadata be accurate about the authorship and the licensing of a file? Thanks. --George Ho (talk) 08:01, 19 March 2018 (UTC); edited, 08:03, 19 March 2018 (UTC)

Sure you may ping me.

Metadata is anything beyond the file itself (i.e. basically the content of the file description page), but I am not sure right now whether each and every piece of information should be migrated to the new system, or whether some parts remain as they are. To my knowledge, there will be a mixture of both systems (making use of advantages of both approaches), and the migration will not happen at once.

Technically users (including IPs) have the same possibilities to manipulate data as they have right now; there will also be a version history that keeps track of all changes made to the file description page, regardless of whether the change was done in the “structured data” part or in the conventional wikitext-based file description. —MisterSynergy (talk) 08:13, 19 March 2018 (UTC)

Thanks, MisterSynergy. Another question: will the project be available in all WikiEditors, including the 2003 WikiEditor (without a Toolbox), or would a Toolbox be required to enable/disable the "Structured data" feature or switch between Structured data and wikitext-based editing? George Ho (talk) 08:19, 19 March 2018 (UTC)

Phew, I don’t even know the “2003 WikiEditor”, tbh. However, I speculate that there will not be that much tool compatibility between the classical “wikitext” and new “structured data” parts, and there will hopefully not be much data redundancy between those two either. For the new “structured data” part, a whole new class of tools will probably emerge, just as we have seen with Wikidata, where the vast majority of edits is not done via the Web frontend, but via tools (and bots). —MisterSynergy (talk) 09:13, 19 March 2018 (UTC)

To switch to the 2003 WikiEditor, MisterSynergy, you just uncheck/disable the Toolbars, so you won't see the Toolbars. However, the line spacing (aka leading) also changes, but they are working on it (phab:T181021). George Ho (talk) 15:50, 19 March 2018 (UTC)

What is a "metadata"? Is this about 1) only a technical (computer, photo/video, etc.) product (file parameters: file size, the author who created the file, what the file was created with, camera manufacturer, date, etc.), or 2) only an intellectual product (parameters of an intellectual product/work: the author of the work, the date of creation of the work, what the work was created with, etc.)? --Fractaler (talk) 10:20, 19 March 2018 (UTC)

“metadata” is uncountable, and etymologically plural. Incnis Mrsi (talk) 11:52, 19 March 2018 (UTC)

Thanks! So, metadata (computing): "Structured information about a file (date created, creator, software used to create, last modified, file format, file fingerprint, etc)". "Structured information" = "structured and machine-readable format"? --Fractaler (talk) 12:28, 19 March 2018 (UTC)

Think of metadata as all the potential information you could read from a museum label next to a painting (every single thing the author of the label could think to put on there).

For Commons, historically, all that information was stored in paper printouts that we taped beneath the paintings. We kept manually written lists of which painting is in which gallery, but we didn't know the birthdate of the painter of a random painting without going to the gallery, finding the painting inside the building, looking at the printout, taking note of the painter and hoping that the author had put the date next to the painter's name. If not, we needed to go to the library and look up his/her birthdate in a reference work. So while we had a lot of the metadata (somewhere), it was badly navigable. We didn't really know what we had, where to find it reliably, or whether the information was complete.
In the past, we made an effort to at least use the same type of labels everywhere (Information templates), so that the labels looked consistent and were easier to read and understand. At least they all contained the description of the painting, the name of the painter and the date it was painted.
Then we added barcodes to the cards that allow us to read all the information that was also on the card (this is the "machine-readable data" that you might have heard about) into a computer. The only problem is that we still need to take the scanner to the actual physical painting to scan the code to get the appropriate information. Very cumbersome....
We therefore created a database by manually scanning every single barcode in every single gallery and storing it in this database. This worked for about 95% of all the paintings. Unfortunately, several of the barcodes included incorrect information that doesn't fit inside our database. For instance, some of the paintings by Prince have as their author an image of a weird symbol, which doesn't fit in our database, as we only allow text for authors, not images. And there is a whole other range of issues with the information: we need to send someone around all the galleries all the time to scan the cards and keep the database up to date; it still doesn't fit all the information we have on the cards; some of the information on the cards was found to be incorrect, but we aren't allowed to change cards in the galleries; and we found that some paintings are in multiple galleries (which isn't possible)... It is STILL a mess.
With structured data, we plan to turn this around. There will be one large database that has all the information. For each field, we know what kind of information should be there and what it means. We distinguish links from images and text, etc. We know that an artist can be a person with a birthdate and a death date, so we can ask the database "what is the birthdate of the author of this painting?" without having to store the birthdate of the painter together with every single painting he painted. We also know that we will get back an understandable date. If someone accidentally listed the name of the painting as the birthdate of the painter, we know that this cannot be correct, because the value is not a date. We will still have labels under the paintings, but these will be small screens that show the information directly from the database. If we correct the birthdate of a painter, all the little screens in all the galleries will instantly show the corrected date.

A few paintings will still be an exception and for those we keep the traditional labels, but 95% will be just fine. Both the barcode and the database are machine readable data, but only the database is structured data. —TheDJ (talk • contribs) 13:40, 19 March 2018 (UTC)

I hope this helps some people understand it better. (Note this is a very rough example, all of this has many exceptions and details that are all very important for the end result, but that for clarity I happily chose to ignore here). —TheDJ (talk • contribs) 13:40, 19 March 2018 (UTC)
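The distinction drawn in the museum analogy between merely machine-readable data and structured data can be sketched in a few lines of Python. Everything here is hypothetical (the record types, IDs and helper names do not correspond to Wikibase's data model); it only shows the two properties described: the painter's birthdate is stored once and reached by reference, and a value of the wrong type is rejected rather than stored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    name: str
    birth_date: date


@dataclass
class Painting:
    title: str
    creator_id: str  # a reference to a Person record, not a copy of their data


# Hypothetical records: the painter's birthdate is stored exactly once.
PEOPLE = {"P1": Person("Example Painter", date(1850, 1, 1))}
PAINTINGS = {
    "W1": Painting("Example Painting", "P1"),
    "W2": Painting("Another Example Painting", "P1"),
}


def creator_birth_date(painting_id):
    """Answer 'what is the birthdate of the author of this painting?'"""
    return PEOPLE[PAINTINGS[painting_id].creator_id].birth_date


def set_birth_date(person_id, value):
    """Reject values of the wrong type instead of storing them."""
    if not isinstance(value, date):
        raise TypeError(f"birth_date must be a date, got {type(value).__name__}")
    PEOPLE[person_id].birth_date = value


set_birth_date("P1", date(1851, 2, 3))  # one correction...
print(creator_birth_date("W1"), creator_birth_date("W2"))  # ...seen by every painting
```

Calling `set_birth_date("P1", "Example Painting")` raises a `TypeError`, which is the "this cannot be correct, because the value is not a date" check from the analogy.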

Documents (passport, driver's license and other human labels) were also on paper before. Does this mean that by switching to the electronic version, they become metadata? --Fractaler (talk) 14:17, 19 March 2018 (UTC)

What is metadata? For example, a JPEG file page now says: "This file contains additional information such as Exif metadata which may have been added by the digital camera, scanner, or software program used to create or digitize it. If the file has been modified from its original state, some details such as the timestamp may not fully reflect those of the original file. The timestamp is only as accurate as the clock in the camera, and it may be completely wrong". So a "museum label" (the data, not the material) is just a part of the knowledge base, where information about the label itself is not given (who made the label, when, from what material, what it did, etc.). --Fractaler (talk) 06:27, 20 March 2018 (UTC)

Wikimania Sessions / seminars - video recordings & Structured data

Can someone who has a strong opinion on how the video recordings of Wikimania 2018 seminars should be uploaded, tagged, described and linked please get in touch with the Wikimania Video Team at Wikimania2018 Videoteam. Dagelf (talk) 14:15, 21 April 2018 (UTC)

This isn't really mostly a structured data question: structured data can be added later, even if it means splitting up the files differently, which is easy enough to do as derivative versions. Because this isn't really mostly a structured data question, it might better be asked at Commons:Village pump, probably read by 5-10 times as many people as this page. - Jmabel ! talk 22:32, 23 April 2018 (UTC)

Multilingual captions prototype testing

The good news: there is an early working prototype ready for some feedback.

The bad news: it's kind of hard to get to, so I'll explain that part.

There is not a solid testing ground for new software for Commons that copies the "production" (live) environment that is here, for many complicated reasons. There is beta.commons and test wikis, but they are unstable and are not reliable when reporting and reproducing bugs. In order to build and test software for Structured Data on Commons, the team has created a special instance of the wiki on Wikimedia Labs, https://federated-commons.wmflabs.org/ . Since this is a small testing wiki without a volunteer community to patrol it, and since testing for Commons often involves uploading images, the wiki is private with account creation turned off. I know and the team knows that this is not ideal, and they're working towards more open solutions as more software is developed, but this is what we have at the moment.

Following all that, back to the good news: I have six accounts that can be used to test captions on the labs wiki. I figure if you are interested you can put your name down here. I can email you the username and password with a link to the wiki and UploadWizard there. Try it out, come back and leave your feedback here and let us know you're done, and if needed the name and password can then be sent on to someone else. Keegan (WMF) (talk) 17:56, 23 April 2018 (UTC)

I'd like to test now

~~Christian Ferrer ^(talk) 17:07, 24 April 2018 (UTC)~~ Done Feedback complete. - Keegan (WMF) (talk) 18:52, 24 April 2018 (UTC)
~~Steinsplitter (talk) 17:14, 24 April 2018 (UTC)~~ Done Feedback complete. Keegan (WMF) (talk) 17:45, 24 April 2018 (UTC)
~~Yann (talk) 17:14, 24 April 2018 (UTC)~~ Done Feedback complete Keegan (WMF) (talk) 17:45, 24 April 2018 (UTC)
~~Raymond 17:32, 24 April 2018 (UTC)~~ Done Feedback complete Keegan (WMF) (talk) 17:45, 24 April 2018 (UTC)
Jarekt (talk) 18:25, 24 April 2018 (UTC) [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 18:52, 24 April 2018 (UTC)
~~Juandev (talk) 18:37, 24 April 2018 (UTC)~~ Done Feedback complete. Keegan (WMF) (talk) 18:52, 24 April 2018 (UTC)
— D Y O L F 77^[Talk] 20:56, 24 April 2018 (UTC) [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 22:54, 24 April 2018 (UTC)
--Sannita - not just another it.wiki sysop 14:35, 25 April 2018 (UTC) [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 20:07, 26 April 2018 (UTC)
~~Syced (talk) 08:11, 25 April 2018 (UTC)~~ Done Feedback complete. Keegan (WMF) (talk) 20:07, 26 April 2018 (UTC)
~~John Samuel (talk) 14:06, 25 April 2018 (UTC)~~ Done Feedback complete Keegan (WMF) (talk) 20:07, 26 April 2018 (UTC)
Sandipan Banerjee (talk) 17:06, 25 April 2018 (UTC) [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 17:59, 27 April 2018 (UTC)
DePlusJean (talk) 19:30, 25 April 2018 (UTC) [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 19:19, 27 April 2018 (UTC)
Jnanaranjan Sahu [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 19:19, 27 April 2018 (UTC)
--Jonatan Svensson Glad (talk) 20:05, 27 April 2018 (UTC) [OK] - emailed via Special:EmailUser Keegan (WMF) (talk) 15:08, 30 April 2018 (UTC)
...
...

Discussion about the process

Questions, comments about how captions can be tested? Keegan (WMF) (talk) 17:56, 23 April 2018 (UTC)

I imagine many already know this, but German, Japanese, and Arabic are languages where we have reasonably large pools of people, and which collectively exercise most features of localization. Also, ideally, some language where we support two different writing systems for what is otherwise the same language (any particular suggestion of which? I don't know offhand what we support that way). - Jmabel ! talk 22:38, 23 April 2018 (UTC)
- For a language with different supported writing systems, you could perhaps use Serbian. Wikibelgiaan (talk) 12:17, 25 April 2018 (UTC)
Is uploading at federated-commons.wmflabs.org possible via the classic upload form, Commons:Upload? I trust Web forms more than modern solutions. Incnis Mrsi (talk) 21:14, 24 April 2018 (UTC)
- Yes, the classic old upload form is available there.--Juandev (talk) 22:24, 24 April 2018 (UTC)

@Juandev, Raymond, and Yann: thank you for your feedback. I'll work on getting responses to you where they are required. Are you all done with your accounts? Let me know when you are so I can send the accounts to others. Keegan (WMF) (talk) 21:51, 25 April 2018 (UTC)

Done for me. Yann (talk) 05:10, 26 April 2018 (UTC)

Done for me. Raymond 07:12, 26 April 2018 (UTC)

Done for me.--Juandev (talk) 19:19, 26 April 2018 (UTC)

@Steinsplitter, Dyolf77, Sannita, Syced, Jhalmuri, Jnanaranjan, and DePlusJean: How is testing the tool going for feedback? Do you have any questions? Keegan (WMF) (talk) 18:00, 1 May 2018 (UTC)

@Keegan (WMF): Sorry, was a bit busy :). Added feedback. --Steinsplitter (talk) 18:04, 1 May 2018 (UTC)

@Keegan (WMF): Sorry, still busy, will do tomorrow, I swear! --Sannita - not just another it.wiki sysop 08:14, 3 May 2018 (UTC)

@Keegan (WMF):

Done for me. --Sannita - not just another it.wiki sysop 12:55, 4 May 2018 (UTC)

Feedback on the prototype

This is where thoughts about the prototype will go. Keegan (WMF) (talk) 17:56, 23 April 2018 (UTC)

First impressions:

I wrote the caption, but now I don't know what to put in the section "Description: Describe what is notable about the file." What kind of information should go there? "Describe what is notable about the file" is not really clear to me.

When you click on "Add a caption in another language", the new section is also created in English, and you have to click again to make the language choices appear. This is tedious; please make the language choice appear as soon as you click.

For the location heading, you must choose an angle (e.g. 45°); why not also offer compass points such as N, NNW, SE, etc.?

I see that the caption is in fact the label of the file's item, but it doesn't appear on the file page.

Christian Ferrer ^(talk) 18:15, 24 April 2018 (UTC)

If I understood correctly, the Description section is the same as the current description field, and the caption is a quick summary of what we see in the image? Christian Ferrer ^(talk) 18:19, 24 April 2018 (UTC)

I suggest changing "Description : Describe what is notable about the file." to something more precise, along the lines of "Description : detail the description (or "write a detailled description") with what is notable about the file (subject, place, context, ect...)". Christian Ferrer ^(talk) 18:33, 24 April 2018 (UTC)

detailled => detailed, ect => etc.

I think "notable" is too much of an insider's word. I don't have an exact wording, but something along the lines of "a caption that is likely to be useful with this file wherever it is reused."

I agree about "subject, place, context". - Jmabel ! talk 07:56, 25 April 2018 (UTC)

Of course, I'm saying this without seeing the UI myself. - Jmabel ! talk 07:58, 25 April 2018 (UTC)

Works well so far, but

Shouldn't the description be added to the description of the MediaInfo? Or will that be done at a later date? Regards, Yann (talk) 18:44, 24 April 2018 (UTC)

Technically it works... I understand the concept of the "caption" as the label for the M-entry... but... as a photographer I really have no idea how to fill the caption field with meaningful text that is different from the filename. Furthermore, creating a meaningful caption for every file is a lot of work. Raymond 20:08, 24 April 2018 (UTC)

First of all, I would like to apologise for my poor English, but I hope you will understand.
Thank you very much for allowing me to test this! There were certain things I had to learn. There are certain things I don't like about the solution, but let's say they may change in the future, so let's focus on the caption problem. My thoughts are as follows:
- The Wikidata interface will be confusing for WMC users, and some may avoid it because they won't learn it (maybe a simple video tutorial would help).
- Filling in MediaInfo should be more or less automatic. As it looks now, you fill in the description of the file and then you also have to fill in MediaInfo, which takes more of your time. Some of us thought that Wikidata/Wikibase integration into WMC might work the other way around, i.e. you add less information and the software fills in more lines. So why not make the file description structured or semi-structured, and pull some data from Wikidata into Template:Description?
- I came to the conclusion that the caption/label/item name is not so important for Wikimedia Commons. The problem is that file naming on WMC is not standardized, so in each language the caption may not be a simple translation but a completely different form, which helps neither party. I think there are two ways to solve the problem. It probably depends on what use we expect from the whole integration and what can be done.
  1. Retrieve the file name automatically (including the file type, e.g. jpeg) and use only the caption description, translating it into different languages. Here it would be more useful to add some statements that describe the file (if possible) and then search for images using these properties.
  2. Name the basic image depiction (red sofas, red sofa). Then we could have several identical captions, which would differ by their descriptions. But I am not much in favour of this solution, as I think the first is better. What we expect from structured information on files is to describe them and then be able to filter that information. So the file name is not so important (as it is on Wikipedia or Wiktionary); the metadata and categories/description are. On Commons, the file name is just a technical thing that comes from the fact that you cannot use the same filename for several images (as you can on other media databases). So a kind of file naming system was developed, but it is not a broad standard for all subcommunities of WMC (here I refer to the different naming traditions of the Polish and Czech WMC communities).
So would it be possible to create or use some Wikidata properties for file descriptions?
Finally, if you have about an hour: I have created three screencast videos in English on YouTube that show my tests and my thoughts on the feature:
So I slept on it and refreshed my mind:
- I'm still not sure what the captions are (label? label description? both?)
- A label for WMC files does not make much sense. On Wikipedia the page name is very important, and on Wiktionary it is even more important, but on Commons it is less important. On Commons, the file name makes the page name, but it is not standardized and its creation is subjective. So I would propose not to use a label at all; given the file naming tradition, I would definitely propose using the file name instead. And again, it is not so important to have the label in several languages. What is important is a clear label description, which might be a shortened, clear version of the file description from a template, and then statements that provide structured information about the file. These could be:
  - file type (jpeg, pdf, etc.)
  - creation date
  - author
  - source
  - uploader
  - license
  - camera used, color or b/w, other technical metadata such as those currently added by templates or special categories
  - several statements describing the content (image taken in = Hotel Thermal, on the image = red chairs ~ type: 1970 chairs, color: red, etc.)--Juandev (talk) 06:42, 25 April 2018 (UTC)

At the Wikimedia Conference, some of us discussed the possibility that for people with a lot of wiki experience, it might be good if an alternate UI was also available that was simply a block of text with mark-up (a la wikitext), and where a back end would deserialize that and send the appropriate pieces to Wikidata. - Jmabel ! talk 08:02, 25 April 2018 (UTC)

Language selection bug

Hi all! Here is my feedback:

Bug when selecting the language, see the video. It only happens when I don't release the mouse button, i.e.: put the mouse over "English", press the mouse button, move the mouse to the desired language, then release the button.
The descriptions are not visible in the MediaInfo page. See for instance https://federated-commons.wmflabs.org/wiki/MediaInfo:M295 , I entered descriptions in both English and French but they are not visible. I can reproduce the problem.
Let's say you enter descriptions in many languages, and you mistakenly leave the language selector on "English" for one of them, so there are two "English" descriptions. In this situation, you only get a hard-to-understand error message at the very bottom of the page. To avoid this, how about changing the language selection button to offer only languages that are not yet in use? For instance, if you already have an "English" description, do not offer "English" again. By the way, I managed to upload an image with two English descriptions; here is the result.
With Caption, Description, and automatically-generated MID now available, the "Title" field should go away, but I guess that will be a subsequent step.
The whole thing is navigable with the keyboard, that's great!

Keep up the great work! :-) Syced (talk) 03:09, 27 April 2018 (UTC)

BTW, is there a new UploadWizard on the test site? I haven't seen it.--Juandev (talk) 05:05, 27 April 2018 (UTC)

To my knowledge there is not a new UploadWizard. Parts of it may look different as it is a testing ground, but it is the same software. Keegan (WMF) (talk) 18:49, 27 April 2018 (UTC)

Thanks. I uploaded two photographs. When I uploaded the first photograph, I only filled in the captions in English and French. Once the photo was uploaded, I saw the description field in French. The MediaInfo field is positioned lower than I had expected. On clicking the field, I was directed to a Wikidata-like site. The French label was missing, so I entered it manually on the Wikidata-like site. Today I tried again with another photograph. This time, I filled in both the captions and the descriptions in three languages: English, French and German. Once the upload was complete, I could see the description in all three languages. However, on clicking the MediaInfo link, I see only the labels on the Wikidata-like site, not my descriptions. Overall, the experience was exactly the same as uploading photographs to Wikimedia Commons, which is really great. However, I do not understand why only the labels were filled in on the Wikidata-like site and not the descriptions. Thanks and keep up the good work. John Samuel (talk) 17:36, 28 April 2018 (UTC)
I did a test upload (including changes) a few days ago. Looks good to me. I've had no problems so far and the functionality is reasonable. Maybe some kind of auto-detection or automatic import of the license from the file description etc. of existing files would be useful. --Steinsplitter (talk) 18:04, 1 May 2018 (UTC)
I did a test uploading an old screenshot I did for a paper about Wikidata. My feedback so far:
- "Caption" and "description" are somewhat misleading names: I thought "caption" would have been the label, and description, well, the description of the mediainfo page. Turned out I was wrong.
- Also, I'm used to putting a full stop (".") at the end of a description; it would be nice to remind users not to do so, or it will be imported into the labels/descriptions.
- Is the Commonsrepo going to take items directly from Wikidata? I'm trying to add a "depicts: Wikidata property" triple, but it's not working. (Not a real problem, but still...)
- Is the property suggester going to be available on Commonsrepo? (I think it will, just asking.)

Keep up the good work! :) --Sannita - not just another it.wiki sysop 12:54, 4 May 2018 (UTC)

The future of file names

Note: Commons has a long-standing policy on how to properly rename files, and a long-proposed policy on how to name files. I'm assuming in writing this that the policy may no longer be as relevant to the community once Commons is structured. Files will be much easier to locate regardless of the file name. If the community would like to continue its naming policy, or otherwise decline this proposal, that is an acceptable outcome.

It would be beneficial to Commons if file names were replaced with standardized names. The topic came up recently, and it's worth talking about as part of Structured Data on Commons. A new naming system could be put into place to resolve the many issues that exist because of the current way of doing things. What the new names would look like is unknown; it could be the SHA-1 hash all files already have, it could be a newly generated hash of some sort, or any other kind of numbering system. No matter the choice for the new file names, old file names could be kept. There are various ways to migrate and/or grandfather in file names and how they are handled on wikis, etc.

Replacing the current file names with standard file names would have the benefit of removing some problems with file names:

Removes the potentially complicated step of naming a file when uploading.
- Benefits individuals uploading one or a few images: they would have one less decision point in the process, a potentially complex one that might otherwise prevent them from uploading.
- Benefits mass-uploading individuals and GLAM institutions, who often have to rename hundreds to thousands of files as part of their uploading process.
Removes the step of having to rename files.
- Benefits the administrators and file movers that have to do the work of changing file names.
- Benefits those who spend time listing and discussing files for renaming, freeing up time for other things.
Removes complexity from code - Commons gadget and tool developers, along with MediaWiki developers, would no longer have to consider all the edge cases that file names currently have. Two examples of situations that could have been prevented (there are many more):
- Part of the problem with Wikipedia Zero piracy was that anyone could access some deleted files by requesting the file URL directly. This turned out to be caused by an inconsistency in how the image cache transcoded parentheses () in file names. This would have been prevented.
- Sometimes the iOS application couldn't display some pages. It turns out that there were restrictions to the size of a URL in the application that broke because of long file names. This could have been prevented.

The issue at hand is first figuring out if Commons wants this to begin with, and then gathering consensus around how to do it. Replacing file names would require a global Request for Comment to be hosted here or on Meta. It would also require extensive work in gathering translations, making sure all the wikis are involved since they rely on Commons file names in editing, preparations to answer the extensive number of questions that can and will come up, the potential for misunderstandings to be managed, among other things. In other words, this will require a lot of work, with a high potential for rejection even if it is a good idea. But if it's accepted, it would be great for Commons.

What do you think about the idea? Is it important enough to take to the broader Commons and Wikimedia community? Keegan (WMF) (talk) 17:02, 5 April 2018 (UTC)

Why have both a file name and a file ID? Wouldn't it be simpler to have only an auto-generated file ID? Christian Ferrer ^(talk) 18:47, 5 April 2018 (UTC)
"Benefits those who spend time listing and discussing files for renaming" seems bogus to me. The same matters would presumably still be at issue, they just won't be in the form of filenames. - Jmabel ! talk 21:07, 5 April 2018 (UTC)
I think there still needs to be some reasonably mnemonic way to refer to an image. Little could be less mnemonic and more error-prone when typed by a human than an SHA-1 hash. - Jmabel ! talk 21:10, 5 April 2018 (UTC)
Won't this make things much more difficult for third-party search engines? - Jmabel ! talk 21:10, 5 April 2018 (UTC)

I could not care less about third-party search engines, but I care about usage in the Wikiverse: not having proper names for pictures, let alone having unfathomable cryptic gibberish, when including a good pic in a good article is, to put it mildly, sub-optimal. That's the main use case the Wikiverse should care about; the other stuff is just nice to have. Grüße vom Sänger ♫ ^(talk) 21:25, 5 April 2018 (UTC)

Absolutely no way in hell. Good filenames make a huge difference to the information available in a category view.

Consider, for example, Category:Images released by British Library Images Online, March 2014, where the filenames immediately show what the item is, where it came from, when it was created, and where in the British Library's holdings it comes from. Or the work that has gone into curating the filenames in a huge number of categories like Category:Collection de Costumes Suisses (1822) by REINHARDT, so that at a glance one can pick up exactly what the image is, where it has come from, and when it was made. I'm currently preparing a big upload of maps from 19th-century books, see e.g. Commons:British Library/MC maps batch 06 (GB towns and cities). Something I see as a key part of the process is trying to make sure the files will have meaningful names. As User:Sänger notes above, this is immensely valuable when using the images and editing them into wiki pages, as well as, e.g., for users who download all the files in a category. But beyond that, it is fundamental to how we present files here in categories.

As for the structured data project, it is an interesting experiment. But it is vapourware. The challenges it faces are enormous, and it may never work. There is not even a proof of concept of the search, not even the slightest back-of-an-envelope sketch yet of plausible achievability -- eg how to return "Picture of a man in a hat", when the man won't be tagged as 'man' (or even Q5) and the hat won't be tagged as 'hat'; when even to produce a list of Q5s that are male currently times out for a single-user query on WDQS, before we even start to think of joining it with anything else, never mind how to scale up to a system that has to be ready for mass-usage, and produce results that are almost instant. And that's just the search. Schemes for populating the system with detailed descriptive data for 40 million files simply don't exist either -- it's pure vapour. (And something that a number of smaller, simpler schemes like ArtUK have notably failed to crowdsource to any consistency of coverage). So: no, don't expect us to even think of doing anything that might sabotage a key plank of Commons, until Structured Data is an absolutely solid proven reality, that is proven to work in a full-scale production environment, with fully loaded fully detailed data, under full real-world demand load.

Rather than this half-baked crap, as Commons community liaisons, why not focus on getting the project to achieve something that the community has asked for, repeatedly: namely, CommonsData items for Commons categories? This should be a quick proof of concept and a quick win for the federated CommonsData system, showing that it can run at production scale. It would let us, the community, immediately get on with some useful work, namely identifying what the categories represent and recording it in accessible, queryable form using structured data, which isn't currently possible, since most Commons categories don't (and won't) have Wikidata items. Jheald (talk) 22:49, 5 April 2018 (UTC)

@Keegan (WMF): April Fools' Day was last week. -- Tuválkin ✉ ✇ 23:21, 5 April 2018 (UTC)
:) Keegan (WMF) (talk) 22:28, 9 April 2018 (UTC)
The idea of getting rid of user-specified filenames gives a lot more motivation to the captions aspect of Multilingual Captions. The captions would replace filenames, and would have the advantage of being translatable. I don't think it's a bad idea. You'd have a persistent ID for a file which could be used for all external links, and won't break just because somebody wants to change the caption / title. I think abandoning the idea of "uploading new versions" of a file would also be part of it: overwriting files has always been problematic. Maybe you could instead have some kind of "clone file with new version" feature that would save filling out all the details. I'd have thought the database would already have an identifier for each file, maybe a sequence number or something, which could be used as the "file name". --ghouston (talk) 00:01, 6 April 2018 (UTC)
- FWIW, I routinely use "uploading new versions" as a way to first upload the photo as it came from my camera, then upload a post-processed version. Means that no one will "too easily" use my rawer version in a Wikipedia article, but it's there if someone wants it for a different post-process. I hope that if we get rid of "uploading new versions," there will be another way to support this use case. Maybe some notion of a "draft only" version?
  - Or a way to indicate that a different file is the preferred version: it would also apply to files with errors. --ghouston (talk) 00:36, 6 April 2018 (UTC)
- Also, we'd need to think about some other way to reference charts or graphs that deliberately change over time. Maybe via some sort of redirect, where we would change the target? - Jmabel ! talk 00:21, 6 April 2018 (UTC)
  - Yeah, I was just thinking about that problem, some sort of "virtual file" which redirects would seem to be needed. There's also the problem of files which have been widely linked throughout Wikimedia projects but turn out to have an error that should be fixed. Maybe that could be handled with a "global replace" tool available to administrators / and perhaps unemployed file movers. --ghouston (talk) 00:27, 6 April 2018 (UTC)
An integer sequence number, like the item identifiers on Wikidata, would probably look better for linking than a 20-byte SHA-1 hash. --ghouston (talk) 01:00, 6 April 2018 (UTC)
- Almost as miserable as an SHA-1 hash for something that has to be handled by a human. One of the many reasons it's a problem: a 1-character typing mistake will almost always result in something still meaningful, but not what you intended. At the very least, if we are going to use non-mnemonic names, there should be some sort of checksum to make them single-error tolerant. - Jmabel ! talk 01:33, 6 April 2018 (UTC)
  - An 8 or 9 digit number would be shorter than most filenames (I guess), and much shorter than some of them. Hopefully people would notice if they copied the wrong number and got the wrong file. --ghouston (talk) 02:45, 6 April 2018 (UTC)
  - Just looking in the recent changes, I spotted File:Joseph McKenna, Associate Justice, Supreme Court, full-length portrait, seated, facing right LCCN97502836.tif: file names already have cryptic components that can be longer than 8-9 characters. --ghouston (talk) 02:48, 6 April 2018 (UTC)
    - Right, a name like that is clumsy, but it's also highly redundant: it is very unlikely that if you change, add, or remove a single character you will get another meaningful (and unrelated!) filename. - Jmabel ! talk 05:38, 6 April 2018 (UTC)
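Jmabel's checksum point can be illustrated with a minimal sketch, assuming file IDs are plain decimal numbers like the ones ghouston suggests. It uses the Luhn (mod 10) check digit, a swapped-in technique chosen only for illustration; nothing in this discussion says Commons would adopt it, and the helper names `luhn_check_digit`, `with_check_digit` and `is_valid` are hypothetical. Note that a check digit detects a typo rather than correcting it.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left. The check digit will be appended on the
    # right, so the rightmost payload digit (i == 0) is the first one doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return str((10 - total % 10) % 10)


def with_check_digit(payload: str) -> str:
    """Append a check digit to a numeric file ID (hypothetical ID format)."""
    return payload + luhn_check_digit(payload)


def is_valid(identifier: str) -> bool:
    """Return True if the last digit matches the check digit of the rest."""
    if len(identifier) < 2 or not all(c in "0123456789" for c in identifier):
        return False
    return identifier[-1] == luhn_check_digit(identifier[:-1])


if __name__ == "__main__":
    file_id = with_check_digit("68088814")  # numeric page ID quoted later in this thread
    print(file_id)  # 680888146
    # Every single-digit typo, in any position, is rejected.
    typos = [
        file_id[:pos] + d + file_id[pos + 1:]
        for pos in range(len(file_id))
        for d in "0123456789"
        if d != file_id[pos]
    ]
    print(all(not is_valid(t) for t in typos))  # True
```

Luhn catches every single-digit substitution and most adjacent transpositions, but not 09 to 90; the Verhoeff or Damm schemes catch all adjacent transpositions at the cost of slightly more involved arithmetic.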

We have the full file name ("breadcrumb") and a short file name (pages, media files, etc.). Of course, we cannot include all the information in the short file name, but it can simply be derived from the full file name. As for the symbols used in the short file name, the same evolution will occur here as with the names of Wikipedia articles (also according to the law of the transition of quantity into quality): instead of page names in a variety of different languages, human-readable Wikipedia pages will be named in one universal, unified (WD?) language. That is, instead of names on enWP, frWP, deWP, beWP, etc., there will be one unique name Q* on WD. Does WD have problems translating the WD language into any user's language? No, it does not: everyone gets the name of the (URL) object in their own language. And there are no problems with renaming (only one question has to be settled for the item: to exist or not to exist). --Fractaler (talk) 07:35, 6 April 2018 (UTC)

Several points why this is a bad idea:

If I see an edit in an article about John Doe changing File:John_Doe_on_public_meeting_in_Whereverville.jpg to File:Erected_penis.jpg, I know immediately that this is vandalism, without even waiting for it to load. If I see File:5FD4F1E4353745E3A63592A0637EADBE6787A4E4EB0CA77E7A0818878F366B81.jpg changed to File:97FB251BA0783BFA668E6496BB4D8B69F63B1E264692BA8134168113D48F3BDF.jpg, I have no idea what has happened. This is not only about obvious vandalism (let's assume some properties on Structured Commons might give some hints in lieu of the current ones in the future): file names often give a cue that the file comes from a professional studio or was taken from an official website, so we can spot possible copyvios. If someone adds File:Independence_square_Kyiv.jpg to the Kyiv article, I know this is a no-FoP violation that I should take to RfD. If I see File:028BAA2538B3CFDE59E71C918400E8D3CFA31F222EDA051F4F2980B4F2B21DFD.jpg added, I have no idea what it is and have to check. Also, I can currently hover my mouse over the files on a page to see which one is Independence_square_Kyiv.jpg, but I would not easily notice which one was File:028BAA2538B3CFDE59E71C918400E8D3CFA31F222EDA051F4F2980B4F2B21DFD.jpg and which one is File:5DC3ACC0C06D462F8880C83A7A1A7B017BD66BA596C4AF53AD0299C4AB9A152C.jpg.
Reuse in printed media. If I use a file in a printed leaflet, a book or anything else, I provide attribution. A reader can then follow https://commons.wikimedia.org/wiki/File:Вася_Пупкин.jpg and see the file. Of course, for a reader who does not know Cyrillic this would be about as tricky as https://commons.wikimedia.org/wiki/File:熊貓在樹上.jpg would be for me, and arguably https://commons.wikimedia.org/wiki/File:AD85993F7BD2323FADBBD36BD2EDE03F644082DF39B26A5626062E095F817410.jpg would actually be simpler to type. But generally common sense applies: as a person involved in creating the printed material, I can gauge how likely my audience is to know how to type in a particular alphabet, and if they are not, I can use percent-encoding or link by curid instead. Filenames are supposed to be self-explanatory, so this also reduces the need for additional text, which is sometimes crucial (e.g. logo attribution on a small leaflet). I understand that we would still have queryable file descriptions or something like that, but those might have legal implications: filenames are unique, so attribution is satisfied by them. If the printed descriptor matches 20 files of a panda in a tree, that is not proper attribution.
UX when editing articles. In VE and on WD we can show the description of the image just fine. But source editing is not going away, and when editing a table like, let's say, this one, I would much rather see the names, so that I know they correspond to the correct items, than see a list of non-human-readable identifiers. And as mentioned, this would also remove the possibility of checking which file it is by hovering and looking at the status bar. This differs from the first, CVN-ish issue: the first is about the metapedian work of patrolling, checking watchlisted edits and so forth, while this one is about the general experience of editing articles (and let's remember that people who enjoy writing articles must be our main priority).

I think the only improvement would be to allow files to be used externally by their description page ID rather than their name, just as they can already be linked to by it. This would allow embedding them in third-party sites in a way that ensures our file renames do not affect them, and possibly also bypass some limitations, as some platforms do not allow very long URLs, which might be a problem for some files, and so forth. P.S. I realise that if something like WD identifiers or curids were used rather than SHA-256 as in my examples, some things would feel less bad, but still I'd rather not look at 68088814 instead of Schloss Herrenchiemsee LOC ppmsca.52570 unless I have to. --Base (talk) 17:29, 6 April 2018 (UTC)

Is linking to images on Flickr difficult, given that they use numeric ids? It doesn't seem that way to me, if anything it's easier than arbitrary Unicode filenames. Although on Flickr, it seems you need to combine the user and image ids in the URL. --ghouston (talk) 23:45, 6 April 2018 (UTC)

@Keegan (WMF): Please don't. The Structured Data people promised the community to keep the normal functions (categories, etc.). Needless to say, the file renaming policy has been approved by the community, and changing it requires a formal RfC. :) I know that filenames and cross-wiki transclusion are problematic, but we have to find another solution that keeps filenames. --Steinsplitter (talk) 17:36, 6 April 2018 (UTC)

Analogies say more about the benefit, for the end user (i.e. those who use Commons rather than edit it), of converting a variable ("name", "імя", "নাম", "नाउँ", "Όνομασία", "नाम", "名前", "სახელი", "名称", etc.) into a constant (something like "Q82799" or "5FD4F1E4353745E3A63592A0637EADBE6787A4E4EB0CA77E7A0818878F366B82"). Analogy 1: the DNS name/address (shortcut, variable) and the IP address (constant). The IP address stays permanent, and the DNS name can be changed without affecting references to the IP address.--Fractaler (talk) 17:38, 7 April 2018 (UTC)

Thanks for the feedback, folks. It looks like this won't be a good idea to move forward with. People are welcome to continue the discussion, I'll keep an eye on it, but I'll pretty much consider the matter closed. Keegan (WMF) (talk) 22:28, 9 April 2018 (UTC)

Autogenerated summaries of files in the user's language could be useful, especially where the file name is in a language the user does not understand (or whose writing system they cannot even decipher in any way). For example, this file could get the autogenerated string lighthouse + solar panel, Ireland (22 September 2014, Lydia Pintscher) in English.

Such summaries would not be proper file names, they would not have to be unique and could change over time, but they could conceivably replace the file name in some situations where information is displayed to the user. For example in an enhanced view comparing different versions of a page (where users could set autogenerated summaries as the default for what is shown to them as the file identity) or in long lists of files. --Njardarlogar (talk) 08:59, 5 May 2018 (UTC)
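Njardarlogar's example summary could be produced by something like the minimal sketch below. It assumes the structured data has already been resolved into labels in the reader's language; the `FileMetadata` fields and the `summarize` function are hypothetical and do not reflect the actual MediaInfo data model. Month names are hardcoded in English, so date localization is left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# English month names, hardcoded so the output does not depend on the process locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


@dataclass
class FileMetadata:
    """Hypothetical structured data for one file, with labels already
    resolved into the reader's language."""
    depicts: List[str] = field(default_factory=list)
    location: Optional[str] = None
    created: Optional[date] = None
    author: Optional[str] = None


def summarize(meta: FileMetadata) -> str:
    """Build a short, non-unique display summary from structured fields."""
    subject = " + ".join(meta.depicts)
    if meta.location:
        subject = f"{subject}, {meta.location}" if subject else meta.location
    details = []
    if meta.created:
        d = meta.created
        details.append(f"{d.day} {MONTHS[d.month - 1]} {d.year}")
    if meta.author:
        details.append(meta.author)
    if not details:
        return subject
    detail_text = ", ".join(details)
    return f"{subject} ({detail_text})" if subject else detail_text


if __name__ == "__main__":
    meta = FileMetadata(
        depicts=["lighthouse", "solar panel"],
        location="Ireland",
        created=date(2014, 9, 22),
        author="Lydia Pintscher",
    )
    print(summarize(meta))
    # lighthouse + solar panel, Ireland (22 September 2014, Lydia Pintscher)
```

Because the summary is derived rather than stored, it need not be unique, and it changes automatically whenever the underlying statements are edited.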

Self parent cats due to wikidata

The Birth of Venus (Botticelli) and all files within Details of The Birth of Venus (Botticelli) that state a Wikidata Q-number automatically become part of The Birth of Venus (Botticelli); the same happens with Christ on the Cross adored by two Donor (El Greco) and Adorazione dei Magi by Gentile da Fabriano. Please stop this and repair it.--Oursana (talk) 22:49, 1 May 2018 (UTC)

Pinging @Jarekt: looks like a problem with {{Artwork}}? --El Grafo (talk) 09:24, 2 May 2018 (UTC)

El Grafo and Oursana that issue should be fixed now. --Jarekt (talk) 17:43, 2 May 2018 (UTC)

@Jarekt: thank you, it's almost fixed. Please look into the category The Birth of Venus (Botticelli). All the detail files (which all have Wikidata IDs) still automatically get the parent category, which they should not. --Oursana (talk) 21:40, 2 May 2018 (UTC)

I was trying to fix an issue where I occasionally see that when someone created a category for an individual artwork, they did not move all the images of that artwork into the category. I imagined that it would be very rare to have subcategories of individual artwork categories, so it would be safe to auto-categorize. But maybe it was a bad idea. --Jarekt (talk) 00:41, 3 May 2018 (UTC)

Same problem with Mérode Altarpiece and the new category Mérode Altarpiece by the workshop of Robert Campin; it is getting messy.--Oursana (talk) 06:00, 8 May 2018 (UTC)

I do not see any reason to auto-categorize.--Oursana (talk) 06:03, 8 May 2018 (UTC)

Oursana, I just turned it off. I was trying to remove it at the same time I roll out other changes but kept forgetting. --Jarekt (talk) 01:38, 9 May 2018 (UTC)

Jarekt, sorry, it is still a mess: files in the subcategory are also in the parent category and do not even show up. See all files within Details of The Birth of Venus (Botticelli) that state a Wikidata Q-number: they still automatically become part of The Birth of Venus (Botticelli), and others do too.--Oursana (talk) 01:57, 9 May 2018 (UTC)

Another example why structured data are needed for every-day commons user

Hi. Since I am here, "frustrated", working with another Commons user at an edit-a-thon, User:LigaDue, I'd like to share with you another example of why we really need structured metadata on Commons sooner or later. See for example here. We have no idea which standard to use for the title of a new category. The same happens for generic building exteriors. Is the right title "Exterior of...", "- outside" or "-exterior"? Of course, we can keep looking and compare different countries, but it's such a wasteful process. We spend too much time trying to answer these questions when we clean up files. It would be nice if we could link to the Wikidata item (or something similar) for these concepts; in these cases it would be much faster and less ambiguous than compiling statistics of common strings in titles. Maybe some people know these things, but if you need real-life examples to show those who think the current structure of Commons is manageable, well, that is one. Bye--Alexmar983 (talk) 09:50, 2 June 2018 (UTC)

No, that’s an example of how terminological standardization is important for a smooth and transparent workflow. You give no explanation of how structured data will magically not suffer from the same growing pains that category names did and do. -- Tuválkin ✉ ✇ 16:21, 2 June 2018 (UTC)
+1 to Tuvalkin here. I'm all for structured data, but this is not a problem it would help solve. - Jmabel ! talk 17:14, 2 June 2018 (UTC)
- Humm.. Not sure I've understood how structured data would work on Commons. But if it's a system where you say "exterior" (AKA "outside" AKA whatever, all defined in the same term) + "church" + "Italy", I believe it can indeed facilitate the process.-- Darwin ^Ahoy! 17:19, 2 June 2018 (UTC)

You describe, with clear and unambiguous information, that this image is 1) a building/church/palace and 2) an exterior/an interior. At this point you know it is an intersection of the concepts "exterior" and "building". You can, of course, run a query with this information, which basically eliminates the need for a categorization system to find files. If you still need new categories for other reasons (which is true), you can create them quickly and change the title string with a bot whenever you want. Every time there are enough images with a given intersection, the bot can create the category, of course with the standardized title we agree on, and even with basic standardized descriptions, standardized navboxes and so on. Every time a certain combination becomes common, you can expand the categorization tree with a click. If I have to spend my afternoon frustrated, creating manual categories without knowing whether I am doing it right, while someone else probably has to overwrite what I did again and again, then I would rather spend it converting existing category and description information into metadata, or reviewing similar information suggested by a bot. It's information management either way, but with metadata my effort is worth much more and is much more flexible, and I would like my effort to be more fruitful. Manual categorization can only grow in confusion; a metadata architecture grows in sophistication. For many of us who work with both Commons and Wikidata, the need to handle Commons files the way we handle Wikidata items is something we are starting to feel. It took us two or three years to get a solid and robust metadata architecture on Wikidata, but at least when we run a query to search for something, it works quite well, and it keeps improving.
Our manual categorization system here is not so efficient, and we feel frustrated that we are not progressing towards something that works better, but simply adding partial and not always coherent patches here and there. ---Alexmar983 (talk) 18:05, 2 June 2018 (UTC)
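The bot workflow proposed above (create a category only once enough files share the same intersection of clearly defined statements, using one agreed title pattern) can be sketched as follows. This is a minimal sketch under assumptions: the statement names, the `TITLE_TEMPLATE` naming convention and the `THRESHOLD` of 3 are all hypothetical, and plain labels stand in for Wikidata items.

```python
from collections import Counter

# Hypothetical structured records: each file carries a few clearly defined
# statements (view, building type, place). Labels stand in for Wikidata items.
files = {
    "File:A.jpg": {"view": "Exterior", "type": "churches", "place": "Florence"},
    "File:B.jpg": {"view": "Exterior", "type": "churches", "place": "Florence"},
    "File:C.jpg": {"view": "Interior", "type": "churches", "place": "Florence"},
    "File:D.jpg": {"view": "Exterior", "type": "churches", "place": "Florence"},
}

# Hypothetical naming convention; changing it in one place would let a bot
# rename every generated category consistently.
TITLE_TEMPLATE = "Category:{view} of {type} in {place}"
THRESHOLD = 3  # assumed minimum number of files before a category is created

def categories_to_create(files, threshold=THRESHOLD):
    """Return standardized titles for intersections shared by enough files."""
    counts = Counter((f["view"], f["type"], f["place"]) for f in files.values())
    return sorted(
        TITLE_TEMPLATE.format(view=v, type=t, place=p)
        for (v, t, p), n in counts.items()
        if n >= threshold
    )

print(categories_to_create(files))  # ['Category:Exterior of churches in Florence']
```

Only the exterior intersection (three files) passes the threshold; lowering the threshold would expand the tree without anyone hand-picking a title.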

And then imagine trying to do this kind of upload, when your first language is something other than English. Even if we standardize the vocabulary per @Tuvalkin: , the grammatical intersection of multiple concepts could be overwhelming. Sadads (talk) 12:21, 5 June 2018 (UTC)

No need to imagine: English is not my native language and I do a lot of categorization in Commons. It works. Your vapourware does not. So, go ahead and keep wasting your time and talent and WMF donations money with it, but keep your hands off Commons categories. (By the way: Category:Pedro Mexia was just created; everything works, except creating links to pt:Pedro Mexia via Wikidata because reasons.) -- Tuválkin ✉ ✇ 20:04, 5 June 2018 (UTC)

Just had a look at Category:Pedro Mexia, and the link to pt:Pedro Mexia is there just fine. Jean-Fred (talk) 06:37, 6 June 2018 (UTC)

It takes time to transclude. Yet when I tried to create the reciprocal link manually, I was faced with a gobbledegook error message, which is not what one should get from a UI when trying to do something that's already done. -- Tuválkin ✉ ✇ 11:32, 6 June 2018 (UTC)

Sadads, Tuválkin: categories are... in English. How is it possible that people worry about metadata in English and not about categories? It is more complicated to find the English names of categories (or to find miscategorized files with descriptions in other languages) than to use a system integrated with Wikidata that can provide labels in different languages automatically, which is what Wikidata already does. It's not even a problem: metadata actually increases multilingual flexibility, because it standardizes the handling of key concepts. It will remain vaporware not because of a lack of tools, but because of resistance. Take "keep your hands off Commons categories": it's weird, because even if you keep manual categories as much as you like, metadata can be used in parallel, and it can be used to make categories in a much more efficient way. It's written above in my example; it's not a complicated automation, you only need to invest in metadata for files, which people like me would gladly do instead of battling with these strings and category trees. Could the grammatical intersection of multiple concepts be overwhelming? Yes, just as with categories. You don't see it much there because categories are made manually. But that's not a balanced solution! Metadata automation does not automatically increase the level of detail; it simply forces you to define clearly which levels of detail are acceptable. You can have them on demand in a personal query (which is good) or decide which level of categorization to provide. And that's a good thing, a responsible thing. Currently you simply have excessive detail here and there anyway, mostly hidden in a bunch of categorization holes. This is a poor scenario, because it literally means you have no idea what to expect from the category tree.

Of course, if we had started years ago it would have taken less effort to adopt metadata, but it's never too late to look at things from a functional perspective instead of projecting fears. I mean, I care about money too; precisely, I care about this huge amount of wasted time, which is also, indirectly, money. A lot of money. The metadata investment is already years late, and as far as I can see we are well past the key steps of literacy. Not the literacy of newcomers, in my experience: new users grasp metadata quickly and learn Wikidata (and metadata) quite fast. When they arrive at the mediocre, multilingually inflexible, incomplete, uneven and time-consuming architecture of Commons categories, I can simply link them to these discussions and explain how Commons is currently "protected" by this scenario. So yes, I can show them how to write a query that lists with pinpoint precision what they need among millions of Wikidata items, but not among millions of files. In the end, more years of metadata illiteracy will simply leave the next generation of users a much more expensive bill. Well, not my fault.--Alexmar983 (talk) 03:22, 12 June 2018 (UTC)

This is what theory looks like, and what I expected from Wikidata when it appeared a few years back. I immediately thought: yay, we can have language-independent categorization! (Yes, because unlike what you slyly imply, I’m very much not an Anglocentric monolingual, as my user page hopefully suggests.) But no. This is what Wikidata has been so far: underwhelming in meeting its originally perceived goals, and at the same time threatening to take over systematized data from other projects (like geolocation from Commons, infobox data from Wikipedia(s), the chilling announcement of lexical data that endangers Wiktionary, etc.), locking it in a dumbed-down, gamified UI that cannot sustain the kind of workflow “power users” are accustomed to, effectively shutting down the mechanism that allowed Commons (and the other projects) to build up the data entered in the first place.

But go ahead, maybe it will become a beautiful thing. Just don’t destroy others’ way of contributing, okay? Feel free to diss categories, it’s amusing when you do it, but refrain from pushing to its removal from Commons.

-- Tuválkin ✉ ✇ 11:11, 12 June 2018 (UTC)

Tuválkin: "refrain from pushing to its removal from Commons", "Just don’t destroy others’ way of contributing"... who said that? I did not. Are you talking to me or not? You can go on creating manual categories as much as you like; as far as I'm concerned, it's your time... but there are many users who don't understand why they still MUST do it this way. And trust me, when you deal not just with a remote hamlet in a European country but with entire areas of the rest of the world, to stick to geography, you feel strongly that this ecosystem is not sustainable. Certainly it cannot last if we keep pushing millions of files from other platforms by bot. We have bots for that, but not for categorization based on a semantic architecture, because we keep delaying it in every way (like this reaction), and apparently that's a good thing.

In the future, can I add "interior" "church" "name of place" using a multilingual menu and let a bot make a category while you handle your manual category please?

In my experience, I expected something from Wikidata and its structure; it was not there in the beginning, but now it is there, or starting to be. My time was well spent. I am speaking from years of interactions with other users. I could write a lot about that experience, but I'll just repeat: I'd rather spend my afternoon investing in a metadata architecture for the future than in this category tree. Not just me. And when I share this experience, I would like not to see a bunch of users react this way. Read what I have written; don't reply to what you think is written. It's not constructive, not just for me but for the people I show these pages to later. I don't know if you have noticed, but I replied to the concerns raised by showing that they are also defects of the current system, the one you consider the best option (but again, you can keep that manual system; just don't force us all to use only it forever). The people I show this page to might notice that, including the fact that so far the replies have been a bit of vaporware themselves.

A vaporware of fears, that's what it is. Are you really worried that manual categories will disappear? Because I am not. I want to make manual categories about very sophisticated topics myself, just not about "war memorials in the province of X". It's 2018, please... there are much more delicate topics in which I should invest my time.--Alexmar983 (talk) 13:40, 19 June 2018 (UTC)

New feedback request - Depicts

There is a new feedback request available, the draft specifications for Depicts statements. Please have a look over the presentation and leave your thoughts on the talk page. Keegan (WMF) (talk) 18:11, 16 August 2018 (UTC)

Properties table

I've created a master table for holding Wikidata properties related to Commons: Commons:Structured_data/Properties_table.

Please leave your comments on the talk page, and help fill in missing property numbers if you are familiar with something that exists and could be listed. Keegan (WMF) (talk) 18:00, 29 August 2018 (UTC)

Timeline and GLAM upload

In Wikimedia Denmark, within the Wiki Labs Kultur meetings, we are discussing the possibility of a mass upload of media files to Wikimedia Commons. I am (perhaps) of the opinion that it would be a good idea to wait until SDC is implemented. I am wondering whether anyone can say something about the timeline of SDC, and whether it is worth waiting until SDC is implemented. — Fnielsen (talk) 13:21, 24 August 2018 (UTC)

I don't think this matters much. On the other hand, it would be great if you could express what your original intention was, i.e. "mass upload of PD media files from defunct national TV broadcasting service", and your reason for waiting, i.e. "waiting for properties to express minute-bookmarks to enable media citation in Wikipedia" or something like that. Because in the end, nothing will change as far as file storage on Commons goes, and SDoC will just give us more tools to make existing media more findable in the wikiverse. Jane023 (talk) 07:23, 26 August 2018 (UTC)

@Fnielsen: You can do a lot to get your upload ready for structured data right now, by trying to match the people, places, objects, events etc related to your images to items on Wikidata (or creating new items on Wikidata if existing items don't exist but should). This is the work (and it may be a *lot* of work) that you would have to do anyway, for structured data to be of any value. Even without structured data, identifying the relevant Wikidata items can help identify the right names of categories that your images should be in; and if in the current description templates you include links to Wikidata items, that will make it easier when the time comes that that information can be moved to structured data. Jheald (talk) 12:30, 26 August 2018 (UTC)

@Fnielsen: I do not think you should wait. Structured data is going to be implemented in pieces and not all at once, with the first feature release coming in October (Multilingual Captions, more on that soon). And more specifically, Multilingual Captions will not be supporting batch uploads at launch, I believe. The next feature set, depicts, is coming in the early part of next year. Things after that like licensing and attribution come later in 2019. Go ahead and run your campaign, I wish you much success! Keegan (WMF) (talk) 23:14, 28 August 2018 (UTC)

@Fnielsen: Apologies for my late reply! Jane, Jheald and Keegan have given you excellent input.

In terms of timeline: Structured Commons features will be gradually deployed. Expect the first really 'useful' features for GLAM projects ('depicts') in early 2019. Batch upload tools like Pattypan will probably not be ready for structured data immediately.
If your projects have planning constraints themselves (e.g. the uploads must be finished by, say, April 2019): by all means, go ahead and do the uploads in wikitext and templates. As Jheald says: in order to make your metadata easily translatable to structured data, make sure all relevant and notable people and organisations (creators, institutions), depicted artworks, events, places, buildings... are also available and well-described on Wikidata.
If you are flexible in terms of planning, and are excited to try the new technology, and not afraid of a bit of experimentation: yes, you can also wait! Spring 2019 is when I expect the first larger GLAM uploads to be feasible. It might be a bit of an adventure because so much of the workflows and tools will change.

Feel free to get in touch with me directly if you'd like more specific feedback. Warmly, SandraF (WMF) (talk) 11:27, 4 September 2018 (UTC)

Mockups for structured copyright and licensing statements

Mockups of structured licensing and copyright statements on file pages are posted. Please have a look over the examples and leave your feedback on the talk page. Keegan (WMF) (talk) 15:24, 6 September 2018 (UTC)

Property creation on Wikidata

Hello everyone! Over the past few months, we brainstormed about Wikidata properties that will be needed to describe files on Wikimedia Commons, and those ideas have been summarized with a list of properties. Updates and feedback and further thoughts are still very welcome.

Some of these properties currently exist on Wikidata, but many do not and are in need of creation. Property creation on Wikidata is a community-driven process; the development team will be happy to follow along and to support where possible. As Depicts and other statements will be deployed in the first months of 2019, it is time to start the process of creating new properties now.

Here are some first thoughts from the team.

Before the deployment of lexemes on Wikidata, there was a flurry of property creation specifically for that purpose. There is now a dedicated section for Lexeme-related property proposals on Wikidata, and Lexeme-specific properties are instance of Wikidata property for lexemes (instance of (P31) Wikidata property for lexemes (Q54254515)). What do you think about creating a specific section for properties related to Commons?
A current property proposal on Wikidata, relevant for Commons, related to copyright is still pending with an uncertain outcome; perhaps more engagement from the Commons community is needed here? Quite a few active Commons editors are also very active on Wikidata, and it might be beneficial if people who are active on both projects are willing and able to pay attention to the property creation process in general.

Please let us know how you can help or what you think! Keegan (WMF) (talk) 16:37, 19 September 2018 (UTC)

I do think we need a separate section dedicated to Commons-related properties, which might or might not be useful on Wikidata. Future copyright-related properties should be discussed in that context. We actually have d:Wikidata:Property_proposal/Sister_projects#Wikimedia_Commons; maybe that is the right place. --Jarekt (talk) 17:18, 19 September 2018 (UTC)

I suggest moving this list of properties to Wikidata by creating a page similar to d:Wikidata:Lexicographical data. We have several projects on Wikidata like d:Wikidata:WikiProject Informatics/Software/Properties created to get feedback from the Wikidata community members for deciding the use of existing properties or for the proposition of new properties. I agree, however, that this process may sometimes be very slow. The links pointed to by User talk:Jarekt are equally important. John Samuel (talk) 17:30, 20 September 2018 (UTC)

I'm happy to move/copy the table over to Wikidata. Would someone like to set up a page for it to live on, that has Wikidata-relevant project information? I'm not from the Wikidata community myself, so I'm unfamiliar with how that should go. Keegan (WMF) (talk) 16:51, 21 September 2018 (UTC)

+1 on @Jarekt: , I think it'd be a good idea to make a separate page for SD. Should we also create a separate page here? @Jsamwrites: We already have d:Wikidata:WikiProject Commons, but I think we can improve it. --Sannita - not just another it.wiki sysop 19:43, 21 September 2018 (UTC)

@Sannita: , Thanks. I added myself as a participant. @Keegan (WMF): My personal opinion is that we can create a subpage on d:Wikidata:WikiProject Commons or copy the current discussion to the new page. John Samuel (talk) 10:50, 22 September 2018 (UTC)

Searching Commons - how to structure coverage

RIsler (WMF), the Structured Data product manager, has identified an issue that he'd like to bring to the community's attention, with regards to how search will function:

After review with many engineering and product folks at WMF, WMDE, and within the Commons community, we've come to understand that the initial implementation of depicts "tags" for Commons media should be more focused on making sure all relevant concepts are identified and tagged, rather than limiting tagging to a few specific terms. Additionally, for now we won't rely much on the Wikidata ontology (the data structure) to find any additional depicts statements automatically.
Here is an example of what we mean. Let's try the hypothetical case of an image of a German Shepherd, and the user uploading it tagged it with only "German Shepherd" (Wikidata item d:Q38280):

We may be able to suggest an additional depicts tag (dog, d:Q144) based on the "subclass of" or "instance of" property of German Shepherd (we are still determining if this is possible for the initial version of depicts functionality). These suggested tags could appear during use of the UploadWizard, or on the image's file page, and be available for a human to confirm their accuracy before being added to the file's data.

In the first half of 2019, we expect to launch a machine-based image classification feature that may suggest a number of additional depicts tags including "dog" (d:Q144), "pet" (d:Q39201), "canine" (d:Q2474088), etc. These suggested tags could appear on the image's file page and be available for a human to confirm their accuracy before being added to the file's data.

Once a suggested tag is confirmed, it is added as a depicts statement and the German Shepherd image will show up as a match for searches for any of those terms.

On the file page, users will be free to add additional depicts tags that are accurate for the image (for instance, if it's a young dog, add puppy (d:Q39266) )

This combination of techniques should ultimately result in better searches that can be both very specific (show me German Shepherd puppies) and broad (show me pets).

Within the next week or two we will provide information on access to try out a prototype for Search on Commons. The prototype will not be advanced enough to show what we are talking about here, but we will be providing more information about "good coverage" tagging at that time. Keegan (WMF) (talk) 17:04, 21 September 2018 (UTC)
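The workflow described above (suggest a broader tag from the tagged item's "subclass of"/"instance of" parent, add it only after a human confirms it, then match it in a flat search) can be sketched as follows. This is a minimal sketch, not the planned implementation: `PARENTS` is a hard-coded toy excerpt standing in for Wikidata statements (real data may differ), the function names are hypothetical, and only the QIDs come from the post.

```python
# Toy excerpt standing in for Wikidata "subclass of" / "instance of" statements;
# illustrative only, the real Wikidata data may differ.
PARENTS = {"Q38280": ["Q144"]}  # German Shepherd -> dog
LABELS = {"Q38280": "German Shepherd", "Q144": "dog", "Q39266": "puppy"}

def suggest_tags(depicts, parents=PARENTS):
    """One-hop suggestions: direct parents of existing tags not yet present."""
    suggestions = []
    for qid in depicts:
        for parent in parents.get(qid, []):
            if parent not in depicts and parent not in suggestions:
                suggestions.append(parent)
    return suggestions

def confirm(depicts, suggestion, accepted):
    """A suggestion becomes a depicts statement only after a human accepts it."""
    return depicts + [suggestion] if accepted else list(depicts)

def matches(depicts, query):
    """Flat search: a file matches only the tags it actually carries."""
    return query in depicts

depicts = ["Q38280"]                  # uploader tagged only German Shepherd
print(matches(depicts, "Q144"))       # False: not (yet) tagged "dog"
for s in suggest_tags(depicts):
    depicts = confirm(depicts, s, accepted=True)
depicts.append("Q39266")              # a human adds "puppy" by hand
print(matches(depicts, "Q144"))       # True
print([LABELS[q] for q in depicts])   # ['German Shepherd', 'dog', 'puppy']
```

The first `False` is the situation Christian Ferrer asks about below: without the confirmed "dog" tag, a flat search for "dog" does not find the image.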

@Keegan (WMF):

1/ What do you mean by "the user uploading it tagged it with only "German Shepherd""? During use of the UploadWizard, will the user have to choose a tag or a category? Or do you mean "the user uploading it with only "German Shepherd" as a category"? Christian Ferrer ^(talk) 14:28, 22 September 2018 (UTC)

2/ Maybe this is a language misunderstanding on my part, but do you mean that an image tagged only with "German Shepherd" will not appear in "dog" search results because it is not tagged as "dog", and that we have to add the "dog" tag manually? Christian Ferrer ^(talk) 14:37, 22 September 2018 (UTC)

@Keegan (WMF):

I can appreciate why you're considering this, but (as presented) I think it's a bad idea.

A key principle on both Wikidata and Commons has been to try to make statements as narrow and precise as possible, and to rely on hierarchy rather than permitting redundancy (eg: COM:OVERCAT, here on Commons).

The problem, as many have discovered, is that searching a hierarchy is expensive, far more expensive than a flat tag search. People writing bespoke queries may be prepared to wait 60 seconds for a full hierarchical exploration (and the SPARQL service is able to support this relatively small population of searchers). But 60 seconds is not acceptable for the main search interface, nor would the query engine be likely to scale to support full hierarchical searching for the entire population of searchers.

Also there's the issue that the Wikidata ontology at the moment is simply not in good enough shape -- just not consistent and predictable or reliable enough -- to even specify what those hierarchical searches should be.

So going back towards something that can be implemented as a flat search starts to look like the only solution.
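The cost difference can be illustrated with a toy comparison: a hierarchical search must first walk everything below the queried class, and only then look at files, while a flat search is a single lookup in an inverted index. A minimal sketch; the class hierarchy and file tags are invented for illustration, and the real Wikidata graph is vastly larger and, as noted above, less consistent.

```python
from collections import deque

# Toy class hierarchy: child -> parents ("subclass of"). Illustrative only.
SUBCLASS_OF = {
    "German Shepherd": ["dog"],
    "Labrador": ["dog"],
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["mammal"],
    "cat": ["mammal"],
}
FILE_TAGS = {
    "File:1.jpg": {"German Shepherd"},
    "File:2.jpg": {"wolf"},
    "File:3.jpg": {"cat"},
}

def descendants(cls):
    """All classes at or below `cls`, found by a breadth-first walk downwards."""
    children = {}
    for child, parents in SUBCLASS_OF.items():
        for p in parents:
            children.setdefault(p, []).append(child)
    seen, queue = {cls}, deque([cls])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def hierarchical_search(cls):
    """Search-time expansion: work grows with the size of the subtree below `cls`."""
    wanted = descendants(cls)
    return sorted(f for f, tags in FILE_TAGS.items() if tags & wanted)

# Flat search: an inverted index from tag to files, one lookup per query.
INDEX = {}
for f, tags in FILE_TAGS.items():
    for t in tags:
        INDEX.setdefault(t, set()).add(f)

def flat_search(tag):
    return sorted(INDEX.get(tag, set()))

print(hierarchical_search("canine"))  # ['File:1.jpg', 'File:2.jpg']
print(flat_search("canine"))          # [] because no file is tagged "canine" directly
```

The flat lookup is cheap but only finds what was tagged explicitly, which is exactly the trade-off behind adding broader tags to each file.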

But IMO adding multiple redundant "depicts" tags for the same object in a wider image is to be avoided if at all possible. Keegan, you say that there has been a review of this "within the Commons community". I'm aware of a couple of times the question has been raised, eg here and here, admittedly without much take-up, but with a sense I think that this was not the direction the participants would prefer. It adds redundant clutter to the item. It makes it difficult to know whether there are two objects involved, or just a single one. It reduces the impetus to refine the description and try to describe the things really sharply (in my view the COM:OVERCAT principle strongly contributes to the activity of category refinement for images). It makes it less clear where qualifiers (like "shown with features" or "located within image") should be placed. And it goes directly against the principle used on Wikidata, on a system that's supposed to seamlessly combine with it.

As an alternative, I would suggest treating these additional tags added for search purposes as 'shadow tags', attached closely to specific (conventional) primary tags for items. So if something in the image is tagged "German shepherd", make "dog" an alternate shadow tag attached specifically to that "German shepherd" tag, rather than a free-floating tag in its own right.

That way we can keep things organised, preserve the impetus to try to refine the identification of things, and be clear about how many identified things there are -- that there is only one animal in question, not two. Jheald (talk) 20:32, 22 September 2018 (UTC)
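The difference between the 'shadow tag' model and free-floating redundant depicts statements can be sketched with a small data structure. This is a minimal sketch of the idea as described here, not an existing Wikibase structure: the `Depicts` class and its `shadow` field are hypothetical, and the qualifier value is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Depicts:
    """One depicted thing; `shadow` holds hypothetical broader terms used only for search."""
    value: str
    shadow: list = field(default_factory=list)
    qualifiers: dict = field(default_factory=dict)

def object_count(statements):
    """Each statement is one identified thing; shadow tags add none."""
    return len(statements)

def matches(statements, term):
    """Search hits either the primary value or any of its attached shadow tags."""
    return any(term == s.value or term in s.shadow for s in statements)

# Shadow-tag model: one animal, one statement, qualifiers in one obvious place.
shadowed = [Depicts("German Shepherd", shadow=["dog", "pet"],
                    qualifiers={"located within image": "left half"})]

# Flat model: the same picture as three free-floating depicts statements.
flat = [Depicts("German Shepherd"), Depicts("dog"), Depicts("pet")]

print(object_count(shadowed), matches(shadowed, "dog"))  # 1 True
print(object_count(flat), matches(flat, "dog"))          # 3 True
```

Both models find the image when searching for "dog", but only the shadow-tag model still says that there is a single animal and where its qualifiers belong.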

A further issue is what will happen when a Commons image "depicts" something with its own Wikidata item. How is it proposed to handle this case? An item on Wikidata will not have redundant depicts values: it will not have an additional "depicts:dog" statement, if it is for a painting of a German shepherd. Jheald (talk)

The "shadow tags" would be a kind of cache and like any cache would easily become out of date if the underlying data is changed on Wikidata. But the alternatives don't seem very pleasant. Queries that take 30 seconds to complete? Tagging every photo of a human with "human", "homo sapiens", "person", "homo", "homininae", "hominidae", "primate", "ape", "animal", "omnivore", "two-legged animal", "organism", "thing",... I know I've missed a lot. --ghouston (talk) 11:12, 23 September 2018 (UTC)

@Ghouston: The team appear to have developed really cold feet about using Wikidata to populate the additional search tags -- see phab:T199119, and in particular the first substantive comment on that ticket, by Cparle on 10 July, for their quick initial read of some of the issues this would face. So I don't think there would be any intention to keep the additional tags sync'd with Wikidata. Instead I think the suggestion is to perhaps try to suggest a few extra tags at upload time, and then from then on let them follow their own separate destiny. (Otherwise your analogy with a cache would be spot on.)

Hence the 'shadow tags' existing in their own right. But I do think there might be mileage in e.g. storing them as a qualifier ("additional search tag" ?) to a depicts statement, rather than as independent depicts statements in their own right. Jheald (talk) 17:13, 23 September 2018 (UTC)

Jheald has accurately described some of the technical issues that prevent us from implementing the preferred approach. The idea of something like an "additional search term" qualifier has some promise, and is an approach we're still considering as a possibility, but we need to game out the consequences involved. There are other logistical issues like how we would display it consistently in the UI, and how we integrate that approach with other platforms/systems (like GLAM databases), and how this would work with search. If that approach turns out to not be feasible, the solution that covers all requirements without extreme workarounds is to simply have a number of depicts tags on the M item. Although some tags might be somewhat redundant to humans (but still useful for search purposes), we can probably mitigate the impact on the UI. We will have the "Make Primary" button/link that will allow users to essentially say "these things are the most important", and those tags would be shown first and be the preferred vehicles for qualifiers. Again, using the German Shepherd example, although the image may be tagged with "dog", "pet", etc., German Shepherd can be the primary tag and house the important qualifiers like "applies to part", "shown with features", etc. while the depicts tag "dog" doesn't need to be primary and can just hang out in the background minding its own business (we're also considering a "cutoff" where, after a certain number of depicts tags, the user will have to expand to see more). We also have other reasons for wanting to separate what we're calling "elemental" depicts tags, including making it easier to import data from sources that already have tags set up that way (like Flickr Commons, GLAM sites, etc). Depicts on Commons will perhaps be the most complex part of the project, and easy answers will be in short supply, but we think the end result will be a dramatic improvement in search and discoverability. 
RIsler (WMF) (talk) 22:35, 24 September 2018 (UTC)
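The display behaviour described above (primary tags shown first and hosting qualifiers, remaining tags behind an "expand" cutoff) can be sketched as a simple ordering rule. This is a minimal sketch under assumptions: the function names and the cutoff value are hypothetical, since the post says the cutoff is still being considered.

```python
def display_order(tags, primary, cutoff=3):
    """Split depicts tags into (visible, hidden): primary tags first, then the
    rest in their original order; anything past the hypothetical cutoff is hidden."""
    ordered = [t for t in tags if t in primary] + [t for t in tags if t not in primary]
    return ordered[:cutoff], ordered[cutoff:]

def qualifier_host(tags, primary):
    """The first primary tag is where qualifiers would be attached (None if none)."""
    return next((t for t in tags if t in primary), None)

tags = ["dog", "pet", "German Shepherd", "canine"]
visible, hidden = display_order(tags, primary={"German Shepherd"}, cutoff=2)
print(visible)  # ['German Shepherd', 'dog']
print(hidden)   # ['pet', 'canine']
print(qualifier_host(tags, {"German Shepherd"}))  # German Shepherd
```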

@RIsler (WMF): Thanks for dropping by. It's good to know that something like an "additional search term" qualifier is still in consideration.

Regarding the use of "Make Primary", I am now a bit confused. I had understood, from the Depicts consultation, that 'Primary' was to be used on "depicts" to indicate the overall topic of the image -- eg something like nativity scene (Q31732) or sacra conversazione (Q370665), rather than being used to prefer Virgin Mary (Q345) over woman (Q467) for one of the elements within the scene. I do think that for the latter a better approach would be to try to tie the two together more concretely, eg by making the one a qualifier value for the other. It would be a much better structure for people writing queries to be able to work out what is going on. The idea of introducing additional ranks beyond the three used on Wikidata is also interesting (but is this possible, technically, without major surgery to the code of wikibase?), eg to hive off secondary tags to a lower rank, so many applications could ignore them. But going down the road, I suspect that tying the secondary tag to the regular tag is probably information that will turn out to be useful. If an additional rank were going to be introduced for anything on CommonsData, I would put one for "inferred by machine; not confirmed" at the head of the queue -- I suspect it is a status we are going to see a lot -- to rank below a regular statement, but still be eligible to be included as a wdt: statement in the RDF, if there was no regular statement outranking it.
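The proposed extra rank can be sketched against the way truthy (wdt:) values are chosen for a single property: statements of the best non-deprecated rank present are kept. In this minimal sketch the "inferred" rank is the hypothetical proposal above, not something Wikibase supports; the preferred/normal/deprecated behaviour mirrors Wikidata's best-rank rule.

```python
# Wikibase ranks plus the proposed, hypothetical "inferred by machine" rank.
# "deprecated" is deliberately absent: it is never truthy.
RANK_ORDER = {"preferred": 3, "normal": 2, "inferred": 1}

def truthy(statements):
    """Values of the best non-deprecated rank for one property (e.g. depicts)."""
    ranked = [(v, r) for v, r in statements if r in RANK_ORDER]
    if not ranked:
        return []
    best = max(RANK_ORDER[r] for _, r in ranked)
    return [v for v, r in ranked if RANK_ORDER[r] == best]

print(truthy([("dog", "inferred"), ("cat", "inferred")]))             # ['dog', 'cat']
print(truthy([("dog", "inferred"), ("German Shepherd", "normal")]))  # ['German Shepherd']
```

Machine guesses thus surface only when no human-confirmed statement exists for that property, and disappear from truthy results as soon as one is added.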

As regards data import, I suspect we're kidding ourselves if we think this is ever going to be easy. I'm working on an image upload project with a major GLAM at the moment, with simultaneous creation of Wikidata items for the underlying objects, and the reconciliation of names for people and places to Wikidata is brutal -- easily the most rate-limiting aspect of the whole process. This is probably as near as one can get at the moment, before CommonsData goes live, to what an upload involving Structured Data will entail. As an example, the current batch of images I've been working on contains 200 creators or contributors, with names that are supposedly normalised to the Library of Congress preferred form, if the LoC has heard of them. An initial match to the LoC and then Wikidata found 90 matches, 10 of which turned out to be wrong. By trying matching via VIAF, and then going through remaining candidates one by one, I've now raised the 'matched' count to 110 of the original 200, but it's taken a day and a half to do. And this batch is just 2% of the overall collection. Perhaps the universe of potential "depicts" tags is a more limited vocabulary, but the matching of a tag vocabulary to Wikidata, and then even more so the verification of that matching, is not a small job. I suspect that against all that, using machine methods to identify when one tag is probably just a less specific intimation of another tag, and should therefore be made subordinate to it, will likely add no more than a drop in the sea.

A further point is that Commons will still be expecting all uploads to be fully categorised, and for those categorisations to obey COM:OVERCAT, ie only categorise with the most specific indications. Structured Data should help a lot with that -- one of the reasons I'm so much trying to go the Wikidata route with my current project is to then be able to read off the appropriate Commons categories -- but to avoid OVERCAT the uploaders will thus need to work out in any case which tags are redundant to which other ones, so the effort of determining this to store them in qualifiers is not really an additional overhead. Jheald (talk) 18:52, 25 September 2018 (UTC)

For "make primary", we're exploring whether it can serve more than one purpose. Yes, its main use would be to identify the main subject of the media. But perhaps this feature (or something similar) could also say, either implicitly or explicitly, that the tag in question should be the one to host relevant qualifiers. Again, this is all still work in progress and we have a lot of different use cases to account for, so we certainly won't have anything solid on this until next month. RIsler (WMF) (talk) 18:03, 26 September 2018 (UTC)

I hope we misunderstood the comment made by Keegan (WMF); otherwise it would likely be better to develop the FastCCI tool and to create a "tag" namespace in Commons that would work in parallel with the category tree but would not be subject to our over-categorisation rules. Example: if you categorize your file with Category:Dog, then Tag:Canis, Tag:Canis lupus, etc. are automatically added to the file by a bot or by the software, and when you click on Tag:Canis you see all the images that have "Canis" as a tag. This would make it possible to stop spending a significant part of the $3,015,000 USD of that project. Sorry for that last sarcasm. Christian Ferrer ^(talk) 12:03, 23 September 2018 (UTC)
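The automatic tagging described here amounts to walking upward from a category to all of its ancestors. Here is a minimal Python sketch, assuming a hand-written map from each tag to its direct parents. The tag names and the PARENTS data are illustrative, not the real category tree or taxon chain, and intermediate ranks are omitted.

```python
from collections import deque


def ancestor_tags(start, parents):
    """Return every tag reachable upward from `start`, nearest first.

    `parents` maps a tag to its direct parent tags. The `seen` set stops
    the walk from looping on cycles or revisiting shared ancestors.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        tag = queue.popleft()
        for parent in parents.get(tag, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Toy hierarchy (intermediate ranks omitted for brevity).
PARENTS = {
    "Dog": ["Canis lupus"],
    "Canis lupus": ["Canis"],
    "Canis": ["Canidae"],
    "Canidae": ["Animalia"],
}

print(ancestor_tags("Dog", PARENTS))
# ['Canis lupus', 'Canis', 'Canidae', 'Animalia']
```

A bot would attach each returned name as a tag. The output is only as good as the parent links it walks, which is the concern raised further down this thread.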

@Christian Ferrer: 1. This refers to statement tagging, not category tagging. Categories remain an independent process. 2. Correct, the file would have to be tagged with "dog".

I'll work on getting some more specific answers to other concerns and questions. Keegan (WMF) (talk) 19:05, 24 September 2018 (UTC)

OK, thanks for the answer. Christian Ferrer ^(talk) 21:11, 24 September 2018 (UTC)

It seems to me that it is a disaster that the system will not automatically be able to make a search based on a hierarchy of tags. Would it be possible to offer both types of search, i.e. a simple tag search which would be fast and a hierarchical search which would be understood to be slow (perhaps limited in the amount of hierarchy which could be searched)? Strobilomyces (talk) 11:52, 25 September 2018 (UTC)
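One way to read the "limited hierarchy" suggestion is to expand a search term downward only a fixed number of levels before matching tags. This is a Python sketch over toy data. The CHILDREN and FILES maps and the function names are hypothetical. The sketch shows the trade-off directly: a deeper limit finds more files but needs more lookups.

```python
from collections import deque


def expand_query(tag, children, max_depth):
    """Return `tag` plus its descendants down to `max_depth` levels below it."""
    result = [tag]
    seen = {tag}
    queue = deque([(tag, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look any deeper below this node
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                result.append(child)
                queue.append((child, depth + 1))
    return result


def search(files, tag, children, max_depth=2):
    """Return files whose tags intersect the expanded query set."""
    wanted = set(expand_query(tag, children, max_depth))
    return [name for name, tags in files.items() if wanted & set(tags)]


# Toy data: each tag maps to its direct children.
CHILDREN = {
    "Canis": ["Canis lupus", "Canis latrans"],
    "Canis lupus": ["dog"],
    "dog": ["German Shepherd"],
}
FILES = {
    "a.jpg": ["German Shepherd"],
    "b.jpg": ["Canis latrans"],
    "c.jpg": ["cat"],
}

print(search(FILES, "Canis", CHILDREN, max_depth=2))  # ['b.jpg']
print(search(FILES, "Canis", CHILDREN, max_depth=3))  # ['a.jpg', 'b.jpg']
```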

@Strobilomyces: I can't speak for the team, but as I understand it, three things have put the team right off offering any deep hierarchical search: the sheer number of different ways different properties are used in different circumstances; the density of very odd glitches in the WD ontology; and the difficulty of prioritising results so that general users see the results they would actually want to see. (See the assessment by Cparle on the ticket I linked above for just a taster of some of the problems lurking under the surface.) Any attempt in this direction would be a major research project, simply not on the agenda for the team trying to ship version 0.1.

BUT --- all of CommonsData and all of Wikidata should be accessible from WDQS, so it should be possible to write SPARQL queries that are as deep, complicated, bespoke and intricate as one could wish. Soon enough, we will probably find users with particular knowledge of and interest in particular areas. These users will understand the twisty details of the Wikidata hierarchy in their subject areas, and will be prepared to put in the time to extend some of the incomplete data and fix some of the wrong statements. Those users are quite likely to start producing ready-written query designs for particular subjects and disciplines, and somebody might well graft a user-friendly front-end onto them. But nobody should underestimate how much data will need to be improved on Wikidata if those queries are going to produce good and solid results. For starters, just look at all the data that is currently missing from the infoboxes on categories. Beyond that, there is all the data still needed to make the hierarchies behind those items solid and robust. Jheald (talk) 17:20, 25 September 2018 (UTC)

Thanks for the answer. Strobilomyces (talk) 11:44, 26 September 2018 (UTC)

I have some doubts about this. From my experience with the Wikidata ontology, I have to admit that it might not be well suited for Commons: it is deeper than what Commons needs, and perhaps not as user-oriented as one would expect. The thing is, there is nothing stopping Commons users from creating their own ontology or hierarchy of depicts items. So why not keep a collection of depicts items on Commons itself and structure it as wished? Those items could then be connected to Wikidata items where appropriate, and use whatever ontology users want.--Micru (talk) 07:51, 29 September 2018 (UTC)

@Micru: CommonsData is not currently projected to support generic items, only media-items for particular media files. Generic items are expected to live on Wikidata (per current plans, at least). Jheald (talk) 11:29, 29 September 2018 (UTC)

One question that has not been studied is what should be done to the Wikidata ontology so that searches using it return correct results. Currently nobody tries to improve the Wikidata ontology, because there has been no reason to have a strict set of rules. But we can improve the ontology by setting a few simple rules, such as "an item should not be an instance and a subclass at the same time" or "no reference cycles". Snipre (talk) 07:19, 2 October 2018 (UTC)
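Both rules suggested here can be checked mechanically. Below is a Python sketch of the two checks over a small in-memory graph. It assumes each map goes from an item to the targets of its instance-of (P31) or subclass-of (P279) statements. The letter-named toy data is invented for illustration, and a real run would have to read Wikidata itself.

```python
def instance_and_subclass(instance_of, subclass_of):
    """Rule 1: items that carry both instance-of and subclass-of statements."""
    return sorted(set(instance_of) & set(subclass_of))


def find_cycle(subclass_of):
    """Rule 2: return one subclass-of cycle as a list of items, or None.

    Depth-first search with three states: unvisited, on the current path,
    and finished. Meeting an item that is still on the current path means
    the walk has looped back on itself.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(item):
        state[item] = ON_PATH
        path.append(item)
        for parent in subclass_of.get(item, []):
            s = state.get(parent, UNVISITED)
            if s == ON_PATH:
                return path[path.index(parent):] + [parent]
            if s == UNVISITED:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[item] = DONE
        return None

    for item in list(subclass_of):
        if state.get(item, UNVISITED) == UNVISITED:
            cycle = visit(item)
            if cycle:
                return cycle
    return None


print(instance_and_subclass({"X": ["Y"], "Z": ["W"]}, {"X": ["V"]}))  # ['X']
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
print(find_cycle({"A": ["B"], "B": ["C"]}))  # None
```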

@Snipre: The comment by Smalyshev on wikidata-l is also worth reading [1] : The main problem is that there is no standard way (or even defined small number of ways) to get the hierarchy that is relevant for "depicts" from current Wikidata data. It may even be that for a specific type or class the hierarchy is well defined, but the sheer number of different ways it is done in different areas is overwhelming and ill-suited for automatic processing... One way of solving it is to create a special hierarchy for "depicts" purposes that would serve this particular use case. Another way is to amend existing hierarchies and meta-hierarchies so that there would be an algorithmic way of navigating them in a common case. This is something that would be nice to hear about from people that are experienced in ontology creation and maintenance... I think this is very much something that the community can do. Jheald (talk) 08:11, 2 October 2018 (UTC)

Info I wrote some related comments in [2] Christian Ferrer ^(talk) 11:14, 2 October 2018 (UTC)

@Keegan (WMF): If I understand correctly: The current wikidata ontology is unsuitable for searching (e.g. related discussion) which is a huge problem. I do not think it is a good idea to cover up this mess with hundreds of different tags. Instead the image classification and searching algorithm should be a motivation and help people to fix the ontology. --Debenben (talk) 15:59, 3 October 2018 (UTC)

@Keegan (WMF): I fully agree with the above. If "German Shepherd" is currently not linked (in the results of a potential search) with the taxon chain of Canis lupus familiaris, it is because the ontology is not well done. Structured data for Commons is a good idea only to the extent that the "data" is indeed well structured. In Wikidata, German Shepherd should be a "breed" (with "breed" as a property) of Canis lupus familiaris, but it is not. It is currently a breed of dog, which is literally true but ontologically totally wrong: "dog" is not a species but a taxon's common name. I wonder how many items are affected by this kind of confusion. Likewise, woman (Q467) is a "female adult human" only in the description, not in the statements. There you can indeed find "female" and "adult", but not "human", so women will never be highlighted if you search for "female mammals". But that's not why I pinged you. Has it been envisaged that qualifiers could be added to the depicts "tags", as shown in the Search prototype? That would be good. Sorry if this is already written somewhere and I missed it. Christian Ferrer ^(talk) 05:24, 7 October 2018 (UTC)

Necessary changes to how viewing and using old file page revisions functions

THE FOLLOWING ONLY APPLIES TO OLD REVISIONS ON THE FILE NAMESPACE ON COMMONS. IT DOES NOT AFFECT THE FILE SPACE ON ANY OTHER WIKI, OR ANY OTHER NAMESPACE ON COMMONS.

Structuring data on Commons with Wikibase changes how content on a file page is stored and served back, using a system known as multi-content revisions (MCR). Instead of a file page revision being a single big chunk of information, data is broken apart into pieces known as “slots.” When you view a file page, its history, or any individual revision, what you see is assembled from multiple slots.

This makes serving old revisions of a file page complicated, as one slot may have a revision that has been edited while another slot has not been changed. The old version of a file page cannot be returned in the same way that a plain wikitext-based wiki page works, which simply finds the specific past version of the wikitext on the file page – because there is only one – and returns that.
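A simplified Python model of the difficulty follows. It assumes each edit records only the slots it changed. This is an illustration, not MediaWiki's actual storage schema, and the slot names ("main" for wikitext, "mediainfo" for structured data) and contents are illustrative. To show any past revision, the software has to assemble each slot's latest content at or before that point, rather than fetch one stored blob.

```python
def slots_at(history, rev_id):
    """Assemble the slot contents a page had as of revision `rev_id`.

    `history` is a chronological list of (rev_id, {slot: content}) pairs in
    which each edit lists only the slots it changed; untouched slots carry
    over their previous content.
    """
    state = {}
    for rid, changed in history:
        if rid > rev_id:
            break
        state.update(changed)
    return state


HISTORY = [
    (1, {"main": "{{Information|description=A dog}}"}),
    (2, {"mediainfo": '{"depicts": ["Q144"]}'}),  # data-only edit
    (3, {"main": "{{Information|description=A dog in a park}}"}),  # text-only edit
]

# Revision 2 changed no wikitext, so its "main" slot comes from revision 1.
print(slots_at(HISTORY, 2))
```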

In order to make MCR work on old revisions of file pages, the development team is looking at making these old versions of pages match how Wikidata functions. The following things change when looking at an old revision of a file page:

The Edit tab at the top right of the page is replaced with Restore

The Edit tab's function, opening the entire wikitext of an old version of a page so that it can be restored, is removed. Instead, a page is shown with the differences between the current revision and the old one (the one being restored), along with an edit summary field.

Let’s say that you want to revert a file page to a specific version from the past. Currently, you’d open the History and click on the revision that you want. From there you would click on the Edit tab, view the old text in the editable text box, fill in an edit summary, and save the page.

With the new function, you open the History and click on the revision that you want. From there you click on the Restore tab (which has replaced the Edit tab). You’d then see a diff of that revision against the current page, and an edit summary field to fill in, with the save button. The editable text field is removed. This replicates how Wikidata handles old revisions.

Here is what editing an old file revision on Commons currently looks like. This will go away.
Here is what editing an old item revision on Wikidata currently looks like. This is what will replace the old way.

If you’d like to read through the technical discussion that resulted in this decision, here is the Phabricator ticket where you’d start. There are more links within, including links to gerrit patches.

There are advantages to serving old revisions in this new manner, the main one being simplifying the process of restoring an old revision should that be your goal in editing the page. There are some drawbacks to this decision however, primarily that the entire wikitext of old page revisions will not be available for copying, if someone is looking to duplicate old text on another page. Individual line changes can be seen and copied from the diff view. As mentioned at the top, this change will only affect the File namespace on Commons. Access to old revisions in the Commons namespace, Template namespace, Main namespace, etc., will remain as it is today. This use of old revisions in the File namespace does not seem to have a large impact on Commons, and the team hopes that any disruption in workflows from change in how old text is accessed is minimal. The team may try to look into other ways of serving the entirety of old wikitext page revisions, but it will not be possible in the near future.

Are there any questions about this change? Keegan (WMF) (talk) 19:37, 27 September 2018 (UTC)

So it's not going to be possible to undo just a change to the wikitext, without reverting back the structured data -- nor to just revert structured data, without reverting the wikitext?

This might be a problem, if we consider that it might often be two largely distinct communities editing the data (probably heavily mechanised) and editing the wikitext (probably manually), often likely quite largely independently.

If somebody reverts back some edits to the data after a mistake, while in the intervening time an edit has been made to the wikitext, it sounds as if that wikitext edit will be reverted back too, and may be quite hard to reinstate, if it is no longer possible to access the wikitext of a whole page in the form to which it had been updated. This might upset non data-editors quite a lot. Jheald (talk) 20:35, 27 September 2018 (UTC)

Undoing and reverting will work just fine. Here's what you won't be able to do directly anymore on a File page: open the old revision, edit that revision directly as wikitext in the editing box with the big warning that this is an old revision, and save it as the new revision. Keegan (WMF) (talk) 22:17, 27 September 2018 (UTC)

I'd like to point out that this use case for an old revision of a file page (accessing the old wikitext directly to copy it, or to manipulate it and save it as the current revision) does not seem to be a common workflow in the File namespace here on Commons. It is quite common in discussion spaces, and on other wikis. Please let us know if there is a prevalent use case for this workflow that we need to find a solution for. Keegan (WMF) (talk) 22:21, 27 September 2018 (UTC)

To add on to what Keegan said above - A.) The MCR team is still working on features and, in the future, should have a way to access the Wikitext of old revisions. It's just probably not going to be ready for our v1 launch. B.) As we get closer to launch and start putting things on Beta for testing, we'll explore a few possible temporary workarounds to address some edge cases as Jheald mentioned. RIsler (WMF) (talk) 01:51, 28 September 2018 (UTC)

Will this affect API access to old revisions? Currently, Geograph Update Bot makes some use of the ability to read past revisions (to check if location templates are the same as in the first revision), and I'd like it to make more (to detect location templates added by other bots, and to detect when it's thinking about reverting another user). It would be unfortunate if these became impossible. --bjh21 (talk) 21:20, 27 September 2018 (UTC)

To me, this seems like one more argument for the serialization/deserialization approach I've suggested several times. - Jmabel ! talk 23:18, 27 September 2018 (UTC)

This is just for older versions, right? The wikitext of the current version will still be accessible? (At least, that part of it that isn't the structured data.) BTW, it does seem to be possible to get hold of old versions of Wikidata items using getOldVersion() in pywikibot, but not to format it into an item_dict using get() (you have to manipulate the json - e.g. see [3]), I guess the same might be possible here so that bot-actions (for spotting/reverting vandalism and bot errors) would still work if needed? Thanks. Mike Peel (talk) 00:17, 28 September 2018 (UTC)

Wikitext of the current version will still be available. RIsler (WMF) (talk) 01:52, 28 September 2018 (UTC)

BTW, I'd send around the alert about this discussion as you did for #Property creation on Wikidata - this is a more important discussion than that one was... Thanks. Mike Peel (talk) 00:18, 28 September 2018 (UTC)

I'm doing so first thing in the morning. It was a very busy day, and experience has taught me to not send a massmessage at the end of a busy day :) Keegan (WMF) (talk) 02:10, 28 September 2018 (UTC)

I did this a little while ago. Keegan (WMF) (talk) 16:07, 28 September 2018 (UTC)

Question How will Commons:Rollback work? Christian Ferrer ^(talk) 16:34, 28 September 2018 (UTC)

Question How will file histories be displayed, and how will "Difference between..." be displayed? Christian Ferrer ^(talk) 17:12, 28 September 2018 (UTC)

None of these things should be affected or change. What is changing is how old revisions work in relation to viewing an old version of a page. Keegan (WMF) (talk) 17:34, 28 September 2018 (UTC)

OK, thank you for the answer. Christian Ferrer ^(talk) 17:55, 28 September 2018 (UTC)

Losing access to the plain wikitext of old revisions is a plain red flag IMO. It may not be used that often, but it's really helpful when you need it. On Wikidata you never see the plain wikitext (and you never need it), but here on Commons we work with plain wikitext. — regards, Revi 16:43, 28 September 2018 (UTC)

@-revi: How often do you need the bulk plain wikitext from an old revision of a file page? I ask because as @RIsler (WMF): mentions above, the team should have a feature to access the old revision wikitext again in the future and this removal is temporary (unfortunately we do not know how long temporary is). If you find that you do access these old revisions on file pages regularly as part of a workflow, we'd love to hear about it and pursue a workaround. Keegan (WMF) (talk) 17:03, 28 September 2018 (UTC)

It is a Commons workflow for me to edit text, for example to change an Information template to an Artwork template, or to copy-paste LOC metadata into an old LOC Flickr upload; I even use VisualFileChange to make mass text changes to old files. But I would be happy to edit Wikidata instead. I would expect a pencil icon on the template that goes to Wikidata, and an on-ramp to QuickStatements. Slowking4 § Sander.v.Ginkel's revenge 02:51, 29 September 2018 (UTC)

Agreeing with @-revi: This is a disgrace. How often do we use it, @Keegan (WMF)? Often enough for you not to break it, what about that? -- Tuválkin ✉ ✇ 02:57, 29 September 2018 (UTC)
- I'm trying to make sure I understand: will it still be possible to edit the latest version in a straight-text manner? That is, when you are talking about "old" revisions not being editable this way, is that just ones that have already been changed, or does that include the latest? Because if it's the latter then, yes, this is going to break a lot of workflows. - Jmabel ! talk 03:34, 29 September 2018 (UTC)
  - I'm still waiting for an answer to this, as conversation has headed off in different directions. - Jmabel ! talk 17:13, 29 September 2018 (UTC)
    - @Jmabel: The latest revision of the page will be editable just as it is today. We are only talking about when you view a historical version of the page. Keegan (WMF) (talk) 18:34, 1 October 2018 (UTC)

I have three concerns:

In the two examples above, the one from Commons allows you to preview the page, but the Wikidata version does not. Does that mean we will also lose the ability to see what the page will look like while performing a restore? Fortunately, I assume that as a workaround we could always start by clicking on the old revision to see a rendered version of it before clicking on the "restore" command.
A common use I have for examining the old wikitext of a page is to figure out what wiki code was used in the past to produce a complex layout of description and licensing templates that has since been changed. There are two possible workarounds. (1) Restore an old revision, copy the old code, and restore again to revert that restore. (2) Copy the current wikitext and then manually apply the diffs backwards through each revision to reconstruct the old code. Neither is particularly appealing. I would definitely sorely miss the ability to directly copy the wikitext of old revisions in the File namespace. I could live without it temporarily, but this is not something I do infrequently.
Another case where I use the wikitext of an old revision in the File namespace is when correcting an edit that has broken the template rendering of a page. Commons File pages make heavy use of templates for their content. They are often edited by editors from other wikis who make use of Commons but are not themselves primary contributors to Commons, so these editors are less familiar with the complex set of templates that Commons has built to produce most of the content in the File namespace. It is not uncommon for inexperienced editors to make an edit to a file page that adds useful information but also significantly breaks the page rendering. For these corrections I will often start editing the revision of the page immediately before the less experienced editor's edit, copy out the wikicode for the portions they inadvertently disrupted, and then use that copied code to make a correction. Alternatively, I might start by editing the revision from before the inexperienced editor's edit, make the change the other editor was trying to make (and would have made if they were more experienced with Commons), and then submit. Making these corrections will be much harder without access to the wikitext of old revisions in the File namespace. As a workaround, we will instead have to restore an old revision, copy the code, restore again to revert that restore, start a new revision, paste the copied code, make the necessary corrections, and then submit.

—RP88 (talk) 04:06, 29 September 2018 (UTC)

For the second point RP88, if the answer made by Keegan (WMF) to my question above is right, then you should be able to copy a wikitext with Difference between.... Christian Ferrer ^(talk) 06:04, 29 September 2018 (UTC)

@Christian Ferrer: In your example link try to use your browser to copy to the clipboard just the contents of the older {{Information}} template and its parameters on the left side of the comparison. In the browsers that I've tried (Chrome, Firefox, Safari) the copied text will be intermingled between the old and new {{Information}} template/parameters. Using a diff as a way to retrieve the text of an old revision can be done, and is usually not too onerous for less complex edits, but quickly becomes impractical. —RP88 (talk) 06:20, 29 September 2018 (UTC)

Comment I don’t believe I ever needed to edit the wikitext of old revisions of File pages, and can’t think of any use case I would have needed that ability for. :) Jean-Fred (talk) 08:20, 29 September 2018 (UTC)
Comment MediaWiki already supports "view source" of a page, for example as offered when a page is locked. If it is necessary to withdraw the "edit" option link from the page history (and I don't fully understand why that is so necessary), would it not be possible to offer "view/view source" instead ? Jheald (talk) 11:35, 29 September 2018 (UTC)

As I understand it, the tricky part with viewing the old source of a page that has MCR is that not all revisions of all things are living in the same place, so assembling that snapshot in the raw is what becomes infeasible and why the view is changed from plain wikitext to the diff view. I think it might be possible in the future to put back together, but for now we need a workaround. Keegan (WMF) (talk) 18:41, 1 October 2018 (UTC)

@Keegan (WMF): That seems rather odd. One would expect at least all the wikitext to be living in the one place. Jheald (talk) 19:15, 1 October 2018 (UTC)

Let me try to clarify what Keegan meant above. It's hard to explain without getting in the weeds about what MCR does, so let me provide a short answer - we might indeed provide a view source button/tab, but it may be easier to simply provide a source view via a modified querystring on the EditPage. The ultimate point is that there *will* be a way to view the Wikitext of a past revision, we just haven't settled on the best way to do that yet. RIsler (WMF) (talk) 21:04, 1 October 2018 (UTC)

Multiple questions regarding the change @RIsler (WMF) and Keegan (WMF):

Who is going to merge old file description pages into the new system?
Why was no community consensus sought on Commons:Village pump/Proposals?
Who is going to fix all the bots which will break once the change is merged?
If I remember correctly, staff promised somewhere that file description pages and categories would be kept. Why has this changed?
Who is going to fix all the gadgets which will break?

Best. --Steinsplitter (talk) 06:32, 12 October 2018 (UTC)

I left a note on COM:AN, so we can get a bit more input here. Best --Steinsplitter (talk) 06:40, 12 October 2018 (UTC)
- Steinsplitter, why did you leave a note on AN -- this does not require administrator intervention. The Village Pump seems more appropriate (and it doesn't seem like a proposal, more like a FYI). -- Colin (talk) 06:55, 12 October 2018 (UTC)
  - Sounds reasonable, moved it to VP. --Steinsplitter (talk) 06:56, 12 October 2018 (UTC)
    - I left a VP note when this was posted. Keegan (WMF) (talk) 17:13, 12 October 2018 (UTC)
Comment I don't think this will be any problem for me. I often want to see the wikitext of current revisions in order to copy/paste it to another page (which is what I suspect some of the hasty opposers above are doing). But I've never needed to do that for old revisions. Indeed, the only time I've ever needed the old revision of a File page on Commons was to revert to it. As an aside, I wish one could revert to an old version of a file without that appearing in one's upload log -- if the devs know of a ticket for that one, I'd support it. -- Colin (talk) 06:55, 12 October 2018 (UTC)
@Steinsplitter: File description pages and categories are being kept. Page history merges will not change. As for why consensus wasn't sought, it's because this isn't an optional feature, it's a required function. I'm not aware of gadgets, bots or tools that this particular change might break (and I had a look). Are there any particular ones that you had in mind? Keegan (WMF) (talk) 17:12, 12 October 2018 (UTC)

Required by whom, or by what? -- Tuválkin ✉ ✇ 18:22, 12 October 2018 (UTC)

Multi-content revisions, the software that assembles pages from Commons and Wikidata. Keegan (WMF) (talk) 21:42, 12 October 2018 (UTC)

I run User:Geograph Update Bot, which inspects old revisions of pages (and of files). I asked above if API access would be affected, but I haven't yet had a reply. --bjh21 (talk) 19:12, 12 October 2018 (UTC)

I'll get you an answer. Keegan (WMF) (talk) 21:42, 12 October 2018 (UTC)

This particular change we're discussing is more about reverting old revisions via the UI. We have no current plans to change API access to *read* Wikitext for old revisions (new MCR stuff will be backwards compatible). If an issue comes up that requires such a change to be made, we'll be sure to inform everyone before it happens, but as of now the plan is to keep the basic API functionality working as is. More preliminary info here: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Action_API RIsler (WMF) (talk) 23:49, 12 October 2018 (UTC)
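Bot operators in the same position might find the following Python sketch useful. It builds a read-only request for old wikitext through the Action API. It assumes the MCR-aware parameters of prop=revisions (rvslots=main selects the wikitext slot) and the formatversion=2 response shape described in the API documentation on mediawiki.org. The helper names are hypothetical, and the sketch makes no network call.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def old_revisions_url(title, limit=5):
    """Build a URL that reads a page's earliest revisions, oldest first."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",   # ask for the wikitext ("main") slot explicitly
        "rvdir": "newer",    # start from the first revision
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,
    }
    return API + "?" + urlencode(params)


def main_slot_texts(response):
    """Extract (revid, wikitext) pairs from a formatversion=2 JSON response."""
    page = response["query"]["pages"][0]
    return [
        (rev["revid"], rev["slots"]["main"]["content"])
        for rev in page.get("revisions", [])
    ]


print(old_revisions_url("File:Example.jpg", limit=1))
```

You could fetch that URL with any HTTP client (sending a descriptive User-Agent) and pass the decoded JSON to main_slot_texts. That would return the wikitext of each old revision, which is what a bot like the one described above compares against.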

@RIsler (WMF): Thank you. That satisfies my concerns. --bjh21 (talk) 08:59, 13 October 2018 (UTC)

This discussion is 3,300 words long, having run for just over 2 weeks. This discussion was mentioned on the VP, but there was no other effort to notify users, even those of us that put our names forward to be part of formal consultation. It's only happenstance that I remembered the VP mention, which happened while I was away travelling.

The change is significant to the fundamental way that Wikimedia Commons works. It should be run as a proposal or RFC for at least 30 days, and would benefit from a FAQ based on the questions raised so far.

It's worth pointing out that I am the most active current uploader of images on Commons. I find this change worrying because of its potential future impact on how upload projects work, how templates can be used, and how housekeeping tasks are run on uploads. Those tasks include automatically reviewing past versions of image page text (many of my bot tasks do this as part of checking past bot actions and ensuring bots do not edit-war with "human" changes). Despite vague assurances that this probably will not require any more volunteer effort, I do not believe that will be the case in the long term. This change is part of making templates harder to use and image page text harder for newbies to format "correctly", with "correct" being defined by whatever pops out at the end of the WMF's structured data project. The authority for the changes comes from the WMF-funded project, not from a consensus for change established by Wikimedia Commons volunteers. Instead, the structured data project has fudged consensus by holding consultations, like this one, that procedurally mean very little. Input from volunteers in these consultations can be cherry-picked by the unelected to demonstrate whichever case benefits the project at that time.

Thanks --Fæ (talk) 10:38, 16 October 2018 (UTC)

@Fæ: you've said that there was no effort to notify other users, even those that put names forward to be part of the formal consultation. I notified those people, including yourself.

As to the point about requiring consensus...

As a Wikimedian, I understand your point and the necessity for consensus when proposing to change community process and workflows. However, the fundamental problem with your proposal is that you're relating consensus to necessary software changes. The implication of asking for consensus is that if it's not found, the thing isn't done. This has to be done no matter what, so that completely fails in the process you propose. Instead, what we can do is inform people of incoming changes, look for the workflows that this might break, and help implement new systems or workarounds.

If there are concrete issues that you can identify now with how your bots operate, please let us know and we will work with you. If you are unsure of what might break because you won't know until this happens, please let us know at the time and we will work with you then. There are some aspects to this project that will be necessary to do, and the development team would very much like to make this as painless as possible. Keegan (WMF) (talk) 19:36, 29 October 2018 (UTC)

Mark for translation

Could you please mark this part for translation? Sorry, I can't give the source link:

Read the log of the last IRC office hour, which took place _date tag_.

--Omotecho (talk) 04:12, 29 October 2018 (UTC)

Thanks for pointing it out. I'm not a translation admin on this wiki, so I'm working on getting someone to help take care of it. It should be marked up soon. Keegan (WMF) (talk) 21:53, 29 October 2018 (UTC)

Could Wikidata be used to create non-English category names for existing categories?

I wrote the proposal below some time ago, but I don't really know where to put it. Commons:Village pump/Proposals is exclusively in English, so non-English speakers simply won't be made aware of it. Since this doesn't really concern any policy or "community" matter but rather the underlying software of Wikimedia Commons, I thought it might find a better place here. Because English is the de facto language of Wikimedia Commons (and of all "multi-project" websites like Wikidata), I don't think anyone who can't understand English will be able to give their input on this. These people are effectively "voiceless" here: if you can't understand English, you are still very likely to run into English everywhere. Someone who only speaks Wolof, say, won't be able to contribute much here unless we allow Wikidata to start supplying non-English titles and translations for Wikimedia Commons. Maybe this could also apply to tags and properties.

The original proposal is below:

Sometimes I wander into Wikimedia Commons from a non-English Wikimedia project such as the Mandarin Chinese Wikipedia or the Vietnamese Wikipedia. However, rather than finding titles appropriate for the language Wikimedia Commons displays in, it still shows English-language titles such as “Category:Round coins of the State of Qi”, even though I’ve added the Taiwanese title “齊國圜錢”, which is also used on the Mandarin Chinese Wikipedia. It would make sense for people who can’t speak or understand English to still be able to navigate their way through Wikimedia Commons, right? This could easily be implemented by just changing the display titles of the categories: non-English titles would redirect to their English equivalents but would be displayed in whichever language the viewer has set in their preferences.

Additionally, translations could be imported from Wikidata. Wikidata often features non-English translations of its items, so a bot could simply mass-create redirects that serve as alternative display titles based on these. If these alternative titles change on Wikidata they should change on Wikimedia Commons too, but I think the software should prefer Wikipedia article titles over translations, and it should be possible to override them locally with a special template like {{Preferred translation}}. An example of a more Wikipedia-centric translation method would be Category:Archaeological Museum of Thessaloniki, which exists as the English Wikipedia article “w:en:Archaeological Museum of Thessaloniki” and as the Greek Wikipedia article “w:el:Αρχαιολογικό Μουσείο Θεσσαλονίκης”; the alternative title of the Wikimedia Commons category for users coming in from that language could then be “Category:Αρχαιολογικό Μουσείο Θεσσαλονίκης” or “Κατηγορία:Αρχαιολογικό Μουσείο Θεσσαλονίκης” (namespace changes could be handled outside the redirects, with the translations displayed automatically in the preferred language). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 16:24, 31 October 2018 (UTC)

This will be possible with structured data, at least theoretically and probably not quite how you envision. Whether or not the community would like to do so will be up to, well, y'all :)

There are people working to create 1:1 equivalent Wikidata items for Commons categories. Items can hold translations, and with structured data people would be able to add these Wikidata Category statements to files, which are then accessible in available languages. Again, it'd be up to the community to figure out this workflow and whether you want to do that. It would be slightly redundant in some ways, as files will have other statements that expose the same or similar information, but I imagine there's a use case for maintaining the category tree in structure. Keegan (WMF) (talk) 17:05, 31 October 2018 (UTC)

@Keegan (WMF): if category trees will largely be superseded by tags and statements, then these should also be made available in a variety of languages. However, categories have something that many other statements don't: they're often directly linked to other Wikimedia projects, including Wikipedias in various languages. I'm not sure how other Wikimedia projects will be properly linked through other statements, but in the current setup users often come to Wikimedia Commons categories through "equivalent Wikipedia articles". Will this somehow also be superseded in a way that lets subject-relevant images be easily linked from other Wikimedia projects? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 18:12, 31 October 2018 (UTC)

I think this is achievable with no software changes and only a small settings change. If $wgRestrictDisplayTitle were disabled on Commons then {{DISPLAYTITLE}} could be used to set the displayed title of a category to something appropriate for the language (using {{LangSwitch}}), with a bot handling creation of redirects and correcting uses of redirected categories. It might be possible to import translations from Wikidata, but there are several reasons why this wouldn't work in general (non-unique labels, categories not notable enough for their own Wikidata item, etc) so many would have to be maintained here. So I think this is a policy and/or community matter and COM:VP or COM:VPP would be a perfectly sensible place for it. --bjh21 (talk) 18:44, 31 October 2018 (UTC)
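The core of the {{DISPLAYTITLE}} + {{LangSwitch}} idea is a per-language lookup with a fallback. The following is a minimal Python sketch of that selection logic only, loosely in the spirit of {{LangSwitch}}; the data layout, the `pick_display_title` name and the explicit fallback map are hypothetical assumptions, not how the template or MediaWiki actually implement it.

```python
# Hypothetical sketch: pick a per-language display title for a category,
# loosely in the spirit of {{LangSwitch}}. The data layout is an assumption.

def pick_display_title(
    titles: dict[str, str],
    user_lang: str,
    fallbacks: dict[str, list[str]],
    default: str,
) -> str:
    """Return the title for user_lang, else the first available fallback, else default."""
    # Try the viewer's language first, then its fallback chain in order.
    for lang in [user_lang, *fallbacks.get(user_lang, [])]:
        if lang in titles:
            return titles[lang]
    # Nothing matched: keep the canonical category name.
    return default
```

On-wiki this would live in wikitext or Lua rather than Python; the sketch only shows why a fallback is needed, since most categories will lack a title in most languages.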

@Bjh21: alright, then I'll post this to the village pump as a proposal after I've rewritten it a bit. Would there be a way to mass-notify non-English speakers in their languages to ask for feedback? Also, most of it would have to be done locally due to the notability "issues" with many categories, so I never expected Wikidata to cover 100% of the translations. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:00, 31 October 2018 (UTC)

Let me know if it would be useful/possible to add a call to something like DISPLAYTITLE to {{Wikidata Infobox}} - hopefully the titles in the infobox already show the labels that we'd want to use here. Thanks. Mike Peel (talk) 21:12, 31 October 2018 (UTC)

@Donald Trung and Keegan (WMF): There's a huge use case for being able to query the category tree using SPARQL from within the CommonsData Query System (phab:T194401#4661082) -- especially in the early stages when there will be a huge amount of information to sync from categories to structured data and vice versa. But that is best done with something that monitors changes to the category table and reflects them directly to the SPARQL triplestore, not by creating actual redundant statements on CommonsData and trying to keep them up to date.

Nor is anyone going to rewrite the software around the category table in MediaWiki. It will continue to be built around a category page name which is a string, picking up Category:pqrst statements in the wikitext of the page.

Yes, we have an increasing number of categories linked to items on Wikidata, which can have labels translated into multiple languages. But using those to present/interpret aliases for category names -- both at the bottom of file pages, and at the top of category pages -- would be quite tricky. For one thing, Wikidata labels are not disambiguated, whereas category names need to be unique. It's perhaps not completely impossible, but it would be a major major amount of work, and I am dubious that resources would be diverted to make it happen any time soon. Jheald (talk) 00:37, 1 November 2018 (UTC)
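To make the disambiguation problem concrete: category names must be unique, but Wikidata labels need not be. The following is a minimal Python sketch, with hypothetical category names and labels, of the check a bot would need before using labels as category aliases: group categories by label and report any label shared by more than one category.

```python
from collections import defaultdict

def find_label_collisions(labels: dict[str, str]) -> dict[str, list[str]]:
    """Map each label to the categories sharing it, keeping only duplicates.

    `labels` maps a Commons category name to the label (in one language)
    of its linked Wikidata item. Both the input and its source are assumed.
    """
    by_label: dict[str, list[str]] = defaultdict(list)
    for category, label in labels.items():
        by_label[label].append(category)
    # A label used by two or more categories cannot serve as a unique alias.
    return {label: cats for label, cats in by_label.items() if len(cats) > 1}
```

Every collision found this way would still need a hand-chosen, disambiguated alias, which is part of why the work would be large.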

To amplify something I wrote in the previous paragraph: it's not enough for the software just to rewrite the display title on the category page. It also needs to similarly translate category names at the bottom of file pages, otherwise these will not match, and things will be very very confusing. And also, the modified software needs to be able to recognise the category-name translations, if these are used e.g. to add a category when a page is edited. This is starting to become quite a patch, and one would also need to make sure key tools like HotCat and Cat-a-lot continued to work, too. Jheald (talk) 00:46, 1 November 2018 (UTC)

Might be better as something supplemental than as a replacement. - Jmabel ! talk 15:32, 2 November 2018 (UTC)

Copyright and licensing statements

New designs are up for structured copyright and licensing statements, based on feedback from the first round of designs. Please look them over, they are very important to the project and Commons. Thanks! Keegan (WMF) (talk) 16:48, 2 November 2018 (UTC)

Lua version of the {{Information}} template

In preparation for Structured Data on Commons it might be a good idea to revamp our most used infobox template, {{Information}}, and rewrite it in Lua. I can look into adopting and simplifying some of the code used by the {{Artwork}} template to develop Module:Information, which should be a very simple and lightweight replacement of the current wikicode. Of course, once the sandbox version is ready we would notify the community and go through an extensive testing process before any deployment. At this phase the Lua code would simply mimic the output and behavior of the wikitext, but in the future it might be used for merging data stored in the information template with data stored in structured data. --Jarekt (talk) 17:21, 22 October 2018 (UTC)

I'd suggest holding off on doing something like this. I do not think we have the overhead to support this at this time, it might be more useful to wait and see how the function of SDC takes shape. Keegan (WMF) (talk) 19:38, 29 October 2018 (UTC)

@Keegan (WMF): I think Jarekt's idea might be good, so could you explain what you mean or expect? I don't understand. Thanks, Yann (talk) 20:55, 29 October 2018 (UTC)

It will cause server strain in two ways. The first is a one-off occurrence: severe load times when editing the template. Maybe not so bad, something people could live with. However, the second is that it makes re-rendering the page extremely expensive, and will forever slow down loading the File page for everyone. That's not really something we can live with. I'm not an engineer, and this is the simplified form of what I've been told, so if we need more specifics I can try to dig them out. If we can figure out a way in the future to do this without the strain, that'd be great. On the other hand, we'll likely have structured data more feature-complete by then, and it might be moot. Keegan (WMF) (talk) 21:13, 29 October 2018 (UTC)

Module:Information is ready for testing, as announced in Template_talk:Information#Rewrite_in_Lua. At this stage I am only testing it and will not deploy without broader support. @Keegan (WMF): would this change be significantly different from any other change to the {{Information}} template or one of the 15 templates and modules it calls? Edits to those templates are not unusual. There are also other modules and templates that are used a lot, like Module:Fallback with 44M transclusions, or Module:I18n/date with 52M, and although we try to minimize the number of edits, they are being changed all the time. In my experience, the "one-off individual occurrence, severe load times when editing the template" means that the tab becomes unresponsive, and then it might take up to 2-3 months until all the files using the templates are updated. I am all for "figure[ing] out a way in the future to do this without the strain", since I expect that once Module:Information is live there will be a lot of ideas on how to improve it. --Jarekt (talk) 13:14, 2 November 2018 (UTC)

@Jdforrester (WMF): thoughts for Jarekt here? Keegan (WMF) (talk) 16:40, 2 November 2018 (UTC)

@Keegan (WMF) and Jarekt: Correct, ideally you wouldn't ever touch those modules either, and certainly not change them "all the time". Jdforrester (WMF) (talk) 16:51, 2 November 2018 (UTC)

@Keegan (WMF) and Jdforrester (WMF): OK, "all the time" is an exaggeration, but we do have 200+ templates and modules with 1M and more transclusions and 37 of them with over 10M transclusions (see Special:MostTranscludedPages). All those pages are protected (I hope), and we try to limit the edits to them, but there is always something to fix or improve. In the past the general attitude was en:Wikipedia:Don't worry about performance, I do worry about it but not to the point of postponing improvements. As the number of files is going up those edits affect more and more pages, this might become more of the issue. Keegan was hoping we "can figure out in a way in the future to do this without the strain", any chance of that? --Jarekt (talk) 02:14, 3 November 2018 (UTC)

@Jarekt: Absolutely, I understand the evolving needs for such tools (new languages, new designs, etc.). However, implementing them in Lua, though far better than wikitext, is very slow compared to doing it properly in PHP/JS. A big part of the benefit of the Structured Data on Commons work is to provide tools for all the media files that could be implemented in Lua, but in a way that isn't a performance disaster. Taking weeks to roll out each typo fix to all files is a terrible experience for Commons users (readers/re-users); via MediaWiki code itself it runs faster and more manageably for the servers, and so accidental breakage doesn't sit on millions of files for "2-3 months", as you put it. Each new feature added via a template or module costs roughly as much as a dozen done via the main system. (In other words, just because you can, doesn't mean you should. 😉) Jdforrester (WMF) (talk) 18:13, 5 November 2018 (UTC)

@Keegan (WMF) and Jdforrester (WMF): I see, so the way to make edits to templates like {{Information}} would be to move them from wikitext or Lua implementations into MediaWiki code itself. I am fine with that. My assumption was that Lua code would be the place where we merge inputs from Structured Data on Commons (SDoC) and the wikitext for files that have some data in one format and some in the other. For example, for some image we managed to parse the date and store it as SDoC, but the author field is not comprehensible to algorithms and we do not have anything in SDoC (until someone inputs it manually), so we display that information the way we always did: from the template wikitext. We do something like this right now with {{Artwork}}, {{Creator}} and other templates. So if Lua is the place to merge those two streams of data, then the first step would be to have Lua code for a single stream (wikitext) that does not break existing file description pages. Later, when we have SDoC, we can extend the Lua code to render information from it and perform merging tasks. So would the plan be to do the merging in MediaWiki code, or something else entirely? --Jarekt (talk) 21:15, 5 November 2018 (UTC)
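The per-field merging described above, taking the parsed date from structured data but the author from the wikitext, can be sketched in a few lines. This is a minimal Python illustration only; the field names, the `merge_fields` helper and the rule that a non-empty structured-data value wins are hypothetical assumptions, and the discussion does not settle where (Lua or MediaWiki code) such merging would live.

```python
from typing import Optional

# Hypothetical field names; the real {{Information}} parameters differ in detail.
FIELDS = ("description", "date", "author", "source")

def merge_fields(
    structured: dict[str, Optional[str]],
    wikitext: dict[str, Optional[str]],
) -> dict[str, tuple[str, str]]:
    """Merge two data streams field by field, recording where each value came from.

    Assumption: a non-empty structured-data value wins; otherwise the
    wikitext value is shown as before. Fields empty in both are omitted.
    """
    merged: dict[str, tuple[str, str]] = {}
    for field in FIELDS:
        sd_value = structured.get(field)
        wt_value = wikitext.get(field)
        if sd_value:
            merged[field] = (sd_value, "structured data")
        elif wt_value:
            merged[field] = (wt_value, "wikitext")
    return merged
```

For a file whose date was parsed into structured data but whose author exists only in wikitext, the date would come from structured data and the author from the template, which matches the example above.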

I strongly suggest putting this project on hold until we have some features out for SDC and we all (the dev team and the community) can see what can be accomplished without having to take this to MW core, or do any sort of complicated work that might have negative impacts on the site before we see what's possible first with SDC. Keegan (WMF) (talk) 21:26, 8 November 2018 (UTC)

OK, let's put it on hold. However, in the past several incremental changes have proved less disruptive than accumulated sweeping changes, as they have a higher chance of not being reverted. --Jarekt (talk) 20:21, 9 November 2018 (UTC)

One question: What's that "Office" option?

Hi, I'm watching the presentation about structured data on Commons, and hit the test page for search. There are "Image", "Videos", etc., and then "Office". What's that? Also, how are things going to be translated for those queries? Cheers! Tetizeraz. Send me a ✉️ ! 19:17, 3 November 2018 (UTC)

"Office" is a holding name for document type files. It's not the set name that will be used. As for translations, they will be done through translatewiki.net, where all MediaWiki system messages go for internationalization. Keegan (WMF) (talk) 21:27, 8 November 2018 (UTC)

Thanks for the clarification! Tetizeraz. Send me a ✉️ ! 16:40, 11 November 2018 (UTC)

Multilingual captions beta testing

The Structured Data on Commons team has begun beta testing of the first feature, multilingual file captions, and all community members are invited to test it out. Captions is based on designs discussed with the community[4][5] and the team is looking forward to hearing about testing. If all goes well during testing, captions will be turned on for Commons around the second week of January, 2019.

Multilingual captions are plain text fields that provide brief, easily translatable details of a file in a way that is easy to create, edit, and curate. Captions are added during the upload process using the UploadWizard, or they can be added directly on any file page on Commons. Adding captions in multiple languages is a simple process that requires only a few steps.

The details:

There is a help page available on how to use multilingual file captions.
Testing will take place on Beta Commons. If you don’t yet have an account set up there, you’ll need one.
Beta Commons is a testbed, and not configured exactly like the real Commons site, so expect to see some discrepancies with user interface (UI) elements like search.
Structured Data introduces the potential for many important page changes to happen at once, which could flood the recent changes list. Because of this, Enhanced Recent Changes is enabled as it currently is at Commons, but with some UI changes.
Feedback and commentary on the file caption functionality are welcome and encouraged on the discussion page for this post.
Some testing has already taken place and the team are aware of some issues. A list of known issues can be seen below.
If you discover a bug/issue that is not covered in the known issues, please file a ticket on Phabricator and tag it with the “Multimedia” tag. Use this link to file a new task already tagged with "Multimedia" and "SDC General".

Known issues:

Search is not currently working on Beta Commons. Search is not needed for testing captions, but service should be restored soon.
Language name display inconsistencies in multilingual file captions
Language names for multilingual File captions can run into the caption value in the mobile view
Deleting a caption field that has text that exceeds the limit does not re-enable the "publish" button
"Edit" tab for past revisions should be "restore"
When using the back button, recently added captions sometimes don't display in Chrome
File Captions can't be edited on mobile browsers
Wrong font weight for some language labels on SDC captions box

Thanks!

-- Keegan (WMF) (talk), for the Structured Data on Commons Team 20:40, 17 December 2018 (UTC)