Commons talk:Structured data

From Wikimedia Commons, the free media repository
Jump to navigation Jump to search

Timeline and GLAM upload[edit]

In Wikimedia Denmark within the Wiki Labs Kultur meetings, we discuss the possibility of mass upload of media files to Wikimedia Commons. I am (perhaps) of the opinion that it would be a good idea to wait until SDC is implemented. Am an wondering if there is anyone that can say something about the timeline of SDC and whether it is worth waiting until SDC is implemented? — Fnielsen (talk) 13:21, 24 August 2018 (UTC)

I don't think this matters much. On the other hand, it would be great if you could express what your original intention was, i.e. "mass upload of PD media files from defunct national TV broadcasting service", and your reason for waiting, i.e. "waiting for properties to express minute-bookmarks to enable media citation in Wikipedia" or something like that. Because in the end, nothing will change as far as file storage on Commons goes, and SDoC will just give us more tools to make existing media more findable in the wikiverse. Jane023 (talk) 07:23, 26 August 2018 (UTC)
@Fnielsen: You can do a lot to get your upload ready for structured data right now, by trying to match the people, places, objects, events etc related to your images to items on Wikidata (or creating new items on Wikidata if existing items don't exist but should). This is the work (and it may be a *lot* of work) that you would have to do anyway, for structured data to be of any value. Even without structured data, identifying the relevant Wikidata items can help identify the right names of categories that your images should be in; and if in the current description templates you include links to Wikidata items, that will make it easier when the time comes that that information can be moved to structured data. Jheald (talk) 12:30, 26 August 2018 (UTC)
@Fnielsen: I do not think you should wait. Structured data is going to be implemented in pieces and not all at once, with the first feature release coming in October (Multilingual Captions, more on that soon). And more specifically, Multilingual Captions will not be supporting batch uploads at launch, I believe. The next feature set, depicts, is coming the early part of next year. Things after that like licensing and attribution come later in 2019. Go ahead and run your campaign, I wish you much success! Keegan (WMF) (talk) 23:14, 28 August 2018 (UTC)
@Fnielsen: Apologies for my late reply! Jane, Jheald and Keegan have given you excellent input.
  • In terms of timeline: Structured Commons features will be gradually deployed. Expect the first really 'useful' features for GLAM projects ('depicts') in early 2019. Batch upload tools like Pattypan will probably not be ready for structured data immediately.
  • If your projects have planning constraints themselves (e.g. the uploads must be finished by, say, April 2019, for instance): by all means, go ahead and do the uploads in wikitext and templates. As Jheald says: in order to make your metadata easily translatable to structured data: make sure all relevant and notable people and organisations (creators, institutions), depicted artworks, events, places, buildings... are also available and well-described on Wikidata.
  • If you are flexible in terms of planning, and are excited to try the new technology, and not afraid of a bit of experimentation: yes, you can also wait! Spring 2019 is when I expect the first larger GLAM uploads to be feasible. It might be a bit of an adventure because so much of the workflows and tools will change.
Feel free to get in touch with me directly if you'd like more specific feedback. Warmly, SandraF (WMF) (talk) 11:27, 4 September 2018 (UTC)

Properties table[edit]

I've created a master table for holding Wikidata properties related to Commons: Commons:Structured_data/Properties_table.

Please leave your comments on the talk page, and help fill in missing property numbers if you are familiar with something that exists and could be listed. Keegan (WMF) (talk) 18:00, 29 August 2018 (UTC)

Mockups for structured copyright and licensing statements[edit]

Mockups of structured licensing and copyright statements on file pages are posted. Please have a look over the examples and leave your feedback on the talk page. Keegan (WMF) (talk) 15:24, 6 September 2018 (UTC)

Property creation on Wikidata[edit]

Hello everyone! Over the past few months, we brainstormed about Wikidata properties that will be needed to describe files on Wikimedia Commons, and those ideas have been summarized with a list of properties. Updates and feedback and further thoughts are still very welcome.

Some of these properties currently exist on Wikidata, but many do not and are in need of creation. Property creation on Wikidata is a community-driven process, the development team will be happy to follow along and to support where possible. As Depicts and other statements will be deployed in the first months of 2019, it is time to start process of creating new properties now.

Here are some first thoughts from the team.

Please let us know how you can help or what you think! Keegan (WMF) (talk) 16:37, 19 September 2018 (UTC)

I do think we need a separate section dedicated to Commons related properties, which might or might not be useful on Wikidata. Future copyright related properties should be discussed in that context. We actually have d:Wikidata:Property_proposal/Sister_projects#Wikimedia_Commons, may be that is the right place. --Jarekt (talk) 17:18, 19 September 2018 (UTC)
I suggest moving this discussion list of properties, to Wikidata by creating a page similar to d:Wikidata:Lexicographical data. We have several projects on Wikidata like d:Wikidata:WikiProject Informatics/Software/Properties created to get feedback from the Wikidata community members for deciding the use of existing properties or for the proposition of new properties. I however agree that this process may sometimes be very slow. Also the links pointed to by User talk:Jarekt are equally important. John Samuel (talk) 17:30, 20 September 2018 (UTC)
I'm happy to move/copy the table over to Wikidata. Would someone like to set up a page for it to live on, that has Wikidata-relevant project information? I'm not from the Wikidata community myself, so I'm unfamiliar with how that should go. Keegan (WMF) (talk) 16:51, 21 September 2018 (UTC)
+1 on @Jarekt:, I think it'd be a good idea to make a separate page for SD. Should we also create a separate page here? @Jsamwrites: We already have d:Wikidata:WikiProject Commons, but I think we can improve it. --Sannita - not just another sysop 19:43, 21 September 2018 (UTC)
@Sannita:, Thanks. I added myself as a participant. @Keegan (WMF): My personal opinion is that we can create a subpage on d:Wikidata:WikiProject Commons or copy the current discussion to the new page. John Samuel (talk) 10:50, 22 September 2018 (UTC)

Searching Commons - how to structure coverage[edit]

RIsler (WMF), the Structured Data product manager, has identified an issue that he'd like to bring to the community's attention, with regards to how search will function:

“After review with many engineering and product folks at WMF, WMDE, and within the Commons community, we've come to understand that the initial implementation of depicts "tags" for Commons media should be more focused on making sure all relevant concepts are identified and tagged, rather than limiting tagging to a few specific terms. Additionally, for now we won't rely much on the Wikidata ontology (the data structured) to find any additional depicts statements automatically.

Here is an example of what we mean. Let's try the hypothetical case of an image of a German Shepherd, and the user uploading it tagged it with only "German Shepherd" (Wikidata item d:Q38280):

  • We may be able to suggest an additional depicts tag (dog, d:Q144) based on the "subclass of" or "instance of" property of German Shepherd (we are still determining if this is possible for the initial version of depicts functionality). These suggested tags could appear during use of the UploadWizard, or on the image's file page, and be available for a human to confirm their accuracy before being added to the file's data.
  • In the first half of 2019, we expect to launch a machine-based image classification feature that may suggest a number of additional depicts tags including "dog" (d:Q144), "pet" (d:Q39201), "canine" (d:Q2474088), etc. These suggested tags could appear on the image's file page and be available for a human to confirm their accuracy before being added to the file's data.
  • Once a suggested tag is confirmed, it is added as a depict statement and the German Shepherd image will show up as a match for searches for any of those terms.
  • On the file page, users will be free to add additional depicts tags that are accurate for the image (for instance, if it's a young dog, add puppy (d:Q39266) )
This combination of techniques should ultimately result in better searches that can be both very specific (show me German Shepherd puppies) and broad (show me pets).”

Within the next week or two we will provide information on access to try out a prototype for Search on Commons. The prototype will not be advanced enough to show what we are talking about here, but we will be providing more information about "good coverage" tagging at that time. Keegan (WMF) (talk) 17:04, 21 September 2018 (UTC)

1/What do you mean "the user uploading it tagged it with only "German Shepherd" "? during use of the UploadWizard will the user have to chose a tag or a category? or did you want to mean "the user uploading it with only "German Shepherd" as category? Christian Ferrer (talk) 14:28, 22 September 2018 (UTC)
2/may be a language misunderstanding from myself, but do you mean that an image " only tagged with "German Shepherd" " will not appear in "dog" search results, because it is not tagged as "dog"? and that we have to add the "dog" tag manually? Christian Ferrer (talk) 14:37, 22 September 2018 (UTC)
I can appreciate why you're considering this, but (as presented) I think it's a bad idea.
A key principle on both Wikidata and Commons has been to try to make statements as narrow and precise as possible, and to rely on hierarchy rather than permitting redundancy (eg: COM:OVERCAT, here on Commons).
The problem, as many have discovered, is that searching a hierarchy is expensive, far more expensive than a flat tag search. People writing bespoke queries may be prepared to wait 60 seconds for a full hierarchical exploration (and the SPARQL service is able support this relatively small population of searchers). But 60 seconds is not acceptable for the main search interface, nor would the query engine be likely to scale to support full hierarchical searching for the entire population of searchers.
Also there's the issue that the Wikidata ontology at the moment is simply not in good enough shape -- just not consistent and predictable or reliable enough -- to even specify what those hierarchical searches should be.
So going back towards something that can be implemented as a flat search starts to look like the only solution.
But IMO adding multiple redundant "depicts" tags for the same object in a wider image is to be avoided if at all possible. Keegan, you say that there has been a review of this "within the Commons community". I'm aware of a couple of times the question has been raised, eg here and here, admittedly without much take-up, but with a sense I think that this was not the direction the participants would prefer. It adds redundant clutter to the item. It makes it difficult to know whether there are two objects involved, or just a single one. It reduces the impetus to refine the description and try to describe the things really sharply (in my view the COM:OVERCAT principle strongly contributes to the activity of category refinement for images). It makes it less clear where qualifiers (like "shown with features" or "located within image") should be placed. And it goes directly against the principle used on Wikidata, on a system that's supposed to seamlessly combine with it.
As an alternative, I would suggest treating these additional tags added for search purposes as 'shadow tags', attached closely to specific (conventional) primary tags for items. So if something in the image is tagged "German shepherd", make "dog" an alternate shadow tag attached specifically to that "German shepherd" tag, rather than a free-floating tag in its own right.
That way we can keep things organised, preserve the impetus to try to refine the identification of things, and be clear about how many identified things there are -- that there is only one animal in question, not two. Jheald (talk) 20:32, 22 September 2018 (UTC)
A further issue is what will happen when a Commons image "depicts" something with its own Wikidata item. How is it proposed to handle this case? An item on Wikidata will not have redundant depicts values: it will not have an additional "depicts:dog" statement, if it is for a painting of a German shepherd. Jheald (talk)
The "shadow tags" would be a kind of cache and like any cache would easily become out of date if the underlying data is changed on Wikidata. But the alternatives don't seem very pleasant. Queries that take 30 seconds to complete? Tagging every photo of a human with "human", "homo sapiens", "person", "homo", "homininae", "hominidae", "primate", "ape", "animal", "onmivore" "two-legged animal", "organism", "thing",... I know I've missed a lot. --ghouston (talk) 11:12, 23 September 2018 (UTC)
@Ghouston: The team appear to have developed really cold feet about using Wikidata to populate the additional search tags -- see phab:T199119, and in particular the first substantive comment on that ticket, by Cparle on 10 July, for their quick initial read of some of the issues this would face. So I don't think there would be any intention to keep the additional tags sync'd with Wikidata. Instead I think the suggestion is to perhaps try to suggest a few extra tags at upload time, and then from then on let them follow their own separate destiny. (Otherwise your analogy with a cache would be spot on.)
Hence the 'shadow tags' existing in their own right. But I do think there might be mileage in e.g. storing them as a qualifier ("additional search tag" ?) to a depicts statement, rather than as independent depicts statements in their own right. Jheald (talk) 17:13, 23 September 2018 (UTC)
Jheald has accurately described some of the technical issues that prevent us from implementing the preferred approach. The idea of something like an "additional search term" qualifier has some promise, and is an approach we're still considering as a possibility, but we need to game out the consequences involved. There are other logistical issues like how we would display it consistently in the UI, and how we integrate that approach with other platforms/systems (like GLAM databases), and how this would work with search. If that approach turns out to not be feasible, the solution that covers all requirements without extreme workarounds is to simply have a number of depicts tags on the M item. Although some tags might be somewhat redundant to humans (but still useful for search purposes), we can probably mitigate the impact on the UI. We will have the "Make Primary" button/link that will allow users to essentially say "these things are the most important", and those tags would be shown first and be the preferred vehicles for qualifiers. Again, using the German Shepherd example, although the image may be tagged with "dog", "pet", etc., German Shepherd can be the primary tag and house the important qualifiers like "applies to part", "shown with features", etc. while the depicts tag "dog" doesn't need to be primary and can just hang out in the background minding its own business (we're also considering a "cutoff" where, after a certain number of depicts tags, the user will have to expand to see more). We also have other reasons for wanting to separate what we're calling "elemental" depicts tags, including making it easier to import data from sources that already have tags set up that way (like Flickr Commons, GLAM sites, etc). Depicts on Commons will perhaps be the most complex part of the project, and easy answers will be in short supply, but we think the end result will be a dramatic improvement in search and discoverability. RIsler (WMF) (talk) 22:35, 24 September 2018 (UTC)
@RIsler (WMF): Thanks for dropping by. It's good to know that something like an "additional search term" qualifier is still in consideration.
Regarding the use of "Make Primary", I am now a bit confused. I had understood, from the Depicts consultation that 'Primary' was to be used on "depicts" to indicate the overall topic of the image -- eg something like nativity scene (Q31732) or Sacra Conversazione (Q370665), rather than being used to prefer Mary (Q345) over woman (Q467) for one of the elements within the scene. I do think that for the latter a better approach would be to try to tie the two together more concretely, eg by making the one a qualifier value for the other. It would be a much better structure for people writing queries to be able to work out what is going on. The idea of introducing additional ranks beyond the three used on Wikidata is also interesting (but is this possible, technically, without major surgery to the code of wikibase?), eg to hive off secondary tags to a lower rank, so many applications could ignore them. But going down the road, I suspect that tying the secondary tag to the regular tag is probably information that will turn out to be useful. If an additional rank were going to be introduced for anything on CommonsData, I would put one for "inferred by machine; not confirmed" at the head of the queue -- I suspect it is a status we may be going to be seeing a lot -- to rank below a regular statement, but still be eligible to be included as a wdt: statement in the RDF, if there was no regular statement outranking it.
As regards data import, I suspect we're kidding ourselves if we think this is ever going to be easy. I'm working on an image upload project with a major GLAM at the moment, with simultaneous creation of Wikidata items for the underlying objects, and the reconciliation of names for people and places to Wikidata is brutal -- easily the most rate-limiting aspect of the whole process. This is probably as near as one can get at the moment, before CommonsData goes live, to what an upload involving Structured Data will entail. As an example, the current batch of images I've been working on contains 200 creators or contributors, with names that are supposedly normalised to the Library of Congress preferred form, if the LoC has heard of them. An initial match to the LoC and then Wikidata found 90 matches, 10 of which turned out to be wrong. By trying matching via VIAF, and then going through remaining candidates one by one, I've now raised the 'matched' count to 110 of the original 200, but it's taken a day and a half to do. And this batch is just 2% of the overall collection. Perhaps the universe of potential "depicts" tags is a more limited vocabulary, but the matching of a tag vocabulary to Wikidata, and then even more so the verification of that matching, is not a small job. I suspect that against all that, using machine methods to identify when one tag is probably just a less specific intimation of another tag, and should therefore be made subordinate to it, will likely add no more than a drop in the sea.
A further point is that Commons will still be expecting all uploads to be fully categorised, and for those categorisations to obey COM:OVERCAT, ie only categorise with the most specific indications. Structured Data should help a lot with that -- one of the reasons I'm so much trying to go the Wikidata route with my current project is to then be able to read off the appropriate Commons categories -- but to avoid OVERCAT the uploaders will thus need to work out in any case which tags are redundant to which other ones, so the effort of determining this to store them in qualifiers is not really an additional overhead. Jheald (talk) 18:52, 25 September 2018 (UTC)
For "make primary", we're exploring whether it can serve more than one purpose. Yes, its main use would be to identify the main subject of the media. But perhaps this feature (or something similar) could also say, either implicitly or explicitly, that the tag in question should be the one to host relevant qualifiers. Again, this is all still work in progress and we have a lot of different use cases to account for, so we certainly won't have anything solid on this until next month. RIsler (WMF) (talk) 18:03, 26 September 2018 (UTC)
Hope we misunderstood the comment made by Keegan (WMF), otherwise it is likely better to develop FastCCI tool, and to create a "tag" namespace in Commons that will work in parallel with category tree but that will not be subject to our over-categorisation rules. Example : if you categorize your file with Category:Dog then Tag:Canis, Tag:Canis lupus, ect, ect... are automatically added to the file by a BOT or a software, and when you click on Tag:Canis then you see all the images that have "Canis" as tag. This would allow to stop spending a signifiant part of the $3,015,000 USD of that project. Sorry for that last sarcasm. Christian Ferrer (talk) 12:03, 23 September 2018 (UTC)
  • @Christian Ferrer: 1. Refers to statement tagging, not category tagging. Categories remain an independent process 2. Correct, the file would have to be tagged with "dog".
I'll work on getting some more specific answers to other concerns and questions. Keegan (WMF) (talk) 19:05, 24 September 2018 (UTC)
ok thanks for the answer. Christian Ferrer (talk) 21:11, 24 September 2018 (UTC)
  • It seems to me that it is a disaster that the system will not automatically be able to make a search based on a hierarchy of tags. Would it be possible to offer both types of search, i.e. a simple tag search which would be fast and a hierarchical search which would be understood to be slow (perhaps limited in the amount of hierarchy which could be searched)? Strobilomyces (talk) 11:52, 25 September 2018 (UTC)
@Strobilomyces: I can't speak for the team, but as I understand it the sheer number of different ways different properties are used in different circumstances, plus the density of very odd glitches in the WD ontology, plus the difficulty of prioritising results to meet general users' expectations of seeing the results that they would actually want to see, have put the team right off offering any deep hierarchical search. (See the assessment by Cparle on the ticket I linked above for just a taster of some of the problems lurking under the surface). Any attempt in this direction would be a major research project, simply not on the agenda for the team trying to ship version 0.1
BUT --- all of CommonsData and all of Wikidata should be accessible from WDQS, so it should be possible to write queries in SPARQL that are as deep and complicated and bespoke and intricate as one could wish. And probably, soon enough, one will find that users who have a particular knowlege and interest in particular areas, understand the twisty details of the Wikidata hierarchy in those particular subject areas, and are prepared to put in the time to extend some of the data that is incomplete and fix some of the statements are wrong -- those users are quite likely to start producing ready-written query designs for particular subjects and disciplines, that somebody might well graft a user-friendly front-end onto. But nobody should underestimate the amount of data that is going to need to be improved on Wikidata, if those queries are going to produce good and solid results -- just look at all the data that is currently missing from the infoboxes on categories, just for starters, never mind all the data that is still needed to make sure the hierarchies behind those items are solid and robust. Jheald (talk) 17:20, 25 September 2018 (UTC)
Thanks for the answer.Strobilomyces (talk) 11:44, 26 September 2018 (UTC)
  • I have some doubts about this. From my experience with the Wikdiata ontology I have to admit that it might not be well suited for Commons because it is deeper than what Commons needs, and perhaps not as user-oriented as one would expect. The thing is, there is nothing stopping Commons users to create their own ontology or hierarchy of depicts items. So why not have an own collection of depict items on Commons itself and structure them as wished? Then they can be connected to Wikidata items where appropriate, and use whatever ontology the user wants.--Micru (talk) 07:51, 29 September 2018 (UTC)
@Micru: CommonsData is not currently projected to support generic items, only media-items for particular media files. Generic items are expected to live on Wikidata (per current plans, at least). Jheald (talk) 11:29, 29 September 2018 (UTC)
The question which was not studied is what should be done in wikidata ontology to allow a correct search using the wikidata ontology. Currently nobody try to improve the wikidata ontology because there was no reason to have a strict set of rules. But we can improve the ontology by fixing a set of simple rules like an item should not be an instance and a subclass at the same time or no reference cycle. Snipre (talk) 07:19, 2 October 2018 (UTC)
@Snipre: The comment by Smalyshev on wikidata-l is also worth reading [1] : The main problem is that there is no standard way (or even defined small number of ways) to get the hierarchy that is relevant for "depicts" from current Wikidata data. It may even be that for a specific type or class the hierarchy is well defined, but the sheer number of different ways it is done in different areas is overwhelming and ill-suited for automatic processing... One way of solving it is to create a special hierarchy for "depicts" purposes that would serve this particular use case. Another way is to amend existing hierarchies and meta-hierarchies so that there would be an algorithmic way of navigating them in a common case. This is something that would be nice to hear about from people that are experienced in ontology creation and maintenance... I think this is very much something that the community can do. Jheald (talk) 08:11, 2 October 2018 (UTC)

@Keegan (WMF): If I understand correctly: The current wikidata ontology is unsuitable for searching (e.g. related discussion) which is a huge problem. I do not think it is a good idea to cover up this mess with hundreds of different tags. Instead the image classification and searching algorithm should be a motivation and help people to fix the ontology. --Debenben (talk) 15:59, 3 October 2018 (UTC)

  • @Keegan (WMF): I fully agree with above, if "German Shepherd" is currently no linked (in the results of a potential search) with the taxon chain of Canis lupus familiaris, it is because the ontology is not well done, Structured data for Commons may be a good idea only in the extend that the "data" is indeed well structured. In Wikidata German Shepherd should be a "breed" (with "breed" as a property) of Canis lupus familiaris, however it is not. It is currently a breed of dog, which literally is true but ontologically totally wrong, "dog" is not a species but a taxon common name. I wonder how many items are affected by this kind confusion. As well woman (Q467) is a "female adult human" only in the description, but not in the statements, where you can indeed find "female" and "adult" but not "human", therefore women will never be highlighed if you search "female mammals". But that's not why I pinged you, has it been envisaged to have the possibility to add qualifiers to the depicts "tags", as it is shown for the Search prototype? That will be good. Sorry if it is already written somewhere and if I missed that. Christian Ferrer (talk) 05:24, 7 October 2018 (UTC)

Necessary changes to how viewing and using old file page revisions functions[edit]


Structuring data on Commons with Wikibase changes how content on a file page is stored and served back, known as ‘’multi-content revisions’’(MCR). Instead of a file page revision being a single big chunk of information, data is broken apart into pieces known as “slots.” When you view a file page, its history, or any individual revision, what you are seeing is being assembled from multiple slots.

This makes serving old revisions of a file page complicated, as one slot may have a revision that has been edited while another slot has not been changed. The old version of a file page cannot be returned in the same way that a plain wikitext-based wiki page works, which simply finds the specific past version of the wikitext on the file page – because there is only one – and returns that.

In order to make MCR work on old revisions of file pages, the development team is looking at making these old versions of pages match how Wikidata functions. The following things change when looking at an old revision of a file page:

  • The Edit tab at the top right of the page is replaced with Restore
  • The function of the Edit tab, accessing the old version of the entire wikitext of a page in order to be restored is removed. Instead, a page is shown with the differences between the current and old revision (the one being restored), with an edit summary field.

Let’s say that you want to revert a file page to a specific version from the past. Currently, you’d access the History, click on the revision that you want. From there you would click on the Edit tab, view the old text in the editable text box and fill in an edit summary, and save the page.

The new function has you access the history, click on the revision that you want. From there you would click on the Restore tab (which has replaced the Edit tab). You’d then see a diff of the revision from the current page, and an edit summary for to fill in with the save button. The editable text field is removed. This is replicating how Wikidata handles old revisions.

If you’d like to read through the technical discussion that resulted in this decision, here is the Phabricator ticket where you’d start. There are more links within, including links to gerrit patches.

There are advantages to serving old revisions in this new manner, the main one being simplifying the process of restoring an old revision should that be your goal in editing the page. There are some drawbacks to this decision however, primarily that the entire wikitext of old page revisions will not be available for copying, if someone is looking to duplicate old text on another page. Individual line changes can be seen and copied from the diff view. As mentioned at the top, this change will only affect the File namespace on Commons. Access to old revisions in the Commons namespace, Template namespace, Main namespace, etc., will remain as it is today. This use of old revisions in the File namespace does not seem to have a large impact on Commons, and the team hopes that any disruption in workflows from change in how old text is accessed is minimal. The team may try to look into other ways of serving the entirety of old wikitext page revisions, but it will not be possible in the near future.

Are there any questions about this change? Keegan (WMF) (talk) 19:37, 27 September 2018 (UTC)

So it's not going to be possible to undo just a change to the wikitext, without reverting back the structured data -- nor to just revert structured data, without reverting the wikitext?
This might be a problem, if we consider that it might often be two largely distinct communities editing the data (probably heavily mechanised) and editing the wikitext (probably manually), often likely quite largely independently.
If somebody reverts back some edits to the data after a mistake, while in the intervening time an edit has been made to the wikitext, it sounds as if that wikitext edit will be reverted back too, and may be quite hard to reinstate, if it is no longer possible to access the wikitext of a whole page in the form to which it had been updated. This might upset non data-editors quite a lot. Jheald (talk) 20:35, 27 September 2018 (UTC)
Undoing and reverting will work just fine. Here's what you won't be able to do directly anymore on a File page: open the old revision, edit that revision directly as wikitext in the editing box with the big warning that this is an old revision, and save it as the new revision. Keegan (WMF) (talk) 22:17, 27 September 2018 (UTC)
I'd like to point out that this use-case for an old revision of a file page, accessing the old wikitext directly to either copy or manipulate it to save as the current revision, does not seem to be a common workflow for file namespace here on Commons. It is quite common in discussion spaces, and on other wikis. Please let us know if there is a prevelent use-case for this workflow that we need to figure out a solution to. Keegan (WMF) (talk) 22:21, 27 September 2018 (UTC)
To add on to what Keegan said above - A.) The MCR team is still working on features and, in the future, should have a way to access the Wikitext of old revisions. It's just probably not going to be ready for our v1 launch. B.) As we get closer to launch and start putting things on Beta for testing, we'll explore a few possible temporary workarounds to address some edge cases as Jheald mentioned. RIsler (WMF) (talk) 01:51, 28 September 2018 (UTC)
Will this affect API access to old revisions? Currently, Geograph Update Bot makes some use of the ability to read past revisions (to check if location templates are the same as in the first revision), and I'd like it to make more (to detect location templates added by other bots, and to detect when it's thinking about reverting another user). It would be unfortunate if these became impossible. --bjh21 (talk) 21:20, 27 September 2018 (UTC)
To me, this seems like one more argument for the serialization/deserialization approach I've suggested several times. - Jmabel ! talk 23:18, 27 September 2018 (UTC)
This is just for older versions, right? The wikitext of the current version will still be accessible? (At least, that part of it that isn't the structured data.) BTW, it does seem to be possible to get hold of old versions of Wikidata items using getOldVersion() in pywikibot, but not to format it into an item_dict using get() (you have to manipulate the json - e.g. see [3]), I guess the same might be possible here so that bot-actions (for spotting/reverting vandalism and bot errors) would still work if needed? Thanks. Mike Peel (talk) 00:17, 28 September 2018 (UTC)
Wikitext of the current version will still be available. RIsler (WMF) (talk) 01:52, 28 September 2018 (UTC)
BTW, I'd send around the alert about this discussion as you did for #Property creation on Wikidata - this is a more important discussion than that one was... Thanks. Mike Peel (talk) 00:18, 28 September 2018 (UTC)
I'm doing so first thing in the morning. It was a very busy day, and experience has taught me to not send a massmessage at the end of a busy day :) Keegan (WMF) (talk) 02:10, 28 September 2018 (UTC)
I did this a little while ago. Keegan (WMF) (talk) 16:07, 28 September 2018 (UTC)
Pictogram-voting-question.svg Question How will work Commons:Rollback? Christian Ferrer (talk) 16:34, 28 September 2018 (UTC)
Pictogram-voting-question.svg Question how will be displayed file histories and how will be displayed Difference between... Christian Ferrer (talk) 17:12, 28 September 2018 (UTC)
None of these things should be affected or change. What is changing is how old revisions work in relation to viewing an old version of a page. Keegan (WMF) (talk) 17:34, 28 September 2018 (UTC)
Ok thanks you for the answer. Christian Ferrer (talk) 17:55, 28 September 2018 (UTC)
Losing access to old revision's plain wikitext is plain red flag IMO. It may not be used that often but that's really helpful when you need it. Wikidata - you never see the plain wikitext (and you never need it), and here on Commons, we work with plain wikitext. — regards, Revi 16:43, 28 September 2018 (UTC)
@-revi: How often do you need the bulk plain wikitext from an old revision of a file page? I ask because as @RIsler (WMF): mentions above, the team should have a feature to access the old revision wikitext again in the future and this removal is temporary (unfortunately we do not know how long temporary is). If you find that you do access these old revisions on file pages regularly as part of a workflow, we'd love to hear about it and pursue a workaround. Keegan (WMF) (talk) 17:03, 28 September 2018 (UTC)
it is a commons workflow for me, to edit text to change information template to artwork template (for example) or copy paste LOC metadata into an old LOC flickr upload, i even use visualfilechange to make mass text changes to old files. but i would be happy to edit wikidata instead. a pencil on the template, to go to wikidata would be expected. and on ramp to QuickStatements. Slowking4 § Sander.v.Ginkel's revenge 02:51, 29 September 2018 (UTC)
  • Agreeing with @-revi: This is a disgrace. How often do we use it, @Keegan (WMF):? Often enough for you not break it, what about that? -- Tuválkin 02:57, 29 September 2018 (UTC)
    • I'm trying to make sure I understand: will it still be possible to edit the latest version in a straight-text manner? That is, when you are talking about "old" revisions not being editable this way, is that just ones that have already been changed, or does that include the latest? Because if it's the latter then, yes, this is going to break a lot of workflows. - Jmabel ! talk 03:34, 29 September 2018 (UTC)
      • I'm still waiting for an answer to this, as conversation has headed off in different directions. - Jmabel ! talk 17:13, 29 September 2018 (UTC)
        • @Jmabel: The latest revision of the page will be editable just as it is today. We are only talking about when you view a historical version of the page. Keegan (WMF) (talk) 18:34, 1 October 2018 (UTC)
I have three concerns:
  1. In the two examples above the one from Commons allows you to preview the page, but the wikidata version does not. Does that mean we will also lose the ability to see what the page will look like while in the process of performing a restore? Fortunately, I assume as a workaround we could instead take care to always start by clicking on an old revision to see a rendered version of the the old version before clicking on the "restore" command.
  2. A common use I have for examining the old wikitext for a page is to figure out what wiki code was used in the past to produce a complex layout of description and licensing templates that have since been changed. Possible workarounds would be to either (1) restore an old revision, copy the old code, restore again to revert that restore, or (2) copy the current wikitext and then manually apply the diffs backwards through each revision to reconstruct the old code. Neither is particularly appealing. I would definitely sorely miss the ability the directly copy the wikitext of old revisions in the File namespace. I could live without it temporarily, but this is not something I do infrequently.
  3. Another case where I use the wikitext of an old revision in the File namespace is when making corrections to fix an edit that has broken the template rendering of a page. Commons File pages make heavy use of templates for their content and are often edited by editors from other wikis who make use of Commons but are not themselves primary contributors to Commons, so they are less familiar with the complex set of templates that Commons has built to produce the majority of the content in the File namespace. It is not uncommon for inexperienced editors to make an edit to a file page that adds useful information but also breaks the page rendering in a significant manner. For these types of corrections I will often click to start editing the revision of the page immediately before the less experienced editor started editing the page, copy out the wikicode for the portions that they inadvertly disrupted, and then use this copied code to make a correction. Alternatively, I might start by editing the revision of the page before the inexperienced editor started, actually make the change the other editor was trying to make and would have made if they were more experienced with Commons, and then submit. Making these types of corrections will be much harder without access to the wikitext of old revisions in the File namespace. As a workaround we will instead have to restore an old revision, copy the code, restore again to revert that restore, start a new revision, paste the copied code, make the necessary corrections, and then submit.
RP88 (talk) 04:06, 29 September 2018 (UTC)
For the second point RP88, if the answer made by Keegan (WMF) to my question above is right, then you should be able to copy a wikitext with Difference between.... Christian Ferrer (talk) 06:04, 29 September 2018 (UTC)
@Christian Ferrer: In your example link try to use your browser to copy to the clipboard just the contents of the older {{Information}} template and its parameters on the left side of the comparison. In the browsers that I've tried (Chrome, Firefox, Safari) the copied text will be intermingled between the old and new {{Information}} template/parameters. Using a diff as a way to retrieve the text of an old revision can be done, and is usually not too onerous for less complex edits, but quickly becomes impractical. —RP88 (talk) 06:20, 29 September 2018 (UTC)
  • Pictogram voting comment.svg Comment I don’t believe I ever needed to edit the wikitext of old revisions of File pages, and can’t think of any use case I would have needed that ability for. :) Jean-Fred (talk) 08:20, 29 September 2018 (UTC)
  • Pictogram voting comment.svg Comment MediaWiki already supports "view source" of a page, for example as offered when a page is locked. If it is necessary to withdraw the "edit" option link from the page history (and I don't fully understand why that is so necessary), would it not be possible to offer "view/view source" instead ? Jheald (talk) 11:35, 29 September 2018 (UTC)
  • As I understand it, the tricky part with viewing the old source of a page that has MCR is that not all revisions of all things are living in the same place, so assembling that snapshot in the raw is what becomes infeasible and why the view is changed from plain wikitext to the diff view. I think it might be possible in the future to put back together, but for now we need a workaround. Keegan (WMF) (talk) 18:41, 1 October 2018 (UTC)
@Keegan (WMF): That seems rather odd. One would expect at least all the wikitext to be living in the one place. Jheald (talk) 19:15, 1 October 2018 (UTC)
Let me try to clarify what Keegan meant above. It's hard to explain without getting in the weeds about what MCR does, so let me provide a short answer - we might indeed provide a view source button/tab, but it may be easier to simply provide a source view via a modified querystring on the EditPage. The ultimate point is that there *will* be a way to view the Wikitext of a past revision, we just haven't settled on the best way to do that yet. RIsler (WMF) (talk) 21:04, 1 October 2018 (UTC)

Multiple questions regarding the change @RIsler (WMF), Keegan (WMF):

  • Who is going to merge old filedescripon pages in the new system`?
  • Why has no community consenus seeked, on Commons:Village pump/Proposals?
  • Who is going to fix all the bots which will break once the change is merged?
  • If i remember corrently, somehwhere staff promised that filedescription pages and categorys will be keept. Why has this changed?
  • Who is going to fix all the gadgets which will break?

Best. --Steinsplitter (talk) 06:32, 12 October 2018 (UTC)

  • I left a note on COM:AN, so we can get a bit more input here. Best --Steinsplitter (talk) 06:40, 12 October 2018 (UTC)
    • Steinsplitter, why did you leave a note on AN -- this does not require administrator intervention. The Village Pump seems more appropriate (and it doesn't seem like a proposal, more like a FYI). -- Colin (talk) 06:55, 12 October 2018 (UTC)
      • Sounds reasonable, moved it to VP. --Steinsplitter (talk) 06:56, 12 October 2018 (UTC)
        • I left a VP note when this was posted. Keegan (WMF) (talk) 17:13, 12 October 2018 (UTC)
  • Pictogram voting comment.svg CommentI don't think this will be any problem for me. I often want to see the wikitext of current revisions, in order to copy/paste to another page (which is what I suspect some of the hasty opposers above are doing). But I've never needed to do that for old revisions. Indeed the only time I've ever needed the old revision of a File page on Commons to revert to it. As an aside, I wish one could revert to an old version of a file without that appearing in one's upload log -- if the devs know of a ticket for that one, I'd support it. -- Colin (talk) 06:55, 12 October 2018 (UTC)
  • @Steinsplitter: File description pages and categories are being kept. Page history merges will not change. As for why consensus wasn't sought, it's because this isn't an optional feature, it's a required function. I'm not aware of gadgets, bots or tools that this particular change might break (and I had a look). Are there any particular ones that you had in mind? Keegan (WMF) (talk) 17:12, 12 October 2018 (UTC)
  • Required by whom, or by what? -- Tuválkin 18:22, 12 October 2018 (UTC)
  • Multi-content revisions, the software that assembles pages from Commons and Wikidata. Keegan (WMF) (talk) 21:42, 12 October 2018 (UTC)
  • I run User:Geograph Update Bot, which inspects old revisions of pages (and of files). I asked above if API access would be affected, but I haven't yet had a reply. --bjh21 (talk) 19:12, 12 October 2018 (UTC)
  • This particular change we're discussing is more about reverting old revisions via the UI. We have no current plans to change API access to *read* Wikitext for old revisions (new MCR stuff will be backwards compatible). If an issue comes up that requires such a change to be made, we'll be sure to inform everyone before it happens, but as of now the plan is to keep the basic API functionality working as is. More preliminary info here: RIsler (WMF) (talk) 23:49, 12 October 2018 (UTC)

This discussion is 3,300 words long, having run for just over 2 weeks. This discussion was mentioned on the VP, but there was no other effort to notify users, even those of us that put our names forward to be part of formal consultation. It's only happenstance that I remembered the VP mention, which happened while I was away travelling.

The change is significant in the fundamental way that Wikimedia Commons works and should be run as a proposal or RFC, run for at least 30 days, and benefit by having a FAQ based on the questions raised so far.

It's worth pointing out that as the most active current uploader of images on Commons and this change is worrying due to the future potential impact on the way that upload projects will work, templates can be used and running housekeeping tasks on uploads, which includes automatically reviewing past versions of image page text (many of my bot tasks do this as part of checking past bot actions and ensuring bots do not edit-war with "human" changes). Despite vague assurances that this probably will not be any more volunteer effort, I do not believe that will be the case long term. This change is part of making templates harder to use and image page text becoming harder for newbies to format "correctly", with "correct" being defined by whatever pops out at the end of the WMF's structured data project. The authority for the changes comes from the WMF funded project, not because Wikimedia Commons volunteers have established a consensus for changes. Instead the structured data project has fudged consensus by having consultations, like this one, that procedurally mean very little and where input from volunteers can be cherry picked by the unelected, to demonstrate whichever case benefits the project at that time.

Thanks -- (talk) 10:38, 16 October 2018 (UTC)

Lua version of the {{Information}} template[edit]

In preparation for the Structured data on Commons it might be a good idea to revamp our most used infobox template: {{Information}} and rewrite it in Lua. I can look into adopting and simplifying some of the code used to {{Artwork}} template to develop Module:Information which should be a very simple and lightweight replacement of the current wikicode. Of course once the sandbox version is ready we would notify the community and go through extensive testing process before any deployment. The Lua code at this phase would simply mimic output and behavior of the wikitext but in the future might be used for merging data stored in the information template with data stored in structured data. --Jarekt (talk) 17:21, 22 October 2018 (UTC)