Commons talk:Structured data

From Wikimedia Commons, the free media repository

The future of file names

Note: Commons has a long-standing policy on how to properly rename files, and a long-proposed policy on how to name files. I'm assuming in writing this that the policy may no longer be as relevant to the community once Commons is structured. Files will be much easier to locate regardless of the file name. If the community would like to continue its naming policy, or otherwise decline this proposal, that is an acceptable outcome.

It would be beneficial to Commons if file names were replaced with standardized names. The topic came up recently, and it's worth talking about as part of Structured Data on Commons. A new naming system could be put into place to resolve the many issues that exist because of the current way of doing things. What the new names would look like is unknown; it could be the SHA-1 hash all files already have, it could be a newly generated hash of some sort, or any other kind of numbering system. No matter the choice for the new file names, old file names could be kept. There are various ways to migrate and/or grandfather in file names and how they are handled on wikis, etc.
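For illustration only, here is a minimal sketch of what a content-derived name could look like, assuming the name is simply the hex SHA-1 digest of the file's bytes plus the original extension. The function name `hashed_file_name` is hypothetical, and nothing here reflects how MediaWiki actually stores or exposes its hashes.

```python
import hashlib


def hashed_file_name(content: bytes, extension: str) -> str:
    """Build a hypothetical standardized name from the SHA-1 of the file bytes."""
    # A SHA-1 digest is 20 bytes, i.e. 40 hexadecimal characters.
    digest = hashlib.sha1(content).hexdigest()
    return f"{digest}.{extension.lower().lstrip('.')}"


if __name__ == "__main__":
    # Identical bytes always produce the same name.
    print(hashed_file_name(b"example image bytes", "JPG"))
```

A name built this way never needs a naming decision at upload time, but it is 40 characters of hexadecimal that a human cannot easily read or type.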

Replacing the current file names with standard file names would have the benefit of removing some problems with file names:

  1. Removes the potentially complicated step of naming a file when uploading.
    • Benefits individuals uploading one or a few images, who would have one less decision point in the process; that decision is potentially complex and might otherwise prevent them from uploading.
    • Benefits mass-uploading individuals and GLAM institutions, who often have to rename hundreds to thousands of files as part of their uploading process.
  2. Removes the step of having to rename files.
    • Benefits the administrators and file movers that have to do the work of changing file names.
    • Benefits those who spend time listing and discussing files for renaming, freeing up time for other things.
  3. Removes complexity from code - Commons gadget and tool developers, along with MediaWiki developers, would no longer have to consider all the edge cases that file names currently have. Two examples of situations that could have been prevented (there are many more):
    • Part of the problem with Wikipedia Zero piracy was the ability for anyone to access some deleted files by accessing the file URL directly. This turned out to be caused by an inconsistency in how the image cache transcoded parentheses () in file names. This would have been prevented.
    • Sometimes the iOS application couldn't display some pages. It turns out that there were restrictions on the size of a URL in the application, and long file names exceeded them. This could have been prevented.

The issue at hand is first figuring out if Commons wants this to begin with, and then gathering consensus around how to do it. Replacing file names would require a global Request for Comment to be hosted here or on Meta. It would also require extensive work in gathering translations, making sure all the wikis are involved since they rely on Commons file names in editing, preparations to answer the extensive number of questions that can and will come up, the potential for misunderstandings to be managed, among other things. In other words, this will require a lot of work, with a high potential for rejection even if it is a good idea. But if it's accepted, it would be great for Commons.

What do you think about the idea? Is it important enough to take to the broader Commons and Wikimedia community? Keegan (WMF) (talk) 17:02, 5 April 2018 (UTC)

  • Why have both a file name and a file ID? Is it not simpler to have only an auto-generated file ID? Christian Ferrer (talk) 18:47, 5 April 2018 (UTC)
  • "Benefits those who spend time listing and discussing files for renaming" seems bogus to me. The same matters would presumably still be at issue, they just won't be in the form of filenames. - Jmabel ! talk 21:07, 5 April 2018 (UTC)
  • I think there still needs to be some reasonably mnemonic way to refer to an image. Little could be less mnemonic and more error-prone when typed by a human than an SHA-1 hash. - Jmabel ! talk 21:10, 5 April 2018 (UTC)
  • Won't this make things much more difficult for third-party search engines? - Jmabel ! talk 21:10, 5 April 2018 (UTC)

I could not care less about third-party search engines, but I care about the usage in the Wikiverse, and not having proper names for the pictures, let alone unfathomable cryptic gibberish, to include a good pic in a good article is, to put it mildly, sub-optimal. That's the main use-case the Wikiverse should care about, the other stuff is just nice to have. Grüße vom Sänger ♫ (talk) 21:25, 5 April 2018 (UTC)

  • Absolutely no way in hell. Good filenames make a huge difference to the information available in a category view.
Consider, for example, Category:Images released by British Library Images Online, March 2014, where the filenames immediately give what the item is, where it came from, when it was created, and where in the British Library's holdings it comes from. Or the work that has gone into curating the filenames in a huge number of categories like Category:Collection de Costumes Suisses (1822) by REINHARDT -- so that at a glance, one can pick up exactly what the image is, where it has come from, and when it was made. I'm currently preparing a big upload of maps from 19th-century books, see e.g. Commons:British Library/MC maps batch 06 (GB towns and cities). Something I see as a key part of the process is to try to make sure the files are going to have meaningful names. As User:Sänger notes above, this is immensely valuable when using the images and editing them in wiki pages, as well as, e.g., for users who download all the files in a category. But beyond that, it is fundamental to how we present files here in categories.
As for the structured data project, it is an interesting experiment. But it is vapourware. The challenges it faces are enormous, and it may never work. There is not even a proof of concept of the search, not even the slightest back-of-an-envelope sketch yet of plausible achievability -- eg how to return "Picture of a man in a hat", when the man won't be tagged as 'man' (or even Q5) and the hat won't be tagged as 'hat'; when even to produce a list of Q5s that are male currently times out for a single-user query on WDQS, before we even start to think of joining it with anything else, never mind how to scale up to a system that has to be ready for mass-usage, and produce results that are almost instant. And that's just the search. Schemes for populating the system with detailed descriptive data for 40 million files simply don't exist either -- it's pure vapour. (And something that a number of smaller, simpler schemes like ArtUK have notably failed to crowdsource to any consistency of coverage). So: no, don't expect us to even think of doing anything that might sabotage a key plank of Commons, until Structured Data is an absolutely solid proven reality, that is proven to work in a full-scale production environment, with fully loaded fully detailed data, under full real-world demand load.
Rather than this half-baked crap, as Commons community liaisons why not focus on how to get the project to achieve something that the community has asked for, and repeatedly: namely CommonsData items for Commons categories? This should be a quick proof of concept and a quick win for the federated CommonsData system, showing that it can run at production scale, and it would let us, the community, immediately get on with some useful work, namely identifying what the categories represent and recording it in accessible, queryable form using structured data. That isn't currently possible, since most Commons categories don't (and won't) have Wikidata items. Jheald (talk) 22:49, 5 April 2018 (UTC)
  • @Keegan (WMF): April’s Fool was last week. -- Tuválkin 23:21, 5 April 2018 (UTC)
     :) Keegan (WMF) (talk) 22:28, 9 April 2018 (UTC)
  • The idea of getting rid of user-specified filenames gives a lot more motivation to the captions aspect of Multilingual Captions. The captions would replace filenames, and would have the advantage of being translatable. I don't think it's a bad idea. You'd have a persistent ID for a file which could be used for all external links, and won't break just because somebody wants to change the caption / title. I think abandoning the idea of "uploading new versions" of a file would also be part of it: overwriting files has always been problematic. Maybe you could instead have some kind of "clone file with new version" feature that would save filling out all the details. I'd have thought the database would already have an identifier for each file, maybe a sequence number or something, which could be used as the "file name". --ghouston (talk) 00:01, 6 April 2018 (UTC)
    • FWIW, I routinely use "uploading new versions" as a way to first upload the photo as it came from my camera, then upload a post-processed version. Means that no one will "too easily" use my rawer version in a Wikipedia article, but it's there if someone wants it for a different post-process. I hope that if we get rid of "uploading new versions," there will be another way to support this use case. Maybe some notion of a "draft only" version?
      • Or a way to indicate that a different file is the preferred version: it would also apply to files with errors. --ghouston (talk) 00:36, 6 April 2018 (UTC)
    • Also, we'd need to think about some other way to reference charts or graphs that deliberately change over time. Maybe via some sort of redirect, where we would change the target? - Jmabel ! talk 00:21, 6 April 2018 (UTC)
      • Yeah, I was just thinking about that problem, some sort of "virtual file" which redirects would seem to be needed. There's also the problem of files which have been widely linked throughout Wikimedia projects but turn out to have an error that should be fixed. Maybe that could be handled with a "global replace" tool available to administrators / and perhaps unemployed file movers. --ghouston (talk) 00:27, 6 April 2018 (UTC)
  • An integer sequence number, like the item identifiers on Wikidata, would probably look better for linking than a 20-byte SHA-1 hash. --ghouston (talk) 01:00, 6 April 2018 (UTC)
    • Almost as miserable as an SHA-1 hash for something that has to be handled by a human. One of the many reasons it's a problem: a 1-character typing mistake will almost always result in something still meaningful, but not what you intended. At the very least, if we are going to use non-mnemonic names, there should be some sort of checksum to make them single-error tolerant. - Jmabel ! talk 01:33, 6 April 2018 (UTC)
      • An 8 or 9 digit number would be shorter than most filenames (I guess), and much shorter than some of them. Hopefully people would notice if they copied the wrong number and got the wrong file. --ghouston (talk) 02:45, 6 April 2018 (UTC)
      • Just looking in the recent changes, I spotted File:Joseph McKenna, Associate Justice, Supreme Court, full-length portrait, seated, facing right LCCN97502836.tif: file names already have cryptic components that can be longer than 8-9 characters. --ghouston (talk) 02:48, 6 April 2018 (UTC)
        • Right, a name like that is clumsy, but it's also highly redundant: it is very unlikely that if you change, add, or remove a single character you will get another meaningful (and unrelated!) filename. - Jmabel ! talk 05:38, 6 April 2018 (UTC)
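To make the checksum suggestion in this sub-thread concrete, here is a minimal sketch, assuming purely numeric identifiers and using the Luhn check digit (the scheme used on payment card numbers) as one possible choice. Luhn catches every single-digit typo, though not every transposition of adjacent digits. The function names are hypothetical and nothing like this is part of the proposal itself.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk from the rightmost payload digit, doubling every second digit
    # starting with the rightmost one.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def with_check_digit(payload: str) -> str:
    """Append the check digit to a numeric identifier."""
    return payload + luhn_check_digit(payload)


def is_valid(identifier: str) -> bool:
    """True if the last digit matches the check digit of the rest."""
    if len(identifier) < 2 or not (identifier.isascii() and identifier.isdigit()):
        return False
    return luhn_check_digit(identifier[:-1]) == identifier[-1]


if __name__ == "__main__":
    file_id = with_check_digit("12345678")  # hypothetical sequence number
    print(file_id, is_valid(file_id))
```

With a scheme like this, a mistyped digit produces an identifier that is rejected outright instead of silently pointing at a different file, at the cost of one extra digit.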
We have the full file name (a "breadcrumb") and a short file name (for pages, media files, etc.). The short file name cannot, of course, include all the information, but it can simply be derived from the full file name. As for the symbols used in the short file name, the same evolution will occur here as with the names of Wikipedia articles (in line with the law of the transition of quantity into quality): instead of page names in many different languages, human-readable Wikipedia pages will be named in one universal, unified (WD?) language. That is, instead of names on enWP, frWP, deWP, beWP, etc., there will be one unique name Q* on WD. Does WD have problems translating the WD language into any user's language? No, it does not: everyone gets the name of the (URL) object in their own language. And there are no problems with renaming (only one question has to be settled for an item: to exist or not to exist). --Fractaler (talk) 07:35, 6 April 2018 (UTC)


Several points why this is a bad idea:

  • If I see an edit in an article about John Doe changing File:John_Doe_on_public_meeting_in_Whereverville.jpg to File:Erected_penis.jpg, I know this is vandalism immediately, without even needing to wait until it loads. If I see File:5FD4F1E4353745E3A63592A0637EADBE6787A4E4EB0CA77E7A0818878F366B81.jpg changed to File:97FB251BA0783BFA668E6496BB4D8B69F63B1E264692BA8134168113D48F3BDF.jpg, I have no idea what has happened. This concerns not only apparent vandalism (let's assume some properties on Structured Commons might give some tips in lieu of the current ones in the future): file names often give a cue that the file comes from a professional studio or is taken from an official website, so we can spot a possible copyvio. If someone adds File:Independence_square_Kyiv.jpg to the Kyiv article, I know this is a noFoP violation that I should take to RfD. If I see File:028BAA2538B3CFDE59E71C918400E8D3CFA31F222EDA051F4F2980B4F2B21DFD.jpg added, I have no idea what it is and have to check. Also, I would not be able to easily locate the file on the page: today I can hover my mouse over the files and see which one is Independence_square_Kyiv.jpg, but I would not easily notice which one is File:028BAA2538B3CFDE59E71C918400E8D3CFA31F222EDA051F4F2980B4F2B21DFD.jpg and which one is File:5DC3ACC0C06D462F8880C83A7A1A7B017BD66BA596C4AF53AD0299C4AB9A152C.jpg.
  • Reuse in printed media. If I use a file in a printed leaflet, a book or anything else, I provide attribution. A reader can then follow https://commons.wikimedia.org/wiki/File:Вася_Пупкин.jpg and see the file. Of course, for a reader who does not know Cyrillic this would be about as tricky as https://commons.wikimedia.org/wiki/File:熊貓在樹上.jpg would be for me, and arguably https://commons.wikimedia.org/wiki/File:AD85993F7BD2323FADBBD36BD2EDE03F644082DF39B26A5626062E095F817410.jpg would actually be simpler to type. But generally common sense applies: as a person involved in creating the printed material, I can gauge how likely my audience is to know how to type in a particular alphabet, and if they are not, I can use percent-encoding or link by curid instead. Filenames are supposed to be self-explanatory, so this also reduces the need to provide additional text, which is sometimes crucial (e.g. logo attribution on a small leaflet). I understand that we would still have queryable file descriptions or something like that, but those might have a legal implication: filenames are unique, so attribution is satisfied by them. If the printed descriptor leads me to 20 files of a panda on a tree, then it is not proper attribution.
  • UX when editing articles. In VE and on WD we can show a description of the image all right. But source editing is not going anywhere, and I would really rather edit a table of, let's say, this while seeing the names, so that I know they correspond to the correct items, rather than see a list of non-human-readable identifiers. And as mentioned, this would also remove the possibility of checking which file it is by hovering and looking at the status bar. This is different from the CVN-ish first issue, because the first one is about the metapedian work of patrolling, checking watchlisted edits and so forth, while this one is about the general feeling when editing articles (and let's remember that people enjoying writing articles must be our main priority).

I think the only improvement would be to allow files to be used externally by their description page ID rather than by name, just as they can already be linked to by it. This would allow embedding them in third-party sites in a way that ensures our file renames do not affect them, and would possibly also bypass some limitations, as some platforms do not allow very long URLs, which some files have, and so forth. P.S. I realise that if, rather than SHA-256 as in my examples, something else is used, such as WD identifiers or curids, some things would feel less bad, but I'd still rather not look at 68088814 instead of Schloss Herrenchiemsee LOC ppmsca.52570 unless I have to. --Base (talk) 17:29, 6 April 2018 (UTC)
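The two fallbacks mentioned above for print attribution, percent-encoding and linking by curid, can be sketched as follows. This is a rough illustration using only Python's standard library. The helper names are hypothetical, and the `index.php?curid=` URL form is an assumption about how the page-ID link is written.

```python
from urllib.parse import quote, unquote

BASE = "https://commons.wikimedia.org"


def percent_encoded_file_url(file_name: str) -> str:
    """ASCII-only URL for a file page; non-ASCII characters become %XX escapes."""
    title = "File:" + file_name.replace(" ", "_")
    # Keep ':' and '/' readable; everything outside the unreserved set is escaped.
    return f"{BASE}/wiki/{quote(title, safe=':/')}"


def curid_url(page_id: int) -> str:
    """Link by numeric page ID (assumed URL form), which survives file renames."""
    return f"{BASE}/w/index.php?curid={page_id}"


if __name__ == "__main__":
    url = percent_encoded_file_url("Вася_Пупкин.jpg")
    print(url)           # typeable on any keyboard, though long
    print(unquote(url))  # decodes back to the readable title
    print(curid_url(68088814))
```

The percent-encoded form is typeable on any keyboard but grows to six characters per Cyrillic letter, which is part of why a short numeric ID is attractive for external links even if file names stay.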

  • Is linking to images on Flickr difficult, given that they use numeric ids? It doesn't seem that way to me, if anything it's easier than arbitrary Unicode filenames. Although on Flickr, it seems you need to combine the user and image ids in the URL. --ghouston (talk) 23:45, 6 April 2018 (UTC)
  • @Keegan (WMF): Please don't. The Structured Data people promised the community to keep the normal functions (categories, etc.). Needless to say, the file rename policy has been approved by the community and changing it requires a formal RFC. :) I know that filenames and x-wiki transclusion are problematic, but we have to find another solution that keeps filenames. --Steinsplitter (talk) 17:36, 6 April 2018 (UTC)
Analogies say more about the benefits, for the end user (i.e., for those who use Commons rather than edit it), of converting a variable ("name", "імя", "নাম", "नाउँ", "Όνομασία", "नाम", "名前", "სახელი", "名称", etc.) to a constant (something like "Q82799" or "5FD4F1E4353745E3A63592A0637EADBE6787A4E4EB0CA77E7A0818878F366B82"). Analogy 1: DNS name/address (shortcut, variable) and IP address (constant). The IP stays permanent, and the DNS name can be changed without affecting references to the IP. --Fractaler (talk) 17:38, 7 April 2018 (UTC)

Thanks for the feedback, folks. It looks like this won't be a good idea to move forward with. People are welcome to continue the discussion, I'll keep an eye on it, but I'll pretty much consider the matter closed. Keegan (WMF) (talk) 22:28, 9 April 2018 (UTC)

Autogenerated summaries of files in the user's language could be useful, especially where the file name is in a language the user does not understand (or whose writing system they cannot even decipher in any way). For example, this file could get the autogenerated string lighthouse + solar panel, Ireland (22 September 2014, Lydia Pintscher) in English.
Such summaries would not be proper file names, they would not have to be unique and could change over time, but they could conceivably replace the file name in some situations where information is displayed to the user. For example in an enhanced view comparing different versions of a page (where users could set autogenerated summaries as the default for what is shown to them as the file identity) or in long lists of files. --Njardarlogar (talk) 08:59, 5 May 2018 (UTC)
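A rough sketch of how such a summary string might be assembled, assuming the structured data has already been resolved to labels in the reader's language. The `FileFacts` structure and its field names are hypothetical and only cover the English example given above.

```python
from dataclasses import dataclass
from datetime import date

MONTHS_EN = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


@dataclass
class FileFacts:
    """Hypothetical bundle of already-translated labels for one file."""
    depicts: list[str]
    country: str
    taken_on: date
    creator: str


def summary_en(facts: FileFacts) -> str:
    """Join the labels into a short, non-unique, display-only summary."""
    subjects = " + ".join(facts.depicts)
    d = facts.taken_on
    when = f"{d.day} {MONTHS_EN[d.month - 1]} {d.year}"
    return f"{subjects}, {facts.country} ({when}, {facts.creator})"


if __name__ == "__main__":
    facts = FileFacts(
        depicts=["lighthouse", "solar panel"],
        country="Ireland",
        taken_on=date(2014, 9, 22),
        creator="Lydia Pintscher",
    )
    print(summary_en(facts))
```

Because the string is rebuilt from the data each time, it can change whenever the statements change, which is why it could only ever supplement a stable file name or ID.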

Multilingual captions prototype testing


The good news: there is an early working prototype ready for some feedback.

The bad news: it's kind of hard to get to, so I'll explain that part.

There is not a solid testing ground for new software for Commons that copies the "production" (live) environment that is here, for many complicated reasons. There is beta.commons and test wikis, but they are unstable and are not reliable when reporting and reproducing bugs. In order to build and test software for Structured Data on Commons, the team has created a special instance of the wiki on Wikimedia Labs, https://federated-commons.wmflabs.org/ . Since this is a small testing wiki without a volunteer community to patrol it, and since testing for Commons often involves uploading images, the wiki is private with account creation turned off. I know and the team knows that this is not ideal, and they're working towards more open solutions as more software is developed, but this is what we have at the moment.

Following all that, back to the good news: I have six accounts that can be used to test captions on the labs wiki. I figure if you are interested you can put your name down here. I can email you the username and password with a link to the wiki and UploadWizard there. Try it out, come back and leave your feedback here and let us know you're done, and if needed the name and password can then be sent on to someone else. Keegan (WMF) (talk) 17:56, 23 April 2018 (UTC)

I'd like to test now
  1. Christian Ferrer (talk) 17:07, 24 April 2018 (UTC) ✓  Done Feedback complete. - Keegan (WMF) (talk) 18:52, 24 April 2018 (UTC)
  2. Steinsplitter (talk) 17:14, 24 April 2018 (UTC) ✓  Done Feedback complete. Keegan (WMF) (talk) 17:45, 24 April 2018 (UTC)
  3. Yann (talk) 17:14, 24 April 2018 (UTC) ✓  Done Feedback complete Keegan (WMF) (talk) 17:45, 24 April 2018 (UTC)
  4. Raymond 17:32, 24 April 2018 (UTC) ✓  Done Feedback complete Keegan (WMF) (talk) 17:45, 24 April 2018 (UTC)
  5. Jarekt (talk) 18:25, 24 April 2018 (UTC) ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 18:52, 24 April 2018 (UTC)
  6. Juandev (talk) 18:37, 24 April 2018 (UTC) ✓  Done Feedback complete. Keegan (WMF) (talk) 18:52, 24 April 2018 (UTC)
  7. D Y O L F 77[Talk] 20:56, 24 April 2018 (UTC) ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 22:54, 24 April 2018 (UTC)
  8. --Sannita - not just another it.wiki sysop 14:35, 25 April 2018 (UTC) ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 20:07, 26 April 2018 (UTC)
  9. Syced (talk) 08:11, 25 April 2018 (UTC) ✓  Done Feedback complete. Keegan (WMF) (talk) 20:07, 26 April 2018 (UTC)
  10. John Samuel (talk) 14:06, 25 April 2018 (UTC) ✓  Done Feedback complete Keegan (WMF) (talk) 20:07, 26 April 2018 (UTC)
  11. Sandipan Banerjee (talk) 17:06, 25 April 2018 (UTC) ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 17:59, 27 April 2018 (UTC)
  12. DePlusJean (talk) 19:30, 25 April 2018 (UTC) ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 19:19, 27 April 2018 (UTC)
  13. Jnanaranjan Sahu ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 19:19, 27 April 2018 (UTC)
  14. --Jonatan Svensson Glad (talk) 20:05, 27 April 2018 (UTC) ✓ - emailed via Special:EmailUser Keegan (WMF) (talk) 15:08, 30 April 2018 (UTC)
  15. ...
  16. ...

Discussion about the process

Questions, comments about how captions can be tested? Keegan (WMF) (talk) 17:56, 23 April 2018 (UTC)

  • I imagine many already know this, but German, Japanese, and Arabic are languages where we have reasonably large pools of people, and which collectively exercise most features of localization. Also, ideally, some language where we support two different writing systems for what is otherwise the same language (any particular suggestion of which? I don't know offhand what we support that way). - Jmabel ! talk 22:38, 23 April 2018 (UTC)
    • For a language with different supported writing systems, you could use Serbian maybe. Wikibelgiaan (talk) 12:17, 25 April 2018 (UTC)
  • Is uploading at federated-commons.wmflabs.org possible via classical upload form Commons:Upload? I trust Web forms more than modern solutions. Incnis Mrsi (talk) 21:14, 24 April 2018 (UTC)
    • Yes, they have the classic old upload form there.--Juandev (talk) 22:24, 24 April 2018 (UTC)

@Juandev, Raymond, Yann: thank you for your feedback. I'll work on getting responses to you where they are required. Are you all done with your accounts? Let me know when you are so I can send the accounts to others. Keegan (WMF) (talk) 21:51, 25 April 2018 (UTC)

✓  Done for me. Yann (talk) 05:10, 26 April 2018 (UTC)
✓  Done for me. Raymond 07:12, 26 April 2018 (UTC)
✓  Done for me.--Juandev (talk) 19:19, 26 April 2018 (UTC)

@Steinsplitter, Dyolf77, Sannita, Syced, Jhalmuri: How is testing the tool going for feedback? Do you have any questions? Keegan (WMF) (talk) 18:00, 1 May 2018 (UTC)

@Keegan (WMF): Sorry, was a bit busy :). Added feedback. --Steinsplitter (talk) 18:04, 1 May 2018 (UTC)
@Keegan (WMF): Sorry, still busy, will do tomorrow, I swear! --Sannita - not just another it.wiki sysop 08:14, 3 May 2018 (UTC)
@Keegan (WMF): ✓  Done for me. --Sannita - not just another it.wiki sysop 12:55, 4 May 2018 (UTC)

Feedback on the prototype

This is where thoughts about the prototype will go. Keegan (WMF) (talk) 17:56, 23 April 2018 (UTC)

  • At first glance:
I wrote the caption, but now I don't know what to put in the section "Description: Describe what is notable about the file." What kind of information should go there? "Describe what is notable about the file" is not really clear to me.
When you click on "Add a caption in another language", the new section created is also in English, and you have to click again to make the language choices appear. This is tedious; please make the language choices appear as soon as you click.
For the location heading, you must choose an angle (e.g. 45°); why not also allow N, NNW, SE, etc.?
I see that the caption is in fact the label of the file's item, but it does not appear on the file page.
Christian Ferrer (talk) 18:15, 24 April 2018 (UTC)
If I understood correctly, the Description section is the same thing as the current description field, and the caption is a quick summary of what we see in the image? Christian Ferrer (talk) 18:19, 24 April 2018 (UTC)
I suggest to change "Description : Describe what is notable about the file." by something more precise in the kind "Description : detail the description (or "write a detailled description") with what is notable about the file (subject, place, context, ect...). Christian Ferrer (talk) 18:33, 24 April 2018 (UTC)
detailled => detailed, ect => etc.
I think "notable" is too much of an insider's word. I don't have an exact wording, but something along the lines of "a caption that is likely to be useful with this file wherever it is reused."
I agree about "subject, place, context". - Jmabel ! talk 07:56, 25 April 2018 (UTC)
Of course, I'm saying this without seeing the UI myself. - Jmabel ! talk 07:58, 25 April 2018 (UTC)
  • Works well so far, but
Shouldn't the description be added to the description of the MediaInfo? Or will that be done at a later date? Regards, Yann (talk) 18:44, 24 April 2018 (UTC)
  • Technically it works... I understand the concept of the "caption" as the label for the M-entry... but as a photographer I really have no idea how to fill the caption field with meaningful text that is different from the filename. Furthermore, it takes work to create a meaningful caption for every file. Raymond 20:08, 24 April 2018 (UTC)
  • First of all, I would like to apologise for my poor English, but I hope you will understand.
  • So thank you very much for allowing me to test this! There were certain things I had to learn. There are certain things I don't like about the solution, but let's say these may change in the future, so let's focus on the caption problem. My thoughts are as follows:
    • The Wikidata interface will be confusing for WMC users, and some may tend not to use it because they won't learn it (maybe a simple video tutorial would help).
    • Filling in MediaInfo should be more or less automatic. As it looks now, you fill in the description of the file and then you have to fill in MediaInfo as well, which takes more of your time. Some of us thought that Wikidata/Wikibase integration into WMC might work the other way, i.e. you add less information and the software fills in more lines. So why not make the file description structured or semi-structured and pull some data into template:description from Wikidata?
    • I came to the conclusion that the caption/label/item name is not so important for Wikimedia Commons. The problem is that file naming on WMC is not standardized. So in every language it may be not just a translation but a completely different form, which helps neither party. I think there are two ways to solve the problem. It probably depends on what use we expect from the whole integration and what can be done.
      1. Retrieve the file name automatically (including the file type, e.g. jpeg) and use just the caption description, translating it into different languages. Here it would be more useful to add some statements that describe the file (if possible) and then search for images using these properties.
      2. Name the basic image depiction (red sofas, red sofa). Then we could have several identical captions, which would differ by their descriptions. But I am not much in favour of this solution, as I think the first is better. What we expect from structured information on the files is to describe them and then be able to filter that information. So the name of the file is not so important (as it is on Wikipedia or Wiktionary); the metadata and categories/description are. The filename on Commons is just a technical thing, which comes from the fact that you cannot use the same filename for more than one image (as you can in other media databases). So a kind of file-naming system was developed, but it is not a broad standard for all subcommunities of WMC (here I refer to the different naming traditions of the Polish and Czech WMC communities).
  • So would it be possible to create or use some Wikidata properties for file description?
  • Finally, if you have about one hour, I have created 3 screencast videos in English on YouTube, which show my tests and my thoughts on the feature:
  • So I slept and refreshed my mind:
    • I am still not sure what the captions are (label? label description? both?)
    • A label for WMC files does not make much sense. On WP the page name is very important, and on Wikt the page name is even more important; on Commons, on the other hand, it is less important. On Commons, the filename makes the page name, but it is not standardized and its creation is subjective. So I would propose not to use a label at all; given the file-naming tradition, I would definitely propose to use the filename. And again, it is not so important to have the label in more languages. What is important is to have a clear label description, which might be a shortened, clear version of the file description from a template, and then statements, which will provide structured information about the file. These could be:
      • file type (jpeg, pdf, etc.)
      • creation date
      • author
      • source
      • uploader
      • license
      • camera used, color or B&W?, other technical metadata like those which are edited by templates or special categories
      • several statements describing the content (image taken in = Hotel Thermal; on the image = red chairs ~ type - 1970 chairs, color - red, whatever...) --Juandev (talk) 06:42, 25 April 2018 (UTC)

At the Wikimedia Conference, some of us discussed the possibility that for people with a lot of wiki experience, it might be good if an alternate UI was also available that was simply a block of text with mark-up (a la wikitext), and where a back end would deserialize that and send the appropriate pieces to Wikidata. - Jmabel ! talk 08:02, 25 April 2018 (UTC)

Language selection bug

Hi all! Here is my feedback:

  • Bug when selecting the language, see the video. It only happens when I don't release the mouse button. I mean: Put mouse over "English", press mouse, move mouse to desired language, release mouse.
  • The descriptions are not visible in the MediaInfo page. See for instance https://federated-commons.wmflabs.org/wiki/MediaInfo:M295 , I entered descriptions in both English and French but they are not visible. I can reproduce the problem.
  • Let's say you enter descriptions in many languages, and you mistakenly leave the left button set to "English" for one of them, so there are two descriptions for "English". In this situation, you only get a difficult-to-understand error message at the very bottom of the page. To avoid this, how about changing the language selection button to offer only languages that are not being used yet? For instance, if you already have an "English" description, do not offer "English" again. By the way, I managed to upload an image with two English descriptions; here is the result.
  • With Caption, Description, and automatically-generated MID now available, the "Title" field should go away, but I guess that will be a subsequent step.
  • The whole thing is navigable with the keyboard, that's great!

Keep up the great work! :-) Syced (talk) 03:09, 27 April 2018 (UTC)
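
The suggestion above about the language selector can be sketched as follows. This is only an illustration of the filtering idea; the function name and the language codes are assumptions and do not reflect how UploadWizard is actually implemented.

```python
def available_languages(all_languages, used_languages):
    """Return the languages that can still be offered, preserving menu order."""
    used = set(used_languages)
    return [lang for lang in all_languages if lang not in used]


# Hypothetical example: English already has a description,
# so it is no longer offered for the next one.
menu = ["en", "fr", "de", "ja"]
print(available_languages(menu, ["en"]))
```

With this filter in place, a second "English" description could not be created from the menu, so the error message at the bottom of the page would not be reached by that route.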

BTW, is there a new Upload Wizard on the test site? I haven't seen it.--Juandev (talk) 05:05, 27 April 2018 (UTC)
To my knowledge there is not a new UploadWizard. Parts of it may look different as it is a testing ground, but it is the same software. Keegan (WMF) (talk) 18:49, 27 April 2018 (UTC)
  • Thanks. I uploaded two photographs. When I uploaded the first photograph, I only filled in the captions in English and French. Once the photo was uploaded, I saw the description field in French. The position of the mediainfo field is lower than I had expected. On clicking the field, I was directed to a Wikidata-like site. The French label was missing, so I manually entered it on the Wikidata-like site. Today I tried once again with another photograph. This time, I filled in both the captions and the descriptions in three languages: English, French and German. Once the upload was complete, I could see the description in all three languages. However, on clicking the mediainfo link, I see only the labels on the Wikidata-like site, but not my descriptions. Overall, my experience was exactly the same as when I upload photographs on Wikimedia Commons. This is really great. However, I do not understand why only labels were filled in on the Wikidata-like site and not descriptions. Thanks and keep up the good work. John Samuel (talk) 17:36, 28 April 2018 (UTC)
  • I did a test upload (including changes) a few days ago. Looks good to me. I have had no problems so far and the functionality is reasonable. Maybe some kind of auto-detection or automatic import of the license from the file description etc. of existing files would be useful. --Steinsplitter (talk) 18:04, 1 May 2018 (UTC)
  • I did a test uploading an old screenshot I did for a paper about Wikidata. My feedback so far:
    • "Caption" and "description" are somewhat misleading names: I thought "caption" would have been the label, and description, well, the description of the mediainfo page. Turned out I was wrong.
    • Also, I'm used to putting a full stop (".") at the end of a description; it'd be nice to remind users not to do it, or it will be imported into the labels/descriptions.
    • Is the Commonsrepo going to directly take items from Wikidata? I'm trying to put a "depicts: Wikidata property" triple, but it's not working. (Not a real problem, but still...)
    • Is the property suggester going to be available on Commonsrepo? (I think it will, just asking.)
Keep up the good work! :) --Sannita - not just another it.wiki sysop 12:54, 4 May 2018 (UTC)

Self parent cats due to wikidata

The Birth of Venus (Botticelli) and all files within Details of The Birth of Venus (Botticelli) that state a Wikidata Q-number automatically become part of The Birth of Venus (Botticelli); the same happens with Christ on the Cross adored by two Donor (El Greco) and Adorazione dei Magi by Gentile da Fabriano. Please stop it and repair.--Oursana (talk) 22:49, 1 May 2018 (UTC)

Pinging @Jarekt: looks like a problem with {{Artwork}}? --El Grafo (talk) 09:24, 2 May 2018 (UTC)
El Grafo and Oursana that issue should be fixed now. --Jarekt (talk) 17:43, 2 May 2018 (UTC)
@Jarekt:, thank you, almost fixed. Please look into cat The Birth of Venus (Botticelli). All detail files (which all have wd) still automatically/magically have the super cat, which they should not. --Oursana (talk) 21:40, 2 May 2018 (UTC)
I was trying to fix an issue I occasionally see: when someone makes a category for an individual artwork, they do not move all the images of that artwork into the category. I imagined that it would be very rare to have sub-categories of individual artwork categories, so it would be safe to auto-categorize. But maybe it was a bad idea. --Jarekt (talk) 00:41, 3 May 2018 (UTC)
Same problem with Mérode Altarpiece, new cat:Mérode Altarpiece by the workshop of Robert Campin; it is becoming a mess--Oursana (talk) 06:00, 8 May 2018 (UTC)
I do not see a reason to auto-categorize--Oursana (talk) 06:03, 8 May 2018 (UTC)
Oursana, I just turned it off. I was trying to remove it at the same time as I rolled out other changes but kept forgetting. --Jarekt (talk) 01:38, 9 May 2018 (UTC)
Jarekt, sorry, still a mess: files of the subcat are also in the supercat and do not even show. See all files within Details of The Birth of Venus (Botticelli) stating a Wikidata Q-number, which automatically become part of The Birth of Venus (Botticelli), and others--Oursana (talk) 01:57, 9 May 2018 (UTC)

Another example why structured data are needed for every-day commons user

Hi. Since I am here, "frustrated", working with another Commons user at an edit-a-thon, User:LigaDue, I'd like to share another example of why we really need structured metadata on Commons sooner or later. See for example here. We have no idea which standard to use for the title of a new category. The same happens for generic building exteriors. Is the right title "Exterior of...", "- outside" or "-exterior"? Of course, we can keep looking and compare different countries, but it's such a wasteful process. We spend too much time trying to answer these questions when we clean up the files. It would be nice if we could link to the Wikidata item (or something similar) for these concepts; in these cases it would be so much faster and less ambiguous than compiling statistics on the strings commonly used in titles. Maybe some people know these things, but if you need some real-life examples to show to those who might think the current structure of Commons is fairly easy to manage, well, this is one. Bye--Alexmar983 (talk) 09:50, 2 June 2018 (UTC)

  • No, that’s an example of how important terminological standardization is for a smooth and transparent workflow. You give no explanation of how structured data will magically avoid the same growing pains that category names did/do suffer from. -- Tuválkin 16:21, 2 June 2018 (UTC)
  • +1 to Tuvalkin here. I'm all for structured data, but this is not a problem it would help solve. - Jmabel ! talk 17:14, 2 June 2018 (UTC)
    • Humm.. Not sure I've understood how structured data would work on Commons. But if it's a system where you say "exterior" (AKA "outside" AKA whatever, all defined in the same term) + "church" + "Italy", I believe it can indeed facilitate the process.-- Darwin Ahoy! 17:19, 2 June 2018 (UTC)
You describe with clear, unambiguous information that this image is 1) a building/church/palace and 2) an exterior/an interior. At this point you know this is an intersection of the concepts "exterior" and "building". You can of course make a query with this information, and this basically eliminates the need for a categorization system to look for files; but if you still need new categories for other reasons (which is true), you can make them quickly and have a bot change the title string whenever you want. Every time there are enough images with a given intersection, the bot can create the category, and of course it does so with the standardized title we agreed on, even with basic standardized descriptions, standardized navboxes and so on. Every time a certain combination becomes common, you can expand the categorization tree with a click. If I have to spend my afternoon getting frustrated creating manual categories that I have no idea whether I am doing right, and someone else probably has to overwrite what I am doing again and again, then I would prefer to spend it converting existing categories and description information into metadata, or revising similar information suggested by a bot. It's information management in either case, but with metadata the value of my effort is much bigger, and so is its flexibility, and I would like my effort to be more fruitful. Manual categorization can always increase in confusion; the metadata architecture basically increases in sophistication. For many of us who work with both Commons and Wikidata, the need to handle Commons files the way we handle Wikidata items is something we are starting to feel. It took us two or three years to get a solid and robust metadata architecture on Wikidata, but at least when we run a query to search for something, it works quite well, and it's improving.
Our manual categorization system here is not so efficient, and we feel the frustration that we are not progressing towards something that works better but simply adding partial and not always coherent patches here and there. ---Alexmar983 (talk) 18:05, 2 June 2018 (UTC)
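
The bot idea in the comment above can be sketched roughly like this. It is only an illustration under stated assumptions: files are represented as a plain dictionary of concept sets, only pairs of concepts are intersected, and the threshold, the function names and the title template are all hypothetical rather than an existing bot or agreed naming standard.

```python
from collections import Counter
from itertools import combinations


def frequent_intersections(files, min_files):
    """Count how many files share each pair of concepts and keep the common pairs.

    `files` maps a file name to the set of concepts stated for it (assumed structure).
    """
    counts = Counter()
    for concepts in files.values():
        # Sort so that the same pair is always counted under the same key.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_files}


def category_title(pair, template="{0} + {1}"):
    """Build a title from one agreed template, so it can be changed in one place."""
    return template.format(*pair)


files = {
    "a.jpg": {"church", "exterior", "Italy"},
    "b.jpg": {"church", "exterior"},
    "c.jpg": {"church", "interior"},
}
for pair, n in frequent_intersections(files, min_files=2).items():
    print(category_title(pair), "-", n, "files")
```

Because the title comes from a single template, renaming every generated category would mean changing the template and re-running the bot, which is the flexibility the comment describes.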
  • And then imagine trying to do this kind of upload, when your first language is something other than English. Even if we standardize the vocabulary per @Tuvalkin:, the grammatical intersection of multiple concepts could be overwhelming. Sadads (talk) 12:21, 5 June 2018 (UTC)
  • No need to imagine: English is not my native language and I do a lot of categorization in Commons. It works. Your vapourware does not. So, go ahead and keep wasting your time and talent and WMF donations money on it, but keep your hands off Commons categories. (By the way: Category:Pedro Mexia was just created; everything works, except creating links to pt:Pedro Mexia via Wikidata because reasons.) -- Tuválkin 20:04, 5 June 2018 (UTC)
Just had a look at Category:Pedro Mexia, and the link to pt:Pedro Mexia is there just fine. Jean-Fred (talk) 06:37, 6 June 2018 (UTC)
  • It takes time to transclude. Yet when I tried to manually create the reciprocal link, I was faced with a gobbledegook error message, which is not what one should get from a UI when trying to do something that's already done. -- Tuválkin 11:32, 6 June 2018 (UTC)

  • Sadads, Tuválkin: categories are... in English. How is it possible that people are worried about metadata in English and not about categories? It's more complicated to find the names of categories in English (or miscategorized files with descriptions in other languages) than to use a system integrated with Wikidata that can provide labels in different languages automatically, which is what Wikidata already does. It's not even a problem; metadata actually increases multilingual flexibility because it standardizes the handling of key concepts. It will remain vaporware not because of a lack of tools to do it but because of resistance. Like "keep your hands off Commons categories"... it's weird, because even if you keep manual categories as much as you please, metadata can be used in parallel and can basically be used to make categories in a much more efficient way. It's written above in my example: it's not a complicated automation, you only need to invest in metadata for files, which people like me would gladly do instead of battling with these strings and category trees. The grammatical intersection of multiple concepts could be overwhelming? Yes, as with categories. You don't see it a lot in their case because they are made manually. But that's not a balanced solution! Metadata automation does not automatically increase detail; it simply forces you to clearly define which levels of detail are OK. You can have them on demand in a personal query (which is good) or decide which level of categorization to provide. And that's a good thing, a responsible thing. Currently, you simply have excessive detail here and there in any case, mostly hidden in a bunch of categorization holes. This is a poor scenario, because it literally means you have no idea what to expect from the category tree.
Of course, if we had started years ago it would have taken less effort to adopt metadata, but it's never too late to see things from a functional perspective instead of projecting fears. I mean, I care about money too; precisely, I care about this huge amount of wasted time, which is also indirectly money. A lot of money. The metadata investment is already years late, and we are far beyond the key steps of literacy, as far as I can see. Not the literacy of newbies, in my experience. New users grasp metadata quickly, they learn Wikidata (and metadata) quite fast; when they arrive at the mediocre, incomplete, uneven, time-consuming and multilingually inflexible architecture of Commons categories, I can simply link them to these discussions and explain how Commons is currently "protected" by this scenario. So, yes, I can show them how to make a query to list with pinpoint precision what they need amongst millions of Wikidata items, but not amongst millions of files. In the end, more years of metadata illiteracy will simply leave a new generation of users a much more expensive bill. Well, not my fault.--Alexmar983 (talk) 03:22, 12 June 2018 (UTC)
  • This is what theory looks like and what I expected from Wikidata when it appeared a few years back. I immediately thought: yay, we can have language-independent categorization! (Yes, because unlike what you slyly imply, I’m very much not an Anglocentric monolingual, as my user page hopefully suggests.) But no. This is what Wikidata has been so far: underwhelming in meeting its originally perceived goals and at the same time threatening to take over systematized data from other projects (like geolocation from Commons, infobox data from the Wikipedias, the chilling announcement of lexical data to endanger Wiktionary, etc.), locking it in a dumbed-down, gamified UI that cannot sustain the kind of workflow “power users” are accustomed to, effectively shutting down the mechanism that allowed Commons (and the other projects) to build up their data in the first place.
But go ahead, maybe it will become a beautiful thing. Just don’t destroy others’ way of contributing, okay? Feel free to diss categories, it’s amusing when you do it, but refrain from pushing to its removal from Commons.
-- Tuválkin 11:11, 12 June 2018 (UTC)
Tuválkin, "refrain from pushing to its removal from Commons", "Just don’t destroy others’ way of contributing": who said that? I did not. Are you talking to me or not? You can go on creating manual categories as much as you please, as far as I care, it's your time... but there are many users who don't get why they still MUST do it this way. And trust me, when you enter not simply a remote hamlet in a European country but entire areas of the rest of the world, just to stick to geography, you feel very strongly that the ecosystem is not sustainable. Certainly not if you keep pushing in millions of files from other platforms by bot. We have bots for that, but not for a system based on a semantic architecture, because we keep delaying it in every way like this, and that's apparently good. Can I add "interior", "church", "name of place" via a multilingual menu and let a bot make a category while you handle your manual categories, please? Read what I have written; don't reply to what you think is written. It's not constructive.
In my experience, I expected something from Wikidata; it was not there yet; now it is there, or it is starting to be there. I am talking on the basis of years of interaction, as we are asking together with other users. I could write a lot about experience, but I'll just repeat: I'd rather spend my afternoon investing in a metadata architecture for the future than in this category tree. Not just me. And when I share this experience, I would like not to see a bunch of users act this way. Not just for me but for the people I show these pages to later. I don't know if you have noticed, but I did reply to the concerns presented, showing that they are also defects of the current system you consider the best option or alternative (but again, you can keep that manual system, just don't force us all to use only it forever). People I show this page to might notice that, including the fact that so far the replies were also a little bit vaporware themselves.
A vaporware of fears, but that's what it is. Are you really worried that manual categories will disappear? Because I am not; I want to make manual categories about very sophisticated topics myself, but not about "war memorials in the province of X". It's 2018, please... there are much more delicate topics I should invest my time in.--Alexmar983 (talk) 13:40, 19 June 2018 (UTC)