Commons talk:Structured data/Modeling

From Wikimedia Commons, the free media repository

These earlier notes and resources may be inspiring:

SandraF (WMF) (talk) 10:21, 4 September 2019 (UTC)

Wikimedia Commons only properties

Does Wikibase allow extra properties or items that are not in Wikidata? Juandev (talk) 15:15, 13 September 2019 (UTC)

No, all properties and items are created on Wikidata. Properties can be tagged as Wikidata property to describe media items (Q28464773) to indicate that they should only be used here and not on Wikidata. See for example digital representation of (P6243). Multichill (talk) 08:46, 14 September 2019 (UTC)
OK, clear. And is there a list on Wikidata of such properties? Juandev (talk) 08:52, 17 September 2019 (UTC)
Would Commons:Structured data/Properties table (mentioned above) be what you are looking for? On the Wikidata side, you can also query for all Wikidata property to describe media items (Q28464773). Jean-Fred (talk) 10:38, 17 September 2019 (UTC)
That looks like someone's personal research rather than a serious list. Juandev (talk) 13:47, 17 September 2019 (UTC)
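The Wikidata-side query mentioned above could be sketched roughly as follows. This is only an illustration: it assumes that the properties in question are linked to Wikidata property to describe media items (Q28464773) through instance of (P31), and that the public Wikidata Query Service endpoint is used. The function names are made up for this example.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed target for the query).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Class item named in the discussion above.
MEDIA_PROPERTY_CLASS = "Q28464773"


def build_media_property_query(class_qid: str = MEDIA_PROPERTY_CLASS) -> str:
    """Return a SPARQL query listing properties that are an instance of class_qid.

    Assumption: the properties are classified with instance of (P31).
    """
    return (
        "SELECT ?property ?propertyLabel WHERE {\n"
        f"  ?property wdt:P31 wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        "ORDER BY ?propertyLabel"
    )


def build_query_url(query: str) -> str:
    """Return a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Print the query text and the URL; nothing is sent over the network here.
    sparql = build_media_property_query()
    print(sparql)
    print(build_query_url(sparql))
```

The query text can also be pasted directly into the query service's web interface.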

Description

How could we move the description into structured form? I thought we were not going to do that, since we have structured captions.

But for some fusional languages, for example, even a shortened structured description such as a caption does not make sense, because when you search for a string you never know which grammatical case the target string will be in. In practice, searching this way will find something, but you will also miss a lot of results. Juandev (talk) 08:51, 17 September 2019 (UTC)

Location

I don't know how to create a subpage, so I am placing my thoughts here. By location, do you mean where the file was created? For this, I have started to use located in the administrative territorial entity (P131), but it does not cover all types of locations, or rather, it falls short if you want to specify the location further, such as an exact street. I wonder if we could use located in the administrative territorial entity (P131) and provide further specification through qualifiers.

There is also P706, but I don't see a reason to replace P131 with it, because every piece of land on Earth should have some P131. The only cases that don't fit might be photographs of the sky or of objects on, e.g., Mars. In such cases P706 might be used. The question here is whether we have a property to indicate where the image was taken from, e.g. where the telescope was located.

This is related to the need for a photographer or camera (drones, satellites) position property to be created, I guess. Juandev (talk) 09:06, 17 September 2019 (UTC)

I have started using country (P17) and location (P276) to describe context that is given in certain kinds of categories, where the location itself is not the "subject" of the photo in the sense of depicts (P180).
I know I have been seeing a lot of "depicts" statements that don't really "depict" a location, but rather an arbitrary contextual place. E.g. @Missvain: used P180 on File:Fish_&_Chips_(16963939795).jpg, which to me feels more like a location (P276). I have been applying that to contexts like File:Cygnus_buccinator_-Riverlands_Migratory_Bird_Sanctuary,_Missouri,_USA_-flying-8.jpg, where it's clear that the location is not being depicted, but rather is a setting for the animals. Sadads (talk) 16:57, 21 September 2019 (UTC)
Yes, that's how I was under the impression it should be used. Just like on Wikidata. I'll describe an artwork's location as being in a specific gallery (if there is a Wikidata entry for it), or if it's a restaurant on Wikidata, I'll use location to describe the neighborhood it is in (e.g. Applebee's, Times Square, New York). Missvain (talk) 17:39, 21 September 2019 (UTC)
I disagree with your interpretation, but I still think you did the correct thing. I would say that you are indicating affiliation here, because Red Robin (Q7304886) does not characterize the physical place but the virtual entity, the restaurant chain. Juandev (talk) 20:13, 21 September 2019 (UTC)
Agreed. location (P276) sounds very good for indicating the place where the photo was taken (and not what the image depicts). Furthermore, I think that we can also use location of discovery (P189) to indicate the place where some depicted objects/specimens were discovered, e.g. File:Heliconius numata numata MHNT.JPG. Christian Ferrer (talk) 18:26, 21 September 2019 (UTC)
So what about located in the administrative territorial entity (P131)? Should I use location (P276) instead? I think it's important to know how, or whether, we can mine certain data; then it would be easier to decide how to flag certain information. Juandev (talk) 20:13, 21 September 2019 (UTC)
If our intention is that people who are completely unfamiliar with Wikidata will be entering structured data statements on Commons, then I think it's inevitable that "location" will be used for all sorts of place-related concepts in ways that are different from the fine-grained distinctions we make in Wikidata. The only way I can come up with to ameliorate this is to have fields in the interface that are hard-coded to specific WD properties, the way "depicts" is, and provide prompts to the user that are specific to each such property. - PKM (talk) 23:44, 21 September 2019 (UTC)
The question could be solved, for this specific topic and many others too, if the results of a search for a statement value could include all the results of searches for the same value stored under the related subproperties. I asked about this, and gave an example, at mw:Help talk:Extension:WikibaseCirrusSearch. Christian Ferrer (talk) 04:35, 22 September 2019 (UTC)
I agree with PKM. Given the broad range of location types in which photography takes place, I feel that we need to rely on a generalized sense of location, both for inexperienced users and for the sake of better search. Sadads (talk) 16:05, 13 November 2019 (UTC)
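The subproperty-aware search suggested in this thread could be approximated on the client side, roughly as sketched below. This is only an illustration: the subproperty map is hard-coded and hypothetical (it is not read from Wikidata), Q60 is a placeholder value, and the `|` syntax for combining alternatives in `haswbstatement` is assumed to behave as an OR.

```python
from typing import Dict, List

# Hypothetical subproperty map for illustration only; a real one would be
# derived from subproperty of (P1647) statements on Wikidata.
SUBPROPERTIES: Dict[str, List[str]] = {
    "P276": ["P131", "P189"],
}


def expand_property(pid: str, subproperties: Dict[str, List[str]]) -> List[str]:
    """Return pid followed by its transitive subproperties, without duplicates."""
    ordered: List[str] = []
    queue = [pid]
    while queue:
        current = queue.pop(0)
        if current in ordered:
            continue  # already visited; also guards against cycles
        ordered.append(current)
        queue.extend(subproperties.get(current, []))
    return ordered


def build_haswbstatement(pid: str, value: str,
                         subproperties: Dict[str, List[str]]) -> str:
    """Build one search term matching value under pid or any of its subproperties."""
    alternatives = [f"{p}={value}" for p in expand_property(pid, subproperties)]
    return "haswbstatement:" + "|".join(alternatives)


if __name__ == "__main__":
    # Q60 is used here purely as a placeholder value.
    print(build_haswbstatement("P276", "Q60", SUBPROPERTIES))
```

With something like this, an editor could enter a generic "location" while a search still reaches files that use a more specific property.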

Building model: how do we indicate that a picture shows the front/back etc.

Using Magnus Manske's new tool to add #SDC to pictures of buildings, I realized it would be good to indicate whether the picture shows the "front" or the "back" of a building. Any suggestions on how to do this? - Salgo60 (talk) 11:40, 24 September 2019 (UTC)

There are probably two approaches. One is to create properties for front and back (clothes, for example, also have two sides); the other is to use some kind of "clock-like" orientation, e.g. a windmill facing two o'clock. Juandev (talk) 06:48, 27 September 2019 (UTC)

Original work and digital representation

Easy example, one work
Scan example, multiple works

Hi everyone, I see a recurring discussion for which we need a generic solution: how to handle the distinction between the original work and the digital representation of this work. First take the top example. That's just a photo taken by one of our users. No need to make any distinction between works. In the second example, uploaded by Dominic, we do have multiple works: the original photo and the scan. Preferably we can use exactly the same statements for the original work and the digital representation. To make the distinction about which statement applies to which work, we need to agree on what qualifier(s) to use. Maybe applies to part (P518) or subject has role (P2868)? And maybe Q-ids for the original work and for the digital file? This could maybe work together with digital representation of (P6243) in cases where the original work has an item on Wikidata, but let's assume for this conversation that it's not the case. @Jarekt, Christian Ferrer, Jheald: Multichill (talk) 10:01, 5 October 2019 (UTC)

File:DD 364 USS Mahan (Bow Head On) - (detail) - NARA - 19-N-67750.tif is a digital representation of (P6243) {{QXXXX}}, the item corresponding to the original photograph, which itself depicts some specific things, including (but not only) USS Mahan (Q1136398). {{QXXXX}} also has a copyright status, and the file here inherits that copyright status as it is a faithful reproduction of the original photograph.
Otherwise, without an existing {{QXXXX}}, File:DD 364 USS Mahan (Bow Head On) - (detail) - NARA - 19-N-67750.tif itself depicts some specific things, including (but not only) USS Mahan (Q1136398), and is a digital representation of (P6243) photograph (Q125191), whose copyright is given with copyright status (P6216) used as a qualifier of that last property. Christian Ferrer (talk) 11:53, 5 October 2019 (UTC)
@Christian Ferrer: I think you're missing the point here, we're not going to make a Wikidata item for every original photograph.
In this example we want to indicate that the fabrication method for the original photo is black and white photography (or something like that) and that the fabrication method for the digital copy is a digital scan. This topic is about how these two statements can sit in the same imageinfo while we can still clearly distinguish between the two. Multichill (talk) 16:32, 5 October 2019 (UTC)
Yes indeed I missed the point, sorry.
fabrication method (P2079) scanning (Q59155052) of (P642) monochrome photography (Q3381576)
or
fabrication method (P2079) scanning (Q59155052) + fabrication method (P2079) monochrome photography (Q3381576)
Though the second way (fabrication method → monochrome photography) is likely more suitable for this kind of case. (Note that I already use fabrication method). Christian Ferrer (talk) 18:29, 5 October 2019 (UTC)
@Christian Ferrer: I think this is still not quite satisfactory, because you're trying to shoehorn these two pieces of data (original work's "fabrication method" and digital representation's "fabrication method") into a single statement. This would have the practical effect of requiring an editor to come up with a "fabrication method" about the media file in order to be able to describe the original work's "fabrication method" in a qualifier. There are other situations where this approach wouldn't really be practicable at all, such as if an original work can be described with a property that the digital representation can't have at all—think about properties like the external identifier, collection, and so on. Dominic (talk) 17:17, 8 October 2019 (UTC)
@Multichill: Thanks, this is definitely an important discussion. I have more questions than answers. I think that for objects like this, which come from a cultural repository and have authoritative metadata, the goal should probably be to have a Wikidata item for each. I understand that this is not going to be the case for every upload (at least we don't want to make it a required step before uploading), but should we state that it is the preference? (This is not answering your direct question, but seems pertinent.)
I also want to point out that there may be an additional layer to consider here. Not all digital representations have a one-to-one relationship to their original work. For the battleship image above, for example, the original document (as described by the source institution, at least) is actually two images with the same identifier (because one is a detail made from the same original photo). So, in cases like this where the individual Commons media file is a digital representation of only a portion of a work/object, we have (1) the metadata of the media file itself, (2) the metadata specific to the portion of the original work depicted, and (3) the original work's metadata. For a more obvious example, consider a 5-page historical letter scanned as 5 separate JPG files. Some pages will have words or topics not included on others; some claims will be the same across all component images in such a work, but some would certainly not be (such as if we had a claim for "page number"). Dominic (talk) 16:57, 8 October 2019 (UTC)
@Dominic: expanding the scope of a problem is not really a good problem solving strategy. No, we're not going to create a Wikidata item for every original photo out there, just like we're not going to create a Wikidata item for every author who happens to have an image on Commons. So we do need a straightforward approach with qualifiers, because we'll have a bunch of statements and some of them are for the original work and some of them are for the digital copy. Please focus on that problem. Once we think we have a good solution, we can expand the scope and solve more complicated cases. Come to think of it, subject has role (P2868) is probably the best option to use. As a target we need an item for the original work and one for the digital file. Do we already have suitable items for that? Multichill (talk) 21:20, 8 October 2019 (UTC)
@Multichill: I am not sure I understand how you want to use subject has role (P2868). Can you explain a little bit, please? Christian Ferrer (talk) 21:33, 8 October 2019 (UTC)
I think the suggestion is something like this:
This solves the problem more nicely than the other suggestions above, because it allows for any number of statements to refer to either the underlying work or the Commons file. We would just need to standardize the ways to refer to those different roles, as Multichill says. The only issue with this is that it seems like it might need to be enforced on all statements to avoid ambiguity. Or we could say it is optional, and only required to indicate when underlying works are being described, but that risks more user input error. Dominic (talk) 20:01, 9 October 2019 (UTC)
Yes, exactly, not sure if I would use scanning (Q59155052), I would probably go for the more generic digital image (Q1250322). For a file if we start talking about multiple roles (by adding the qualifier), we should check all statements at that point. Feels a bit like converting {{Information}}/{{Artwork}} to {{Art photo}} for which I sometimes use {{Art Photo/subst}}. Here you see things like license, date and photographer. I guess GLAM donations will run into this more often than regular users. Multichill (talk) 15:59, 13 October 2019 (UTC)
@Multichill: My point wasn't to expand the scope of the problem, but to state the full problem as I see it. I believe there are three possible work types, not just the two ("original work and digital representation") you suggested, since the digital representation can be a representation of either a part or the whole of the original work. As I said, a scan of a page in a book is a digital representation of a part of that book. The original work could have number of pages (P1104), but page(s) (P304) would refer only to the portion of the original work depicted. Perhaps this just needs to be a different target item for P2868 qualifiers, but I believe we should keep this scenario in mind and have an approach that works for it as well. Dominic (talk) 20:14, 9 October 2019 (UTC)
@Dominic: Ok. Would the subject has role (P2868) approach work here? Multichill (talk) 15:59, 13 October 2019 (UTC)
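To make the subject has role (P2868) qualifier approach from this thread more concrete, here is a rough sketch of how statements on one file could be tagged and then separated by role. It is only an illustration under stated assumptions: the thread did not settle which items should stand for "original work" and "digital file", so photograph (Q125191) and digital image (Q1250322) are used as placeholders, and the `Statement` class and helper functions are hypothetical, not part of any Wikibase API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SUBJECT_HAS_ROLE = "P2868"
FABRICATION_METHOD = "P2079"
DEPICTS = "P180"

# Placeholder role targets (assumption; the thread left the choice open).
ROLE_ORIGINAL_WORK = "Q125191"   # photograph
ROLE_DIGITAL_FILE = "Q1250322"   # digital image


@dataclass
class Statement:
    """A simplified statement: property, value, and qualifier map."""
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)


def statements_for_role(statements: List[Statement], role: str) -> List[Statement]:
    """Return the statements whose P2868 qualifier points at the given role."""
    return [s for s in statements if s.qualifiers.get(SUBJECT_HAS_ROLE) == role]


def statements_without_role(statements: List[Statement]) -> List[Statement]:
    """Return statements lacking a role qualifier (the ambiguous ones)."""
    return [s for s in statements if SUBJECT_HAS_ROLE not in s.qualifiers]


# Example modelled on the scanned USS Mahan photograph discussed above.
EXAMPLE_STATEMENTS = [
    # Original work: monochrome photography (Q3381576).
    Statement(FABRICATION_METHOD, "Q3381576", {SUBJECT_HAS_ROLE: ROLE_ORIGINAL_WORK}),
    # Digital file: scanning (Q59155052).
    Statement(FABRICATION_METHOD, "Q59155052", {SUBJECT_HAS_ROLE: ROLE_DIGITAL_FILE}),
    # No role qualifier: depicts USS Mahan (Q1136398).
    Statement(DEPICTS, "Q1136398"),
]

if __name__ == "__main__":
    for s in statements_for_role(EXAMPLE_STATEMENTS, ROLE_ORIGINAL_WORK):
        print("original work:", s.prop, s.value)
    for s in statements_without_role(EXAMPLE_STATEMENTS):
        print("no role given:", s.prop, s.value)
```

The unqualified depicts statement shows the ambiguity raised above: if the qualifier is optional, a consumer has to decide which work such a statement describes. The part-of-a-work case could be covered by adding a third role target.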