Commons talk:Structured data/Modeling

From Wikimedia Commons, the free media repository

Wikimedia Commons only properties

Does Wikibase allow extra properties or items that are not in Wikidata? Juandev (talk) 15:15, 13 September 2019 (UTC)

No, all properties and items are created on Wikidata. Properties can be marked as Wikidata property to describe media items (Q28464773) to indicate that they should only be used here and not on Wikidata. See for example digital representation of (P6243). Multichill (talk) 08:46, 14 September 2019 (UTC)
OK, clear. And is there a list of such properties on Wikidata? Juandev (talk) 08:52, 17 September 2019 (UTC)
Would Commons:Structured data/Properties table (mentioned above) be what you are looking for? On the Wikidata side, you can also query for all Wikidata property to describe media items (Q28464773). Jean-Fred (talk) 10:38, 17 September 2019 (UTC)
That looks like someone's personal research rather than a serious list. Juandev (talk) 13:47, 17 September 2019 (UTC)


How could we move the description to structured form? I thought we were not going to, as we have structured captions.

But for some fusional languages, even a shortened structured description like a caption does not make sense: if you are looking for a string, you never know in which grammatical case the target string will appear. In practice, searching this way will find something, but you will also miss a lot of results. Juandev (talk) 08:51, 17 September 2019 (UTC)


I don't know how to create a subpage, so I am placing my thoughts here. By location, do you mean where the file was created? For this, I have started to use located in the administrative territorial entity (P131), but it does not cover all types of locations, or rather, it does not let you specify the location further, such as an exact street. I wonder if we may use located in the administrative territorial entity (P131) and provide further specifications by qualifiers.

There is also P706, but I don't see a reason to replace P131 with it, because every piece of land on Earth should have some P131. The only cases that don't fit might be photographs of the sky or of objects on e.g. Mars. In such cases P706 might be used. The question here is whether we have a property to indicate from where the photograph was taken, e.g. where the telescope was located.

This is related to the need for a photographer or camera (drones, satellites) position property to be created, I guess. Juandev (talk) 09:06, 17 September 2019 (UTC)

I have started using country (P17) and location (P276) to describe context that is given in certain kinds of categories, but the location itself is not the "subject" of the photo, per the application of depicts (P180)
I know I have been seeing a lot of "depicts" statements that don't really "depict" a location; rather, it's an arbitrary contextual place. E.g. @Missvain: used P180 on File:Fish_&_Chips_(16963939795).jpg, which to me feels more like a location (P276). I have been applying that to contexts like File:Cygnus_buccinator_-Riverlands_Migratory_Bird_Sanctuary,_Missouri,_USA_-flying-8.jpg, where it's clear that the location is not being depicted, but rather it's a setting for the animals. Sadads (talk) 16:57, 21 September 2019 (UTC)
Yes, that's how I was under the impression it should be used. Just like on Wikidata. I'll describe an artwork's location as being in a specific gallery (if there is a Wikidata entry for it), or if it's a restaurant on Wikidata, I'll use location to describe the neighborhood the restaurant is in (i.e. Applebee's, Times Square, New York). Missvain (talk) 17:39, 21 September 2019 (UTC)
I disagree with your interpretation, but I still think you did the correct thing. I would say that you indicate affiliation here, because Red Robin (Q7304886) does not characterize the physical place, but the virtual entity, the restaurant chain. Juandev (talk) 20:13, 21 September 2019 (UTC)
Agreed. location (P276) sounds very good for indicating the place where the photo was taken (and not what the image depicts). Furthermore, I think that we can also use location of discovery (P189) to indicate the place where some depicted objects/specimens were discovered, e.g. File:Heliconius numata numata MHNT.JPG. Christian Ferrer (talk) 18:26, 21 September 2019 (UTC)
So what about located in the administrative territorial entity (P131), should I use location (P276) instead? I think it's important to know how, or whether, we can mine certain data; then it would be easier to decide how to flag certain information. Juandev (talk) 20:13, 21 September 2019 (UTC)
If our intention is that people who are completely unfamiliar with Wikidata will be entering structured data statements on Commons, then I think it's inevitable that "location" will be used for all sorts of place-related concepts in ways that are different from the fine-grained distinctions we make in Wikidata. The only way I can come up with to ameliorate this is to have fields in the interface that are hard-coded to specific WD properties, the way "depicts" is, and provide prompts to the user that are specific to each such property. - PKM (talk) 23:44, 21 September 2019 (UTC)
The question could be solved, for that specific topic and many others too, if the results of a search for a statement value could include all the results of searches for the same value stored under the related subproperties. I asked, and gave an example, at mw:Help talk:Extension:WikibaseCirrusSearch. Christian Ferrer (talk) 04:35, 22 September 2019 (UTC)
I agree with PKM. Given the broad range of kinds of locations in which photographs are taken, I feel that we need, for the sake of inexperienced users and of better search, to rely on a generalized sense of location. Sadads (talk) 16:05, 13 November 2019 (UTC)

Building model: how do we set that a picture shows front/back etc.

Using Magnus Manske's new tool to add #SDC to pictures of buildings, I realized it would be good to record whether the picture shows the "front" or the "back" of a building. Any suggestions on how this is done? - Salgo60 (talk) 11:40, 24 September 2019 (UTC)

There are probably two approaches. One is to create properties for front and back (clothes, for example, also have two sides); the other is to use some kind of "clock-like" orientation, e.g. a windmill facing two o'clock. Juandev (talk) 06:48, 27 September 2019 (UTC)

Original work and digital representation

Easy example, one work
Scan example, multiple works

Hi everyone, I see a recurring discussion for which we need a generic solution: how to handle the distinction between the original work and the digital representation of this work. First take the top example. That's just a photo taken by one of our users. No need to make any distinction between works. In the second example, uploaded by Dominic, we do have multiple works: the original photo and the scan. Preferably we can use exactly the same statements for the original work and the digital representation. To make the distinction about which statement applies to which work, we need to agree on what qualifier(s) to use. Maybe applies to part (P518) or subject has role (P2868)? And maybe Q-ids for the original work and for the digital file? This could maybe work together with digital representation of (P6243) in cases where the original work has an item on Wikidata, but let's assume for this conversation that it's not the case. @Jarekt, Christian Ferrer, and Jheald: Multichill (talk) 10:01, 5 October 2019 (UTC)

File:DD 364 USS Mahan (Bow Head On) - (detail) - NARA - 19-N-67750.tif is a digital representation of (P6243) {{QXXXX}}, the item corresponding to the original photograph, which itself depicts some specific things, including but not limited to USS Mahan (Q1136398). {{QXXXX}} also has a copyright status, and the file here inherits that copyright status, as it is a faithful reproduction of the original photograph.
Otherwise, without an existing {{QXXXX}}, File:DD 364 USS Mahan (Bow Head On) - (detail) - NARA - 19-N-67750.tif itself depicts some specific things, including but not limited to USS Mahan (Q1136398), and is a digital representation of (P6243) photograph (Q125191), whose copyright is given with copyright status (P6216), used as a qualifier of that last property. Christian Ferrer (talk) 11:53, 5 October 2019 (UTC)
@Christian Ferrer: I think you're missing the point here, we're not going to make a Wikidata item for every original photograph.
In this example we want to indicate that the fabrication method for the original photo is black and white photography (or something like that) and for the digital copy that the fabrication method is a digital scan. This topic is about how these two statements can sit in the same imageinfo while we can still clearly distinguish between the two. Multichill (talk) 16:32, 5 October 2019 (UTC)
Yes indeed I missed the point, sorry.
fabrication method (P2079) scanning (Q59155052) of (P642) black-and-white photography (Q3381576)
fabrication method (P2079) scanning (Q59155052) + fabrication method (P2079) black-and-white photography (Q3381576)
Though the second way (fabrication method → monochrome photography) is likely more suitable for this kind of case. (Note that I already use fabrication method). Christian Ferrer (talk) 18:29, 5 October 2019 (UTC)
@Christian Ferrer: I think this is still not quite satisfactory, because you're trying to shoehorn these two pieces of data (original work's "fabrication method" and digital representation's "fabrication method") into a single statement. This would have the practical effect of requiring an editor to come up with a "fabrication method" about the media file in order to be able to describe the original work's "fabrication method" in a qualifier. There are other situations where this approach wouldn't really be practicable at all, such as if an original work can be described with a property that the digital representation can't have at all—think about properties like the external identifier, collection, and so on. Dominic (talk) 17:17, 8 October 2019 (UTC)
@Multichill: Thanks, this is definitely an important discussion. I have more questions than answers. I think that for objects like this, which come from a cultural repository and have authoritative metadata, the goal should probably be to have a Wikidata item for each. I understand that is not going to be the case for every upload (at least we don't want to make it a required step before uploading), but should we state that it is the preference? (This does not answer your direct question, but seems pertinent.)
I also want to point out that there may be an additional layer to consider here. Not all digital representations have a one-to-one relationship to their original work. For the battleship image above, for example, the original document (as described by the source institution, at least) is actually two images with the same identifier (because the one is a detail made from the same original photo). So, in cases like this where the individual Commons media file is a digital representation of only a portion of a work/object, we have (1) the metadata of the media file itself, (2) the metadata specific to the portion of the original work depicted, and (3) the original work's metadata. For a more obvious example, consider a 5-page historical letter scanned as 5 separate JPG files. Some pages will have words or topics not included on others; some claims will be the same across all component images in such a work, but some would certainly not be (such as if we had a claim for "page number"). Dominic (talk) 16:57, 8 October 2019 (UTC)
@Dominic: expanding the scope of a problem is not really a good problem solving strategy. No, we're not going to create a Wikidata item for every original photo out there, just like we're not going to create a Wikidata item for every author who happens to have an image on Commons. So we do need a straightforward approach with qualifiers, because we'll have a bunch of statements and some of them are for the original work and some of them are for the digital copy. Please focus on that problem. Once we think we have a good solution, we can expand the scope and solve more complicated cases. Come to think of it, subject has role (P2868) is probably the best option to use. As a target we need an item for the original work and the digital file. Do we already have suitable items for that? Multichill (talk) 21:20, 8 October 2019 (UTC)
@Multichill: I am not sure I understand how you want to use subject has role (P2868). Can you explain a little bit, please? Christian Ferrer (talk) 21:33, 8 October 2019 (UTC)
I think the suggestion is something like this:
This solves the problem more nicely than the other suggestions above, because it allows for any number of statements to refer to either the underlying work or the Commons file. We would just need to standardize the ways to refer to those different roles, as Multichill says. The only issue with this is that it seems like it might need to be enforced on all statements to avoid ambiguity? Or we could say it is optional, and only required to indicate when underlying works are being described, but that risks more user input error. Dominic (talk) 20:01, 9 October 2019 (UTC)
Yes, exactly, not sure if I would use scanning (Q59155052), I would probably go for the more generic digital image (Q1250322). For a file if we start talking about multiple roles (by adding the qualifier), we should check all statements at that point. Feels a bit like converting {{Information}}/{{Artwork}} to {{Art photo}} for which I sometimes use {{Art Photo/subst}}. Here you see things like license, date and photographer. I guess GLAM donations will run into this more often than regular users. Multichill (talk) 15:59, 13 October 2019 (UTC)
@Multichill: My point wasn't to expand the scope of the problem, but to state the full problem as I see it. I believe there are three possible work types, not just the two ("original work and digital representation") you suggested, since the digital representation can be a representation of either a part or the whole of the original work. As I said, a scan of a page in a book is a digital representation of a part of that book. The original work could have number of pages (P1104), but page(s) (P304) would refer only to the portion of the original work depicted. Perhaps this just needs to be a different target item for P2868 qualifiers, but I believe we should keep this scenario in mind and have an approach that works for it as well. Dominic (talk) 20:14, 9 October 2019 (UTC)
@Dominic: Ok. Would the subject has role (P2868) approach work here? Multichill (talk) 15:59, 13 October 2019 (UTC)

Sandra and I were talking about this and trying it out at File:0029 Frederick Coburn 1896.jpg. Taking the basic graph on the right, we have:

  • Subject: The original work and the digital representation of it. For example a painting and a digital photo
  • Predicate: The property we use. For example creator (P170)
  • Object: What we set the statement to. For example with creator (P170) this could be the original painter or the person who made the digital photo

So in the example we have two subjects (original work and digital representation) and two objects (original painter and the photographer). We can use subject has role (P2868) and object has role (P3831) to connect the two. If we make two new items for the concepts of "original work" and "digital representation", we can just use these to qualify statements with subject has role (P2868) in a consistent way. For the object has role (P3831) we can just use photographer (Q33231). In the case of a digitized photo we'll have two creator (P170) statements with object has role (P3831) -> photographer (Q33231), but one of them is qualified with subject has role (P2868) -> "original work" and the other with subject has role (P2868) -> "digital representation". This way we can just add all sorts of other statements like inception (P571), fabrication method (P2079) etc. and qualify them to make it very clear what we're referring to. @Christian Ferrer and Dominic: (and others): What do you think? The hardest part of this proposal is the two items we need. Multichill (talk) 22:16, 19 November 2019 (UTC)

My opinion: digital representation of (P6243) should be used only for identified artworks with a corresponding item, and not to describe what we are seeing; for that, use depicts (P180). Here that is ok, as this is Frederick Coburn (Q62113917). subject has role (P2868) cannot be used in this way. Christian Ferrer (talk) 22:36, 19 November 2019 (UTC)

I think Multichill's analysis is logically correct, but it's not practical to expect users to decorate every statement with "subject has role" qualifiers. Is there some way that the File: description page can have two explicit collections of claims: one whose subject is the digital file, and one (optional) whose subject is the thing that has been digitised? Pinging also Dominic, Christian Ferrer, Spinster, Jheald. Pelagic (talk) 23:11, 31 January 2020 (UTC)
I'm willing to make concessions in the data model, but I get the idea that the main reason for simplification is that our current interface isn't ready for it yet.
Let's assume we can adjust the interface to make this easier. A bit like {{Art Photo}}, where you have a section for the original work and one for the photo. Would that work? Multichill (talk) 19:39, 3 February 2020 (UTC)

Summary multiple works

If a file represents multiple works, for example a digital photo and a painting, we use subject has role (P2868) to distinguish between the different types of works. subject has role (P2868) is used as a qualifier where we want to make the distinction. The target is what we would normally put in instance of (P31), so for example painting (Q3305213) and photograph (Q125191). Additionally, for statements like creator (P170) we can use object has role (P3831) to make the distinction. The target can be something like painter (Q1028181) and photographer (Q33231). We probably need to expand the items we have to be able to make a clear distinction in some cases. Say for example the case where someone digitizes a dia (photographic slide) of a painting. Here we have three works:

This construct is only needed when we actually have multiple works. If we have a Wikidata item for the original work, we're not going to duplicate all the statements from Wikidata. Only statements that apply to the other work(s) are added and qualified with the relevant subject has role (P2868)/object has role (P3831). If this has to be done by hand, it's a lot of work. The interface should be improved and tools should be developed to make this easier. This summary only covers works where we have a relatively one-to-one match (painting -> dia -> digital photo) and not cases like a page from a book. A new topic was already opened for that. Let's see if anyone responds. I'm just going to move forward with this. It's a wiki, we can always change it again. Multichill (talk) 21:25, 17 February 2020 (UTC)
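
To make the summary above concrete, here is a minimal Python sketch of how qualified statements could be kept apart. It is only an illustration of the idea, not how Wikibase stores anything: the `Statement` class, the `statements_about` helper and the creator values are hypothetical, and only the property and item IDs are taken from this discussion.

```python
from dataclasses import dataclass, field

# Property and item IDs quoted in the discussion above.
CREATOR = "P170"
SUBJECT_HAS_ROLE = "P2868"
OBJECT_HAS_ROLE = "P3831"
PAINTING = "Q3305213"
PHOTOGRAPH = "Q125191"
PAINTER = "Q1028181"
PHOTOGRAPHER = "Q33231"


@dataclass
class Statement:
    """Hypothetical in-memory stand-in for one statement on a file."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def statements_about(statements, work_type):
    """Return the statements whose 'subject has role' qualifier names work_type."""
    return [s for s in statements if s.qualifiers.get(SUBJECT_HAS_ROLE) == work_type]


# Hypothetical file: a digital photo of a painting.
# The creator values are placeholders, not real item IDs.
file_statements = [
    Statement(CREATOR, "the-painter",
              {SUBJECT_HAS_ROLE: PAINTING, OBJECT_HAS_ROLE: PAINTER}),
    Statement(CREATOR, "the-photographer",
              {SUBJECT_HAS_ROLE: PHOTOGRAPH, OBJECT_HAS_ROLE: PHOTOGRAPHER}),
]

painting_creators = [s.value for s in statements_about(file_statements, PAINTING)]
```

A statement without the qualifier is simply not returned for either work type, which is the ambiguity raised earlier in this thread about unqualified statements.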

Hi, I don't think we should use qualifiers to keep statements apart, but create separate items for the different objects we want to describe. If we don't force this distinction from the outset, it will be very hard to deal with statements that lack a qualifier. I don't think the qualifier approach scales.
Within Commons, the different objects we are describing should be represented in different tabs. By default, there is one tab with structured data. This is sufficient for digital born photographs uploaded by Commons users. As soon as we perceive the need to point to another object the image is a copy or faithful reproduction of, we could add a statement pointing to this new object (e.g. an analog photograph in a heritage collection) that would be described in another tab on Commons. In the Commons interface, statements about this other object would be added in the new tab. If the object has already been described elsewhere, the information could be transcluded after a "same as" statement has established identity between the two objects.
This approach could be nested, i.e. we may have three or more tabs with structured data for one Commons entry if there really is a need to make statements about further distinct items (e.g. digital image → analogue dia → painting).
When following this approach, we would need to define which statements are expected to automatically propagate from the "original" to the faithful reproductions and vice versa (e.g. "depicts" statements).
--Beat Estermann (talk) 09:51, 21 May 2020 (UTC)
@Beat Estermann: where would you want to store the millions of separate items? On Wikidata? That has been discussed at length, and I think any approach that forces us to create millions of new items on Wikidata just to be able to describe files here is a wrong approach. Multichill (talk) 13:58, 21 May 2020 (UTC)
@Multichill: Why not store them on Commons if there is no good reason to store them on Wikidata? - All the qualifier statements would be stored on Commons as well, wouldn't they? - In the example above, the item for the image would be stored on Commons, the one for the analogue dia as well, while the one for the painting could be stored on Wikidata if it is deemed to be within the scope of Wikidata. - The approach I'm suggesting neither increases the number of objects described nor bloats the number of statements. The only thing it does is disentangle the objects described by creating separate items for them within the data model. Apart from being more explicit about what a statement applies to, the proposed solution would most likely also allow for easier querying. --Beat Estermann (talk) 16:13, 21 May 2020 (UTC)
@Beat Estermann: Commons doesn't have the concept of an item. You can only add statements to files. Multichill (talk) 18:22, 21 May 2020 (UTC)
@Multichill: Well, if we want to make use of federated instances of Wikibase to take some load off Wikidata, having items on the Wikibase instance of Wikimedia Commons would be the way forward. In any case, whether these items are on Wikidata or on Wikimedia Commons, the user interfaces to edit them should be closely integrated with Commons. --Beat Estermann (talk) 19:23, 21 May 2020 (UTC)

How should this be modeled? Notably, "locutorGender" and "transcription" would be useful for querying.

I had proposed voice gender for the first. The "quote" property could work for the second.

@0x010C: Jura1 (talk) 15:36, 11 December 2019 (UTC)

Maybe sex or gender (P21) would be a simple solution. Jura1 (talk) 17:04, 15 December 2019 (UTC)

Digitised object is part of something else

Following on from Dominic's point above, let's say a library digitises a book, and each page is a separate digital file. (Maybe this isn't so common since Wikisource does some magic to display single pages from multi-image files. But there will be cases where users bring in selected pages from another repository. I've done it myself where I wanted a PD illustration for Wikipedia.)

Then we have "file –(digital representation of)→ page –(part of)→ book". Wikidata would (or should) have an Item for the book, but it's not desirable to have Q-Items for each and every single page. Qualifiers would be appropriate for this. Wikidata already uses qualifiers like start-date and end-date to delimit the scope of a statement. But we'd have to be careful what we infer from "file –(digital representation of)→ book [page 50]". The file only represents the part, not the whole. (For simplicity I'm ignoring the FRBR requirement that the book would have separate items for the Work and Realisation.) Perhaps a property with the semantics of "sourced from" would be more appropriate than "digital representation of". We could say both "file sourced-from specific work [page=n]" and "file digital-representation-of original media type".

And what about longer chains? E.g. file cropped-from other file sourced-from website digitised-from page part-of scholarly article published-in volume of journal. Which of those entities gets a Q-Item, and can we expect Commons users to go to Wikidata and create them?

Pelagic (talk) 00:09, 1 February 2020 (UTC)
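
The caution about inferring too much from "file –(digital representation of)→ book [page 50]" can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the `SourceLink` class, the `represents_whole_work` helper and the "book-item" placeholder are hypothetical, and using page(s) (P304) as the scoping qualifier is only one of the options raised on this page.

```python
from dataclasses import dataclass
from typing import Optional

DIGITAL_REPRESENTATION_OF = "P6243"  # property ID quoted earlier on this page
PAGES = "P304"                       # page(s), quoted earlier on this page


@dataclass(frozen=True)
class SourceLink:
    """Hypothetical link from a file to the work it was digitised from."""
    prop: str                    # property linking the file to the work
    work: str                    # identifier of the book; a placeholder here
    pages: Optional[str] = None  # value of a page(s) qualifier, if any


def represents_whole_work(link: SourceLink) -> bool:
    """A page qualifier scopes the link to a part, so the whole must not be inferred."""
    return link.pages is None


scan_of_page = SourceLink(DIGITAL_REPRESENTATION_OF, "book-item", pages="50")
scan_of_book = SourceLink(DIGITAL_REPRESENTATION_OF, "book-item")
```

Any consumer of the data would have to apply a check like this before treating statements about the book as statements about what the file shows.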

Identifier for work

Page 32 of "The Howler" (1914), from Wake Forest University

Any opinions on how I might add the unique item identifier for a file like this one (at right)? Technically, the work is DPLA ID "c97e46e123596dc8b60a70e86a70131c". However, on Commons, we have (1) a media file that represents (2) only one page, of that work. There hasn't been much clarity from the discussions above about how to do this, but I would add these to over 100,000 DPLA files if I could. Dominic (talk) 15:20, 4 March 2020 (UTC)

Suggested way of encoding Restriction tags

One of the GLAMs we have been working with asked us to take a look at how existing Restriction tags (insignia and trademark protection in particular) could be encoded as structured data. While I think use restriction status (P7261) is probably the right property to stick such statements under, I'm struggling to model below that. What I came up with so far is something like use restriction status (P7261) partly restricted access (Q66739849) / laws applied (P3014) Act on Protection for Coats of Arms and Certain Other Official Signs (Q10553121) for something like {{Insignia-Sweden}}. /André Costa (WMSE) (talk) 11:52, 13 August 2020 (UTC)

I've also seen suggestions of using e.g. use restriction status (P7261) partly restricted access (Q66739849) / has cause (P828) trademark (Q167270). What both of these approaches are missing is that they don't really capture what the restriction is (which the templates today do). /André Costa (WMSE) (talk) 12:01, 13 August 2020 (UTC)
I've gone ahead with use restriction status (P7261) partly restricted use (Q99867894) / has cause (P828) Act on Protection for Coats of Arms and Certain Other Official Signs (Q10553121) for now. That makes it clear that there are some use restrictions. Exactly what these are is left in the Template for now. /André Costa (WMSE) (talk) 07:44, 7 September 2022 (UTC)

Pictures taken during events

Is there a preferred way to tag pictures with information about the event during which they were taken, but where the event itself is not depicted? For example, I'm tagging pictures that I've taken during the Gothenburg book fair (2016), of different prominent people attending, and I was thinking that it could be relevant to add information about them being taken during the fair, but I don't know how to do that correctly. /Hobbsansak (talk) 17:54, 20 August 2020 (UTC)

@Hobbsansak: I had raised this at Commons_talk:Structured_data/Modeling/Location#Events? as well − looks like the current practice is to use significant event (P793) for that. Jean-Fred (talk) 09:35, 28 August 2020 (UTC)
@Jean-Fred: Thanks for your reply. In that case I guess I'll follow current practise for now and just update if a new convention is decided upon later. Hobbsansak (talk) 11:25, 7 September 2020 (UTC)

Taken with some camera

We have a lot of categories like Category:Taken with Canon EOS 500D to indicate that some type of camera (in this case Canon EOS 500D (Q64143)) was used to produce a photo. I was looking around the current properties for a suitable one. I found properties like made from material (P186), fabrication method (P2079) and item operated (P121), but not really what I was looking for. I'm looking for something generic like "equipment used" or "tool used". Any suggestions? Multichill (talk) 19:09, 28 August 2020 (UTC)

Did some more digging around and found captured with (P4082) and it's already in use. Multichill (talk) 19:43, 28 August 2020 (UTC)
Good finding − was also looking for it and could not find it :) Jean-Fred (talk) 20:04, 28 August 2020 (UTC)
Interesting, although it's for cameras only, not other equipment such as scanners, microscopes or lenses. --ghouston (talk) 02:01, 29 August 2020 (UTC)
Good idea. Should be suitable for cameras and lenses. (And scanners.) IMO a lens is mandatory: I can't take a photograph with a camera alone. And it should be comparable with cameras with a built-in lens. --XRay talk 06:26, 29 August 2020 (UTC)
I would maybe generalize it to "equipment used" over time (in line with the way we use made from material (P186)) to also include lenses etc. I'll write a little bot to import this data, at least for the cameras. Multichill (talk) 09:52, 29 August 2020 (UTC)
On second thought I added it to the regular bot example edit. Multichill (talk) 15:12, 29 August 2020 (UTC)
Multichill, could you please add a short summary of your mappings to, for example, Commons:Structured data/Modeling/Camera or maybe Commons:Structured data/Modeling/Equipment? Otherwise this discussion will be forgotten at some point. --Schlurcher (talk) 19:33, 29 August 2020 (UTC)
I also have User:BotAdventures, that populates the categories. Switching to the property would take a bit of thought, because most device models won't have items in Wikidata, and it would end up editing about half the files uploaded to Commons. The technical question is where you store the mapping from Exif identifiers to Wikidata QIDs: the bot generally finds a few new device models every day, and I create categories for them. --ghouston (talk) 23:06, 29 August 2020 (UTC)
@Ghouston: On Wikidata of course. Multichill (talk) 23:28, 29 August 2020 (UTC)
The Exif name and model fields? It's possible if the constraint was removed to allow multiple values. My own table contains things like
samsung,SM-J700F,Samsung Galaxy J7 (2015)
samsung,SM-J700H,Samsung Galaxy J7 (2015)
samsung,SM-J700K,Samsung Galaxy J7 (2015)
samsung,SM-J700M,Samsung Galaxy J7 (2015)
samsung,SM-J700P,Samsung Galaxy J7 (2015)
samsung,SM-J700T,Samsung Galaxy J7 (2015)
samsung,SM-J700T1,Samsung Galaxy J7 (2015)
Samsung,Galaxy J7,Samsung Galaxy J7 (2015)
,Samsung J7,Samsung Galaxy J7 (2015)

(the last two actually look questionable). I'm also a bit worried that it would make a big mess if the values were vandalised. There are also silly things like

Canon,canon EOS 1100D,Canon EOS 1100D
Canon,Canon  EOS 1100D,Canon EOS 1100D
,Canon EOS 1100D,Canon EOS 1100D
Canon,EOS 1100D,Canon EOS 1100D
Canon,1100D,Canon EOS 1100D
CANON,1100D,Canon EOS 1100D

which are presumably output by various pieces of broken software. --ghouston (talk) 00:25, 30 August 2020 (UTC)
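
Part of the variation in the Canon rows above could be absorbed before any lookup. The following Python sketch is an assumption about how one might do that, not a description of what User:BotAdventures actually does; `normalise_exif_key` and the key format are hypothetical. It folds case, collapses repeated whitespace and drops a maker name repeated at the start of the model field.

```python
def normalise_exif_key(make: str, model: str) -> str:
    """Collapse case, whitespace and a repeated maker prefix into one lookup key."""
    make_n = " ".join(make.lower().split())
    model_n = " ".join(model.lower().split())
    # Drop the maker name when the model string repeats it, e.g. "Canon EOS 1100D".
    if make_n and model_n.startswith(make_n + " "):
        model_n = model_n[len(make_n) + 1:]
    return f"{make_n}|{model_n}"


# The six Canon rows from the table above: (Exif make, Exif model, device name).
rows = [
    ("Canon", "canon EOS 1100D", "Canon EOS 1100D"),
    ("Canon", "Canon  EOS 1100D", "Canon EOS 1100D"),
    ("", "Canon EOS 1100D", "Canon EOS 1100D"),
    ("Canon", "EOS 1100D", "Canon EOS 1100D"),
    ("Canon", "1100D", "Canon EOS 1100D"),
    ("CANON", "1100D", "Canon EOS 1100D"),
]

# Mapping from normalised key to device name; six rows collapse to three keys.
mapping = {normalise_exif_key(make, model): name for make, model, name in rows}
```

The rows with an empty make field or a shortened model name still need their own entries, so normalisation reduces the table but does not remove the need for a stored mapping.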

Request: Can we please do this in a way that is not limited to documenting the camera used? "Equipment used" seems to make sense: lenses and scanners have been mentioned already as relevant metadata, here are some more:

Thanks, --El Grafo (talk) 10:40, 8 September 2020 (UTC)

  • Mobile phones aren't cameras either. These days they almost always contain one or more cameras though. --ghouston (talk) 01:04, 9 September 2020 (UTC)
  • I suggested expanding the scope at d:Property_talk:P4082. Given how few people were involved in the property proposal, I suppose it won't be a problem. --ghouston (talk) 01:46, 9 September 2020 (UTC)
Based on my reading of the discussion, I think there are several proposed scope expansions here:
  1. to more types of “image capture-making devices” (phone, scanner, microscope)
  2. to more media types (video, audio) − so more like “media captured with”
  3. to “sub-components” of the equipment (e.g. lenses)
  4. to the 'producer' of the original expression rather than to the capture − to reuse El Grafo’s example, the media file itself File:Epro theremin middle bach.ogg was not 'captured' with a theremin (Q207691) − the underlying 'object' was.
I think #1, #2 and #3 are perfectly fine scope expansions.
(For #3 would it then make sense to have a qualifier system, using things like applies to part (P518) or object has role (P3831)? Or can we just discriminate on the type of object?)
For #4 I’m not so sure. I think it belongs to a separate property − maybe depicts (P180) with some qualifier.
Jean-Fred (talk) 09:29, 9 September 2020 (UTC)
For the record, this has been implemented by now for smartphones and the like (but not for instruments etc., which indeed makes sense). See also d:Property talk:P4082. El Grafo (talk) 12:46, 31 August 2022 (UTC)

Most common property statements

As far as I can see, the Structured data tab allows you to add any property that you like. As such, I am trying to understand what is actually used and whether some of the less frequently used qualifiers can or should be merged together. From the Commons Query Service, I found [1], which gives the most common property + qualifier combinations, other than on "depicts" (P180) statements, together with an example. That's a great starting point, but still too granular for what I am looking at. Can someone teach me how to get the most common property statements, ideally also with an example? --Schlurcher (talk) 11:54, 9 September 2020 (UTC)
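
Outside the query service, the same tally could be made offline over entity JSON. A minimal sketch, assuming each MediaInfo entity is a dict with an `id` and a `statements` mapping keyed by property ID (which is how the entity JSON appears to be laid out); the `SAMPLE` entities and the function name `count_properties` are made up for illustration:

```python
from collections import Counter


def count_properties(entities):
    """Count statements per property and remember one example file for each."""
    counts = Counter()
    examples = {}
    for entity in entities:
        # An entity without statements may carry an empty list instead of a
        # dict, so fall back to an empty dict in that case.
        statements_by_prop = entity.get("statements") or {}
        for prop, statements in statements_by_prop.items():
            counts[prop] += len(statements)
            examples.setdefault(prop, entity["id"])
    return counts, examples


# Hypothetical sample data; real entities would come from a dump or the API.
SAMPLE = [
    {"id": "M1", "statements": {"P180": [{}, {}], "P170": [{}]}},
    {"id": "M2", "statements": {"P180": [{}], "P6216": [{}]}},
    {"id": "M3", "statements": []},
]

counts, examples = count_properties(SAMPLE)
for prop, n in counts.most_common():
    print(prop, n, examples[prop])
```

This counts statements rather than files, so a file with two depicts statements contributes two to P180; counting each property once per file would be an equally reasonable choice.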

Three different image-related text tasks in structured data

(moving this here from d:Wikidata:Project chat) I was looking at an image that had a visible caption in it and wanted to add that text as structured data. So I tried adding a media legend (P2096) statement, but got a warning that the property should only be used as a qualifier. Then I tried adding depicts: text (Q234460) instead with a media legend qualifier containing the text in the image, but got a warning that P2096 isn't included in the allowed qualifiers constraint (Q21510851) of depicts (P180) even though media legend says that the "subject item of this property" is "depiction".

But thinking about it, for most images on Commons the caption would have been cropped out and would be purely metadata, so in those instances attaching it through depicts:text would not seem to be appropriate.

Then a third case I'm considering is text that's in an image but is not a caption, rather is part of the image. For images of certain objects there's inscription (P1684), for text inscribed on an object being depicted; there's identified in image by (P7380) for multiple items depicted in an image by different callout labels or similar text indicators; but I'm not seeing a property that would generally correspond to text within the image itself, or text on a road sign or printed label or something like that.

So how would I go about doing each of these three things? Or should I just not do one or more of them? Thanks, --truthious andersnatch 23:22, 4 October 2020 (UTC)


Hi All,

Could you please share your thoughts on this property proposal? I think having a property like this could make searching and categorising images somewhat easier. --Adam Harangozó (talk) 22:34, 20 October 2020 (UTC)

How to model interior and exterior views

Is there a proposal how to model interior or exterior views of an object?

Many categories of buildings and other things are divided into interior and exterior views, and it would be helpful to have this differentiation in structured data as well. There are properties and items such as image of interior (P5775), interior (Q2998430) and exterior (Q1385033), but there should be a consensus (and documentation) on how to use them. -- H005 12:01, 25 November 2020 (UTC)

@H005: One possibility could be to use Wikidata item of this property (P1629) -> interior (Q2998430) pairs, as is done here: category for the interior of the item (P7561) --Zache (talk) 05:37, 23 January 2021 (UTC)

Audio recordings

I'm uploading audio files that contain pronunciations of words. What should I set P3831 (author has role) to in this case? Maybe narrator (P2438)? MartinMichlmayr (talk)

What exactly is "depicts"?

Am I correct that this edit by User:Brazal.dang is an incorrect use of "depicts"? - Jmabel ! talk 19:57, 11 March 2021 (UTC)

Everything seems to be depicts. I'd like to say things are e.g. videos. Should I be using depicts, or something else? Secretlondon (talk) 18:05, 26 July 2021 (UTC)
You might be interested in reading Commons:Structured data/Modeling/Depiction and Commons:Depicts. You can also try starting a topic on the Discussion page. Rdrg109 (talk) 06:21, 15 December 2021 (UTC)

Related images

We like to link images together, for example when we crop pictures or when a similar picture exists (this topic reminded me of that). I propose we start using related image (P6802) for linking these. We can use qualifiers to indicate the type of relation. We still have to figure out what qualifiers we actually need. What do you think? Multichill (talk) 19:39, 4 June 2021 (UTC)

Sounds reasonable to me. Jean-Fred (talk) 11:35, 7 June 2021 (UTC)
Ok. I started Commons:Structured data/Modeling/Related file. Multichill (talk) 19:56, 29 June 2021 (UTC)

New subpages

Two new Modeling subpages have been added:

- PKM (talk) 22:18, 5 June 2021 (UTC)

Modeling for DPLA's uploads

Our DPLA project has uploaded over 2 million files to Commons, so far without including any SDC. One of the main goals of our recently funded WMF project grant is to add SDC statements to these uploads. As part of the process, I have created Commons:Digital Public Library of America/Modeling as a place to discuss and document our SDC data model. My goal here is to ensure that however we model the data is in accordance with community wishes, especially since the plan is to eventually maintain this data model across millions of files that will be regularly synced from the data source. I am also guessing that having this modeling document will be necessary for securing the bot approval to do this work, so that the bot only makes expected edits. I have started off the page with a few of the simpler types of statements, and we would appreciate your input on the specific types of statements already listed (even if it is to agree, so we have consensus), proposals for additional statements not yet covered, etc. Dominic (talk) 15:43, 21 June 2021 (UTC)

Book illustrations via upload tool

Cross-posting from COM:VPT: I have a tool that I'm working on that is intended to make it easier to upload illustrations of various kinds from books at Wikisource while providing better metadata at the same time.

Some basic metadata is added by the tool as file description text + categories, but it seems like the tool should also be adding some SDC fields if it can. What data should the various kinds of images it supports have?

  • Illustration
  • Photograph
  • Headpiece/endpiece
  • Fleurons
  • Illuminated initials
  • Publisher logos
  • Headings and titles in decorative fonts
  • Periodical mastheads

Looking forward to being able to more usefully upload structured data, but I honestly have no idea what I'm supposed to be doing here! Inductiveload (talk) 21:36, 29 June 2021 (UTC)

Listed buildings and Wikidata properties

I've been going through my old uploads adding structured data. Most things seem to be added by bot, it's only really depicts that is added by humans. A lot of my photographs are of listed buildings and have the template {{Listed building England |1=1392740}}. Can this number be added as structured data too?

Wikidata has Cadw Building ID (P1459), National Heritage List for England number (P1216) and Historic Scotland ID (P709). Can we use these properties on Commons? Secretlondon (talk) 18:20, 26 July 2021 (UTC)

@Secretlondon: these should probably not be used directly; instead, depicts (P180) should be used to link to what is shown in the image. My robot has been doing that for a while and I fired it up again. The current list of supported types is at User:ErfgoedBot/Depicts monuments.js. I recall something problematic with Scotland, probably that multiple templates here match one property; see for example Blackness Castle (Q1524366). Multichill (talk) 18:45, 27 July 2021 (UTC)
Does this mean that every listed building should have a Wikidata item, and that item should have the list number? Secretlondon (talk) 19:56, 27 July 2021 (UTC)
Yup. We've been doing Wiki Loves Monuments for over 10 years now and we have gathered data for a lot of monuments.
Most of these are on Wikidata these days. For the different kinds of listed buildings in the UK, this should be about complete. Multichill (talk) 20:40, 27 July 2021 (UTC)
Thanks. I need to find some :) Secretlondon (talk) 21:45, 27 July 2021 (UTC)
Nearly 400,000 and the Wikidata id for 1392740. Multichill (talk) 16:37, 28 July 2021 (UTC)
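
The approach described in this thread (read the list number from the template, then point depicts (P180) at the matching Wikidata item) could be sketched roughly as follows. This is not ErfgoedBot's actual code: the regular expression, the function names and the `NUMBER_TO_QID` mapping are assumptions, and the QID in it is a placeholder. A real bot would build the mapping from items that carry National Heritage List for England number (P1216).

```python
import re

# Matches {{Listed building England |1=1392740}} and the unnamed-parameter
# form {{Listed building England|1392740}}.
TEMPLATE_RE = re.compile(
    r"\{\{\s*Listed building England\s*\|\s*(?:1\s*=\s*)?(\d+)\s*(?:\||\}\})",
    re.IGNORECASE,
)

# Hypothetical mapping from list entry number to Wikidata item.
# "Q0" is a placeholder, not the real item for this building.
NUMBER_TO_QID = {"1392740": "Q0"}


def list_entry_numbers(wikitext):
    """Return every list entry number found in the page's wikitext."""
    return TEMPLATE_RE.findall(wikitext)


def depicts_targets(wikitext, number_to_qid):
    """Return the items that depicts (P180) statements would point to."""
    return [
        number_to_qid[number]
        for number in list_entry_numbers(wikitext)
        if number in number_to_qid
    ]


page = "A photo. {{Listed building England |1=1392740}} More text."
print(list_entry_numbers(page))
print(depicts_targets(page, NUMBER_TO_QID))
```

Numbers with no matching item are skipped rather than guessed, which is where missing Wikidata items would show up.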

SDC for postcards

Hello, I need some advice. How should we add SDC to postcards? For example, we want to be able to query later for all postcards of a city or region, or all postcards of animals, or only elephants. I am not sure how to start, so I am asking here. We have at the moment:

I will create two new Wikidata items for the "address side" and "picture side" of a postcard, so we can also set these as "depicts". Or is this all wrong? Should we add the metadata in another way? Please advise. -- sk (talk) 10:34, 4 November 2021 (UTC)

I also found Images with "digital representation" of picture postcard. So what is the best way to set the SDC for picture postcards? Should we use P31, P180 or P6243? -- sk (talk) 17:42, 4 November 2021 (UTC)
P31 seems most natural, as it is used for this kind of meta information. I would use P180 and P6243 for the information on the postcard. --Schlurcher (talk) 11:38, 7 November 2021 (UTC)
Using digital representation of (P6243) like that is incorrect. The target should always be an instance, not a class. Multichill (talk) 16:41, 28 December 2021 (UTC)

SDC for recordings of talks in conferences

I would like to add structured data to the talks of WikidataCon 2017 and EmacsConf 2021. I'm now wondering if there are some guidelines that are focused on this.

These are some of the statements that I think would be a good idea to have

However, I don't know how to use instance of (P31) and depicts (P180) in this scenario. Any help is appreciated.

Rdrg109 (talk) 22:37, 15 December 2021 (UTC)

Apparently, this topic hasn't been addressed before, so I have created a new page for modeling conference talks. You can find it at this URL. Rdrg109 (talk) 22:38, 16 December 2021 (UTC)


Now that Commons has references in SDC, I think we should discuss modeling for references, and probably make a new subpage with examples. The most common situation is likely one where a file and its metadata come from an institution's catalog, and that catalog record is the reference for the SDC statements. In this case, I imagine we would use reference URL (P854) in combination with retrieved (P813), and maybe publisher (P123) for the institution's name(?). Here is an example of that format. How does this all sound? Dominic (talk) 20:11, 12 January 2022 (UTC)
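
For illustration, a reference combining these three properties might be assembled as below. This is a sketch under assumptions: the snak layout follows the Wikibase JSON format as I understand it, the helper names are hypothetical, and the URL and publisher QID are placeholders rather than real catalog data.

```python
import json

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def string_snak(prop, value):
    """Snak for string-like values such as reference URL (P854)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": "string"},
    }


def day_snak(prop, iso_date):
    """Snak for a day-precision date such as retrieved (P813)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {
                "time": f"+{iso_date}T00:00:00Z",
                "timezone": 0,
                "before": 0,
                "after": 0,
                "precision": 11,  # 11 = day
                "calendarmodel": GREGORIAN,
            },
            "type": "time",
        },
    }


def item_snak(prop, qid):
    """Snak pointing at an item, such as publisher (P123)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
            "type": "wikibase-entityid",
        },
    }


def catalog_reference(url, retrieved, publisher_qid=None):
    """Build one reference: P854 + P813, plus P123 when a publisher is given."""
    snaks = {"P854": [string_snak("P854", url)], "P813": [day_snak("P813", retrieved)]}
    order = ["P854", "P813"]
    if publisher_qid is not None:
        snaks["P123"] = [item_snak("P123", publisher_qid)]
        order.append("P123")
    return {"snaks": snaks, "snaks-order": order}


# Placeholder URL and QID, for illustration only.
ref = catalog_reference("https://example.org/catalog/record/123", "2022-01-12", "Q0")
print(json.dumps(ref, indent=2))
```

Keeping publisher (P123) optional mirrors the open question in the post about whether the institution's name belongs in the reference at all.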

Importing all imageinfo data into structured data

What do others think about importing all imageinfo data into structured data? I would be especially interested in MIME type and dimensions, but probably others like metadata and commonmetadata should be included as well? Mitar (talk) 14:58, 2 February 2022 (UTC)

Thanks for bringing this up. This was also mentioned to some degree in Commons:Structured data/Properties table. I've added some, like the EXIF-like data, to my bot's routine. I'm generally fine with adding more if there is interest. Could I ask you to make a proposal at Commons:Structured data/Modeling/Meta on how you think this should be modelled? For MIME type, for example, do we want this to be mapped to a string or to a Wikidata item representing the string? --Schlurcher (talk) 06:58, 3 February 2022 (UTC)
Looking more into this, the regex specification for MIME type (P1163) suggests that it has to be a string and not a Wikidata item. So, I've added your proposal to Commons:Structured data/Modeling/Meta. Unfortunately, the automatic URL formatting does not work on Commons, but that might also come eventually. --Schlurcher (talk) 15:55, 5 February 2022 (UTC)
Awesome! Thanks. Yes, I think having a string is fine. Mitar (talk) 01:01, 28 February 2022 (UTC)
I’ve just seen SchlurcherBot add MIME type (P1163) to one of my images (diff). To me, this seems completely pointless – the MIME type is already part of the RDF generated by WikibaseMediaInfo (look for schema:encodingFormat in Special:EntityData/M80857574.rdf); duplicating it in the statements just means that it can be vandalized or go out of sync. Can we please not do this? Lucas Werkmeister (talk) 22:51, 24 February 2022 (UTC)
Very interesting. Some documentation about this feature seems to exist here. But it seems this is not really available through data available in dumps? Nor does it use Wikidata properties (but it seems ones). So not sure how useful this is or what to do about it? Mitar (talk) 01:15, 28 February 2022 (UTC)
@Mitar I assume it’s available in the RDF dumps, that’s where the query service would get it from. If it’s not in the JSON dumps yet, it should be possible to add it there if someone who wants it opens a Phabricator task. Lucas Werkmeister (talk) 19:16, 7 March 2022 (UTC)
I checked the RDF dump for M80857574 and it does contain and others. So it would be cool to expose this through JSON as well, but are you sure this is doable? I mean, I have not seen anything like that on Wikidata? All claims there are those stored in Wikibase itself, not through services. So I worry that we can open a Phabricator task but it will be forever before this is implemented? Mitar (talk) 15:10, 11 March 2022 (UTC)
I've made phab:T303629. - Nikki (talk) 17:24, 11 March 2022 (UTC)
@Mitar You know that the query service can access a lot more than just Wikidata properties, right? Just check this query that is in the examples. Ainali (talk) 19:29, 7 March 2022 (UTC)
I know. But I want to access data in bulk. So querying the service manually would not work out for me. Mitar (talk) 15:11, 11 March 2022 (UTC)
Since I wasn't entirely clear in my last reply, I'm also opposed to the bot doing this. The statements are redundant and there are multiple ways to query for that data if you want it. - Nikki (talk) 21:44, 29 August 2022 (UTC)
I added a few more to the list. Mitar (talk) 01:22, 28 February 2022 (UTC)
Just import it. If we have to wait for features it will take years. Multichill (talk) 09:46, 13 March 2022 (UTC)

Stop importing imageinfo data into structured data

As I wrote on Schlurcher's talk page (User talk:Schlurcher/Archive 3#Bot_adding_redundant_statements), there is no consensus that I'm aware of for it (see the above discussion) so SchlurcherBot should stop doing it. - Nikki (talk) 21:39, 29 August 2022 (UTC)

@Schlurcher: - Can you please respond to this issue, as the reply you had before was to point to Commons:Structured data/Modeling/Meta, which has no information at all about the bot you are operating, and under what logic. Thanks. - Fuzheado (talk) 14:24, 19 November 2022 (UTC)
Apologies for the late response. The bot is transferring data to Commons:Structured Data as per the logic described in Commons:Structured data/Modeling and sub-pages, which is what the bot is approved for; see Commons:Bots/Requests/SchlurcherBot8. So, I have referred to the corresponding page (Commons:Structured data/Modeling/Meta) regarding the logic. --Schlurcher (talk) 17:38, 18 March 2023 (UTC)

How to mark different types of photographs (e.g. aerial photographs)?

We have quite a bunch of aerial photographs, and I think it would be good if people could specifically filter for them. Here it was suggested to use genre (P136) = aerial photography (Q191839) and that's what I've done for my own pictures now. But while I was disentangling aerial photograph (Q113670888) from aerial shot in cinematography (Q4688031), I thought maybe something like has quality (P1552) = aerial photograph (Q113670888) would be reasonable too. Maybe even instance of (P31) = aerial photograph (Q113670888)? @Jmabel & Multichill: FYI. El Grafo (talk) 09:29, 1 September 2022 (UTC)

Modelling maps in SDC


There is a discussion open for Commons:Structured data/Modeling/Maps. Months ago I made an SDC data upload for a set of >9K maps from IGN. In doing so I had the support of many people in identifying the best way to model data for maps in SDC. The results are proposed as a good practice. Maybe you'll want to check it. Hope it helps. --Ismael Olea (talk) 18:26, 21 December 2022 (UTC)

@Olea Thanks for that, looks good to me! Info on the spatial reference is missing, though. The minimum should be spatial reference system (P3037) for modern maps, and EPSG CRS (P1338) is always a good idea. El Grafo (talk) 08:56, 28 September 2023 (UTC)

Mobile app uploads

I would like to be able to find files contributed using the Android mobile app as part of a structured data query (and intersect with locations). Is there a property to be used for this purpose? Should we use properties at all for this purpose? If the data is not populated (I believe it is not), I think I can do that based on an existing category, but I don't know which property to use.

Note this is about the device used in uploading, not creating. The device used to upload the file might or might not be the device used to create the file. I think this can be considered as part of data provenance. whym (talk) 13:01, 30 December 2022 (UTC)

As a use case, a map showing where the Android app has a higher proportion of uploads than other upload tools would be easier to create with a property showing the upload device (rather than a MediaWiki category). whym (talk) 10:29, 11 April 2023 (UTC)

Property for picture composition

I'm looking for a good property for picture compositions. IMO it would be good to add this kind of information to the structured data of a photograph, with an additional educational purpose. Photographs for this property can be found in Category:Composition (Visual arts) (composition (Q462437)) or Category:Picture composition. Any idea? --XRay 💬 07:27, 15 April 2023 (UTC)

Could be useful, but we could also use something like instance of (P31) panorama (Q41363)/worm's-eye view (Q16966122) GPSLeo (talk) 08:21, 15 April 2023 (UTC)
Hmm. instance of (P31) is IMO not a good solution for something like golden ratio (Q41690), Rule of thirds (Q327905) or vignetting (Q1136665). --XRay 💬 09:03, 15 April 2023 (UTC)
form of creative work (P7937) may be better - better, but not good. --XRay 💬 10:54, 15 April 2023 (UTC)
Nice, found 31 pages with haswbstatement:P7937. (Some of them have awful depicts statements.) --XRay 💬 10:59, 15 April 2023 (UTC)

Q-item for photographer

Is there any way we should model that the Flickr photographer credited for File:Chad Griffin December 2010 Cropped.jpg and File:Juvenile Gull - spread wings 2.jpg among others is Matt Baume (Q108841379)? - Jmabel ! talk 20:00, 3 July 2023 (UTC)

Maybe this one? Strakhov (talk) 17:41, 8 July 2023 (UTC)