Commons talk:Structured data

From Wikimedia Commons, the free media repository
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.


Why are we using properties differently in structured data than in Wikidata?

Can anybody explain to me why the properties depicts (P180) and main subject (P921) are treated differently in the description of the image on Commons and in its Wikidata entry Maria Salome und Maria Cleophae (Q109708676) itself? The structured data entry was initiated by a bot. --Wuselig (talk) 16:36, 3 January 2022 (UTC)

The statements here describe the file File:Alte Meister in der Sammlung Würth-Kat.-57b.jpg. In the file Maria Salome und Maria Cleophae (Q109708676) is depicted and the main subject of the file is Maria Salome und Maria Cleophae (Q109708676).
The statements on Wikidata describe the artwork Maria Salome und Maria Cleophae (Q109708676), so the depicts (P180) & main subject (P921) are about what we see in the painting. Multichill (talk) 22:12, 3 January 2022 (UTC)
Thanks! I have fallen into that trap before. But games like Suggested tags always lead me back into it, because there we would also add as
depicts (P180)
ornament (Q335261), headgear (Q14952), game (Q11410), overcoat (Q337481), gold (Q208045), flower (Q506)
besides
mother (Q7560), child (Q7569), aunt (Q76507), son (Q177232), sister (Q595094), brother (Q10861465), cousin (Q23009870)
and of course
Salome (Q233067), Mary of Clopas (Q235377), St. James the Elder, Apostle (Q43999), John the Apostle (Q44015), James the Less (Q3245490), Simon the Zealot (Q12871), Jude the Apostle (Q43945), James, son of Alphaeus (Q44047),
because we see all this, and much more on this image.
Or am I again mixing things up?
I fell into the trap because of the singular addition of the reference to the artwork as the depicts (P180) of the image, while omitting all the others. Wuselig (talk) 11:19, 4 January 2022 (UTC)
Hilft das? (Does this help?) Oursana (talk) 11:32, 4 January 2022 (UTC)
Answering in English, because the page is in English:
Yes and no, because it doesn't really give a sense of where to stop. If we look at Commons categories like Category:Nude or partially nude women facing right and looking left in art, some users here like to go down to the last detail. And I think Structured Commons was created to bring images that have been buried deep down in such category holes back to the surface and make them findable again. With tags that is easier than having to follow all the branches of a category tree to an individual leaf. But how do we convey this to common users, if even I, as an old horse, repeatedly fall into the trap? Wuselig (talk) 11:54, 4 January 2022 (UTC)
Commons:Structured_data/Modeling/Depiction Oursana (talk) 18:11, 4 January 2022 (UTC)
After thinking about this for a while, I believe that, unlike creator or inception, “depicts” is transitive (at least for visual works). Thus photo of statue -(depicts)-> statue of Ganesha -(depicts)-> deity Ganesha also entails that photo -(depicts)-> Ganesha. There is some cutoff point though, where the chained depiction needs to be sufficiently prominent, so that a photo of a museum interior depicts the room and the contained artworks but not the figures within the artworks. Pelagic (talk) 12:53, 4 March 2022 (UTC)

Automatic categories

File:Aerial Heilandskirche Sacrow.jpg

Though I deleted the Wikidata link, this file still gets an automated category, as do all files that I linked to Wikidata. So how can I delete the red-linked Category:Artworks digital representation of church building, which does not fit here and should not be applied to all (even formerly) Wikidata-linked photos? --Oursana (talk) 11:59, 4 January 2022 (UTC)

Follow up question: Why do we get red-linked categories at all? Shouldn't the programmer of such a feature create the categories that such images are to be moved into beforehand?
Here too, for a different image and category: File:Alte Meister in der Sammlung Würth-Kat.-57b.jpg which links to Category:Artworks digital representation of panel painting --Wuselig (talk) 12:54, 4 January 2022 (UTC)

How to model chromatic aberration (Q1087688) on a photo?

Андреев Макар макросъемка капрона.jpg
Chromatic aberration lens diagram.svg

Both File:Андреев Макар макросъемка капрона.jpg and File:Chromatic aberration lens diagram.svg have depicts (P180) chromatic aberration (Q1087688) statements. It's definitely not the same kind of "depicting", and the statement on File:Андреев Макар макросъемка капрона.jpg should probably be removed. But how should we then express the presence of chromatic aberration (Q1087688) in it? Do we have a property for that? Thanks in advance, --Marsupium (talk) 07:50, 14 January 2022 (UTC)

@Marsupium: The has quality (P1552) property feels relevant to me here, though I’m not sure where I would put it – as a main statement, or as a qualifier on another statement… Lucas Werkmeister (talk) 00:03, 23 March 2022 (UTC)

Commons entities dump - how to know for which file it is?

I have been looking at the Commons entities mediainfo JSON dump and while it is great that it exists, I wonder how to know which entity corresponds to which file. In the dump, the JSON lacks a title field; only an id field is present. But that seems to be independent of any other identifier. Should we have a statement which provides a link to the file (using the commonsMedia data type)? Or should we use a top-level sitelinks property to make a link? Mitar (talk) 00:28, 4 February 2022 (UTC)

Hi, maybe you can explain your use case to help us understand what you are trying to achieve. For example, you can easily generate a link to the file with the entity concept URI: https://commons.wikimedia.org/entity/M41304415 --Schlurcher (talk) 08:26, 4 February 2022 (UTC)
Note that the numerical part of the entity ID is always identical to the MediaWiki page_id of the page it is associated with. --MisterSynergy (talk) 09:56, 4 February 2022 (UTC)
@Mitar: so the title, like File:Entrance by the Harrow Way - geograph.org.uk - 3185722.jpg on Special:EntityData/M115001143.json, is missing in the JSON dump? That sounds like a bug to me. Multichill (talk) 11:19, 6 February 2022 (UTC)
Yes, entities in the dump do not have a title field. So my use case is that I start with the dump and I would like to know which file each entity in the dump belongs to. Without the title field (which is returned from the special page, but is not in the dump) this seems ... hard? Or harder than necessary.
Should I open a bug?
I will look at the page table dump to see if I can find a mapping that way, but it is definitely more work than it should be. Mitar (talk) 18:48, 6 February 2022 (UTC)
@Mitar: yes, please do so in Phabricator. I found the dump on Toolforge and had a look:
tools.multichill@tools-sgebastion-08:/mnt/nfs/dumps-labstore1006.wikimedia.org/commonswiki/entities$ zcat latest-mediainfo.json.gz | head -2
This returns Special:EntityData/M76.json minus some of the fields like title. Multichill (talk) 20:06, 6 February 2022 (UTC)
I made phab:T301104. Mitar (talk) 10:04, 7 February 2022 (UTC)
There is no progress on this on Phabricator. I think this is a really big limitation of the JSON dump. How could this be pushed forward? Mitar (talk) 11:13, 27 April 2022 (UTC)
The Wikimedia Hackathon is coming in May. This could be a good place for pushing this forward, as it is likely that people with the relevant knowledge will be participating, and this is something tangible and small enough to fix in the hackathon time. --Zache (talk) 17:39, 27 April 2022 (UTC)
Great idea. I tagged it to be added. Not sure if I should also make a session for it or something. Or how to get other people involved to guide me a bit (e.g., where is the code relevant to this). Mitar (talk) 12:08, 28 April 2022 (UTC)
I have done it during the Hackathon and the improvement to the dump script has been merged. This now has to be deployed and used for dump generation. API output will not match dump output. Mitar (talk) 12:27, 27 May 2022 (UTC)
Thank you for pushing this forward. -- Zache (talk) 14:28, 27 May 2022 (UTC)
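For anyone working with a dump that still lacks titles, the relationship noted above (the numeric part of the entity ID equals the MediaWiki page_id) can serve as a join key. The following is a minimal Python sketch, not the dump script itself. It assumes the dump is laid out like the Wikidata JSON dumps (a JSON array with one entity per line and trailing commas); the function names are hypothetical, and the page_id-to-title mapping has to be supplied from elsewhere, for example from the page table dump.

```python
import json
import re

# Concept URI prefix as quoted in the thread above.
CONCEPT_URI_PREFIX = "https://commons.wikimedia.org/entity/"

_MEDIAINFO_ID = re.compile(r"M([0-9]+)")


def page_id_from_entity_id(entity_id):
    """Return the page_id encoded in a MediaInfo ID such as 'M76'."""
    match = _MEDIAINFO_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a MediaInfo entity ID: {entity_id!r}")
    return int(match.group(1))


def concept_uri(entity_id):
    """Build the entity concept URI, which leads to the file page."""
    return CONCEPT_URI_PREFIX + entity_id


def iter_entities(lines):
    """Yield entity dicts from dump lines.

    Assumption: one entity per line, with '[' and ']' on their own
    lines and a trailing comma after every entity except the last.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def attach_titles(lines, titles_by_page_id):
    """Pair each entity ID with its title, or None if the title is unknown."""
    for entity in iter_entities(lines):
        page_id = page_id_from_entity_id(entity["id"])
        yield entity["id"], titles_by_page_id.get(page_id)
```

To run this against the real file, the lines could come from `gzip.open("latest-mediainfo.json.gz", "rt", encoding="utf-8")`, so that the dump is streamed rather than loaded into memory at once.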

Structured data not visible in structured data tab on a lot of files

Not sure if anyone already filed a ticket for it, but I filed phab:T301048 for this. Multichill (talk) 21:11, 5 February 2022 (UTC)

Pretty much every upload of mine where I wanted to amend the structured data in the last month had this issue. Thanks for filing.--Ymblanter (talk) 21:31, 7 February 2022 (UTC)
I have noticed this when adding structured data with AC/DC (Help:Gadget-ACDC). The structured data I added is no longer visible. I added this to the phab discussion. Caddyshack01 (talk) 15:20, 25 April 2022 (UTC)

Slide duplicates

Recently I've added a lot of images from my slide archives dating to the 1970s by photographing them with a digital camera in a copy stand, so that they have the nominal property of being created with a Canon 7D II, but they're really slide duplicates. What kind of property or qualifier would be an appropriate annotation to indicate that the original was not in fact generated with a digital camera 40 years before the camera existed? An example: File:F-4N Phantom 150475 of VF-201 NAS Atlanta 1978 GA1.jpg Acroterion (talk) 18:50, 20 February 2022 (UTC)

That's a really good-looking conversion, @Acroterion! Using a macro lens instead of a slide scanner hadn't occurred to me. What do you use for back-lighting? Back on-topic, the only solution I can think of is somewhat kludgy and I'll be interested to see others' suggestions. Could you set digital representation of (P6243) = unknown/somevalue, then add qualifiers to that (inception, author, captured with (P4082), instance of (P31), etc.)? Strictly speaking qualifiers apply to the whole statement, not the object of the statement, which is why I feel a little uneasy about that approach. Pelagic (talk) 07:42, 5 March 2022 (UTC)
That was why I asked: every way I thought of doing it was kludgy. For that matter, many of the same objections exist if a dedicated film scanner is used; it's just a little more obvious that it was a scan. Maybe some new properties should be set up to cover scans and duplications in some degree of variety?
I use a Canon EF100 2.8 L macro lens, with an extension tube to allow close focus, stopped down to f/10. I tried other cameras, including a Canon 5Ds, but I found to my surprise that the 7DII gave the best detail and could resolve the grain. The 5Ds was almost as good, but the extra resolution was not needed and just made the file size bigger. The 7DII is more of a sports-and-birds camera, so I wasn't expecting that. I used a 35mm slide/film holder on a Skier Pro Sunray LED light box. This setup is an order of magnitude faster than scanning, and the results were equivalent, without the hassle of dealing with Kodachrome-versus-Digital ICE conflicts. It does, however, show every speck of dust, which a scanner would mostly remove and which here has to be removed in post-processing. Acroterion (talk) 14:20, 5 March 2022 (UTC)

New Lua module to access a single statement or qualifier

Hi all! I’d like to present a new Lua module I’ve created to access the structured data of a file (or a Wikidata item), since none of the existing modules did what I was looking for: Module:Statement. It’s intended for cases where you expect a statement to only be present once, and it returns the value as plainly as possible (in particular, for item ID values it gives you the item ID itself, not a link or an unlinked label like Module:WikidataIB or Module:Wikidata2 would). It can be used directly from wikitext (in a template), or from another module; using it in a module is likely to be more efficient, but you may find template usage easier to work with.

It’s motivated by my recent work on {{Lingua Libre record}} together with Nikki, where we wanted to use the value of three properties, each of which is only expected to be present once: language of work or name (P407), audio transcription (P9533), and Lingua Libre ID (P10369). (We also needed the ISO 639-3 code (P220) of the language of work or name (P407), which is why the modules that returned the language as a labelled link were useless to us.) Module:Lingua Libre record doesn’t currently use Module:Statement (since I wrote the specific module before the general-purpose one), but if Module:Statement works out, the other one should probably use it sooner or later.

This module isn’t intended to satisfy all possible use cases for structured data; in particular, it’s not very well suited for properties where you’d expect multiple values, like first and foremost depicts (P180). (I am thinking of creating a second module that selects a single statement based on a single qualifier, but I should probably wait a bit with that, do one thing at a time.) That said, I hope it can still be useful for other templates; Dominic has already been experimenting with it, and I thought it’s ready for some wider usage, hence this post :) Lucas Werkmeister (talk) 20:43, 23 March 2022 (UTC)

Sequence order for a set of files

I am trying to determine the best way to add the order for a sequence of related uploads (i.e. pagination). Often, a single conceptual item is uploaded across multiple files. Consider a case where each page of one document is a separate JPG scan. They may come from an institution and all have the same identifier and other descriptive metadata. Here is an example of a case like that: https://w.wiki/4xGX

In the uploads, I have distinguished the files by using "(page X)" in the file name, incrementing the page number for each. But this is not reflected in SDC at this time; all of their structured data is the same. This is not the same concept as page(s) (P304), which would properly refer to the page (or total pages) of the depicted work, not necessarily of the sequence of uploads. Also, not all sequential uploads from a single work are in a format to be referred to as "pages" anyway. It is also not file page (P7668), which is the opposite of this, in a way: it gives the page number within a referenced file, rather than the number of a file within a given set of files. We discussed this on Telegram, but there were not many existing examples or satisfactory ideas. I can think of a way to do this (which is modeled on the way we do creators in a creator (P170) statement with 'somevalue' and an author name string (P2093) qualifier):

part of the series: somevalue (normal rank)
    series ordinal: 18
    number of parts of this work: 36
    DPLA ID: 53a76c6e0d5e25e79a30ffd71c471e75

I am using this approach because we are saying the file is part of a series. The series does not have a Wikidata item, so the statement uses somevalue, and the series is instead defined by the institutional item identifier in the qualifier. This allows us to use "series ordinal" for the page number, plus "number of parts of this work" for the total.

Is this a viable approach? Are there other ideas? Or should we just propose a new property for this that would allow us to apply the value to the file at the top level instead of in a qualifier, as it is somewhat complicated this way? Dominic (talk) 22:31, 25 March 2022 (UTC)
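For readers who want to see what the proposed statement would look like as data, here is a rough Python sketch of it in Wikibase-style JSON. The property IDs (P179 for part of the series, P1545 for series ordinal, P2635 for number of parts of this work, P760 for DPLA ID) are assumptions matched to the labels in the proposal and should be checked before use; the function name is hypothetical, and the dictionary layout follows the usual Wikibase statement serialization as far as I understand it.

```python
def build_series_statement(ordinal, total, dpla_id):
    """Build a 'part of the series: somevalue' statement with qualifiers.

    Assumed property IDs: P179, P1545, P2635, P760 (see the note above).
    """
    if not 1 <= ordinal <= total:
        raise ValueError("ordinal must be between 1 and total")

    def string_snak(prop, value):
        # String-valued qualifier (series ordinal is stored as a string).
        return {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        }

    def quantity_snak(prop, amount):
        # Unitless quantity; Wikibase writes the amount as a signed string.
        return {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"amount": f"+{amount}", "unit": "1"},
                "type": "quantity",
            },
        }

    return {
        "type": "statement",
        "rank": "normal",
        # 'somevalue' means the series exists but has no item to point to,
        # so the main snak carries no datavalue.
        "mainsnak": {"snaktype": "somevalue", "property": "P179"},
        "qualifiers": {
            "P1545": [string_snak("P1545", str(ordinal))],
            "P2635": [quantity_snak("P2635", total)],
            "P760": [string_snak("P760", dpla_id)],
        },
    }


statement = build_series_statement(18, 36, "53a76c6e0d5e25e79a30ffd71c471e75")
```

One consequence of this shape is that a consumer has to look inside the qualifiers to find the page number, which is the complication that a dedicated top-level property would avoid.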

Mark as prominent outside of depicts

It appears that we can "Mark as prominent" statements other than depicts. Was this always the case? Are there any instances where this would actually make sense to do? the wub "?!" 08:53, 18 April 2022 (UTC)

Confused about 'allowed qualifiers constraint'

Hello,

I am new to structured data and making a concerted effort to include it on uploads, but I am often confused by the 'allowed qualifiers constraint' warning that comes up. For example, this image depicts a light tower, but using depicts for that apparently results in a potential issue: https://commons.wikimedia.org/wiki/File:Penn-North_Light_Tower_by_Woodbrook_Ave.jpg

If I see warnings like this, is it an error with how I added the structured data, or could it be an error with how the item being depicted is defined? --Middle river exports (talk) 17:00, 29 April 2022 (UTC)

Something is broken with SDC

When I add new statements (example: File:Shark_antwerp_zoo.jpg), the structured data tab is empty. If I select an old revision of the file from the history, the statements are there. Only the latest version doesn't work. --Zache (talk) 08:17, 1 May 2022 (UTC)

I have also seen structured data disappear from files edited in April, as with this image. Premeditated (talk) 09:02, 1 May 2022 (UTC)
#Structured data not visible in structured data tab on a lot of files. Multichill (talk) 10:21, 1 May 2022 (UTC)

Structured data for files on Wikipedia

So this project is adding structured data for files on Wikimedia Commons. What about files on Wikipedia itself, those which cannot be moved to Wikimedia Commons because of copyright reasons? Are there any plans to populate metadata there as well? I would especially be interested in having license information for those files as structured data. Mitar (talk) 09:19, 3 May 2022 (UTC)

SDC property P1259 can neither be changed nor deleted

Please see Commons talk:Structured data/Reconciliation#Coordinates of the position cannot be changed. I just tested this again with File:Unterstmatt - Ochsenstall - panoramio.jpg. While I was able to delete (for test purposes) copyright license (P275), editing or deleting the coordinates was impossible, as there was either no "Publish changes" button at all or it wasn't active. The workaround I found was to change my language setting, either from German to English or vice versa. Immediately afterwards, the buttons worked as intended and I was able to publish the change. --Sitacuisses (talk) 19:51, 23 May 2022 (UTC)

Seemed to work for me, but see [1]. However, only after my test did I read the discussion on the reconciliation talk page. It seems that there is some kind of underlying problem which causes it to work only some of the time. -- Zache (talk) 07:36, 24 May 2022 (UTC)