Commons talk:Structured data/Get involved/Feedback requests/Depicts

From Wikimedia Commons, the free media repository

Welcome to the discussion page for the Depicts specification. Please leave your feedback below. Keegan (WMF) (talk) 18:03, 7 August 2018 (UTC)

Non photographic and non artworks documents

This first version of "depicts" would work for photographic/artwork and 2D graphical illustrations, which probably represent most Commons media files. But we'll need to take into account the following media files, for later developments:

Djiboun (talk) 22:09, 7 August 2018 (UTC)

@Keegan: thanks for the link. The request for interesting Commons files completes the subject. Djiboun (talk) 07:10, 8 August 2018 (UTC)

Represent each thing only once

On slides 9/10 the two persons on "Photo: Bearded man with girl wearing a blue dress" are represented with the statements <depicts (P180)> <man> and <depicts (P180)> <girl>; additionally there is a statement <depicts (P180)> <human> with qualifier <quantity (P1114)> <2>, which represents them a second time. In my view this statement shouldn't be added: each thing on an image should be represented only once as a value of a depicts (P180) statement. Here it can be inferred from the other statements that there are two humans on the image. I know there are cases of this on Wikidata, even on Mona Lisa (Q12418), and even enumerations of values like <Mary>, <virgin>, <mother>, <woman>, <girl>, <human> all referring to the same person. Unfortunately this practice prevents us from knowing whether six things or one thing are depicted.
In other cases not all those classes can be inferred from the item used as value. For those I propose to add the link to the item as a qualifier, e.g. an instance of (P31) or shown with features (P1354) qualifier rather than a new statement. If even more is needed an item can be created for the thing depicted, like person depicted in Mona Lisa (Q11879536). Thanks, --Marsupium (talk) 09:03, 8 August 2018 (UTC)

I agree, I've also noticed that duplication when seeing the slideshow. Apparently it would be possible to avoid the "2 humans" statement.-- Darwin Ahoy! 16:42, 8 August 2018 (UTC)
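The double-counting concern raised in this thread can be made concrete with a minimal Python sketch. The `DepictsStatement` class and the `naive_count` function are hypothetical names invented for illustration and are not part of any Wikibase API; the sketch assumes a consumer that counts depicted things by summing quantity (P1114) qualifiers, defaulting to 1 when the qualifier is absent.

```python
from dataclasses import dataclass, field


@dataclass
class DepictsStatement:
    """Hypothetical, simplified model of one depicts (P180) statement."""
    value: str
    qualifiers: dict = field(default_factory=dict)


def naive_count(statements):
    """Count depicted things, reading quantity (P1114) and defaulting to 1."""
    return sum(s.qualifiers.get("P1114", 1) for s in statements)


# Modelling as on the slide: man and girl, plus a separate "human x 2" statement.
with_duplicate = [
    DepictsStatement("man"),
    DepictsStatement("girl"),
    DepictsStatement("human", {"P1114": 2}),
]

# Each person represented exactly once.
once_each = [DepictsStatement("man"), DepictsStatement("girl")]

print(naive_count(with_duplicate))  # 4, although only two people are shown
print(naive_count(once_each))       # 2
```

With the extra <human> statement a simple consumer sees four things; with each person represented once it sees two, which is the point made above.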

Over-use of P180 "depicts" ?

I can see that, from a social point of view, it makes a lot of sense to introduce structured data slowly, with just P180 at first, giving people time to get used to that, and see what issues it may or may not create, rather than a "big bang" approach of allowing any property on a CommonsData item right from the outset, even if technically there's no real difference in difficulty between the two.

But, making P180 the only property available at the start, does create a possible temptation to try to do too much with that property, if there are some statements that it might be better to express using a different allied property instead.

I am not saying the approach by the team in the slides is necessarily wrong; but there are a couple of areas that I wonder whether we should think about, and just confirm that we think this is the best way to do things. If we wanted to, it wouldn't be such a diversion from the plan to permit a handful of related properties at the start, rather than just P180.

Here are some cases that I just slightly wondered about:

Use of rank to indicate the principal item depicted in the image?

Use of "preferred rank" tends to be rather rare on Wikidata. For one thing, it's not currently possible to specify it using Magnus's Quick Statements service, which is by far the dominant process that users use to edit large volumes of items. Secondly, use of "preferred rank" effectively makes all other statements disappear to somebody writing a WDQS query using only the simple wdt: form of properties. Arguably that might not be such an issue here, given that the information in the qualifiers is potentially so rich, that it's possible that most query-writers would be anyway using the p:P180 form in their queries, to be able to access it. Nevertheless, for whatever reason, it's notable that currently only 81 out of 240,000 uses of depicts (P180) on Wikidata at the moment have rank = preferred: it's not (currently) a common idiom.
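The effect of "preferred rank" on simple queries can be sketched as follows. This is a simplified Python model, not query-service code: `truthy_values` imitates what the wdt: form is understood to return (only the best-ranked, non-deprecated statements), while `all_values` imitates the p:P180 / ps:P180 form, which reaches every statement. The function names and the example values are assumptions made for illustration.

```python
def truthy_values(statements):
    """Values a wdt:-style lookup would see.

    statements is a list of (value, rank) pairs, with rank one of
    "preferred", "normal" or "deprecated".
    """
    preferred = [v for v, rank in statements if rank == "preferred"]
    if preferred:
        # Any preferred statement hides all normal-rank ones.
        return preferred
    return [v for v, rank in statements if rank == "normal"]


def all_values(statements):
    """Values a p:P180 / ps:P180 style lookup would see (every rank)."""
    return [v for v, _rank in statements]


all_normal = [("woman", "normal"), ("landscape", "normal"), ("bridge", "normal")]
one_preferred = [("woman", "preferred"), ("landscape", "normal"), ("bridge", "normal")]

print(truthy_values(all_normal))     # ['woman', 'landscape', 'bridge']
print(truthy_values(one_preferred))  # ['woman']
print(all_values(one_preferred))     # ['woman', 'landscape', 'bridge']
```

Marking one value as preferred makes the other two invisible to the simple form, while the full statement form still returns all three.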

We should also consider the relationship with main subject (P921). Currently 13,250 items that have a P180 statement also have a P921 statement (query). The talk page d:Property talk:P180 is not decisive about when one rather than the other should be used, but it is at least arguable that P921 is the more natural choice for the principal subject of an image, to clearly distinguish that main topic from particular items that the image may happen to include depictions of. So perhaps P921 should be available from a drop-down menu to file uploaders, as well as P180 ? Jheald (talk) 18:26, 10 August 2018 (UTC)

Scans etc intended to be a representation of a work, and nothing else?

Consider four Commons images based on the Mona Lisa.

Are there differences in how we should represent the relationship between the Commons image and the original painting? In which cases is depicts (P180) appropriate?

Should one try to distinguish cases where the Commons image is just intended to be a presentation of a work itself as it is (such as faithful photographs and scans of paintings or prints), from cases where there is other stuff going on as well ? Are qualifiers subject has role (P2868) / object has role (P3831) useful in this connection ? Can qualifier shown with features (P1354) handle it ? Would it be useful (and/or more robust) to use a dedicated property when an image is specifically intended to be a representation of the work, and nothing else ?

Details of a work?

Mona Lisa detail eyes.jpg

How to indicate that an image represents a detail of a work, rather than the whole object ?

Is it sufficient to say <Image> depicts (P180) <Work>, perhaps with a qualifier eg subject has role (P2868) = "detail" ?
Or would a dedicated alternate main property, ie <Image> "depicts detail of" <Work>, be more appropriate ?
Or, perhaps, is a specific qualifier, eg "image shows detail" = "eyes", the way to handle this? Jheald (talk) 18:26, 10 August 2018 (UTC)

Mapping from a reference image to the Commons image

The slides are really good. They give a real taste of just what interesting possibilities this new functionality is going to open up. One of the most exciting is the chance to be able to describe an object once on Wikidata, and then directly inherit all of that information into all of the images we have for that object.

But it seems to me there is an issue with relative position within image (P2677). Even for the images in Category:Mona_Lisa, it is clear that not all are necessarily cropped in exactly the same way as the reference image File:Mona Lisa, by Leonardo da Vinci, from C2RMF retouched.jpg on Wikidata item Mona Lisa (Q12418) -- for example, File:La Joconde - Gioconda.jpg appears to include a slightly broader border. Turning to Category:Mona_Lisa_in_the_Louvre the range of variation is even wider, as most images include the frame, as well as varying amounts of wall around it; the images also show rotation, and often a degree of keystone distortion, if they have not been photographed from absolutely head-on.

Similarly, if one considers images of old maps and engravings, there may be considerable variation in how such images have been cropped -- for example, the Commons image may have been cropped to the size of the whole original piece of paper; or the platemark impression; or to the edge of text, titles, caption, etc immediately around the image; or to the outside of any decorative framing of the image; or to the 'neat line' marking the extent of cartographic detail (often the inner border of the image); or even tighter, to go as tightly as possible around the actual map features.

On the other hand, the object of a depicts (P180) statement located by a relative position within image (P2677) qualifier may be quite small -- for example, a detail of a particular animal in the background; or a line of an inscription; or the painter's signature. If the cropping is not almost identical, it is possible that naive application of the P2677 values to the Commons image may in fact entirely miss the intended part of the picture.

It seems therefore we need a syntax to make it possible to record how the reference image maps to the present Commons image.

One possibility would be to record where the four corners of the reference image should map to, in the pixel co-ordinates of the particular Commons image: i.e. to record an 8-tuple of the values xa to yd, when the following mappings apply for the corners (scaled to a unit square) of the reference image:

(0, 0) -> (xa, ya); (1, 0) -> (xb, yb); (0, 1) -> (xc, yc); (1, 1) -> (xd, yd)

Any point with scaled coordinates (p, q) in the reference image would then map to

((1 - p)(1 - q) xa + p (1 - q) xb + (1 - p) q xc + p q xd,
 (1 - p)(1 - q) ya + p (1 - q) yb + (1 - p) q yc + p q yd)

and using this formula the four corners of the P2677 bounding box in the reference image could be mapped to an appropriate corresponding quadrilateral in the Commons image.
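The bilinear formula above can be sketched in Python as follows. The function names `bilinear_map` and `map_box`, the box layout `(p0, q0, p1, q1)` and the example numbers are assumptions made for illustration; only the interpolation itself comes from the formula given above.

```python
def bilinear_map(p, q, corners):
    """Map scaled reference coordinates (p, q) into Commons-image pixels.

    corners is the 8-tuple (xa, ya, xb, yb, xc, yc, xd, yd): the images of
    the reference corners (0,0), (1,0), (0,1) and (1,1) respectively.
    """
    xa, ya, xb, yb, xc, yc, xd, yd = corners
    wa = (1 - p) * (1 - q)
    wb = p * (1 - q)
    wc = (1 - p) * q
    wd = p * q
    return (wa * xa + wb * xb + wc * xc + wd * xd,
            wa * ya + wb * yb + wc * yc + wd * yd)


def map_box(box, corners):
    """Map the four corners of a box (p0, q0, p1, q1) to a quadrilateral."""
    p0, q0, p1, q1 = box
    return [bilinear_map(p, q, corners)
            for p, q in ((p0, q0), (p1, q0), (p0, q1), (p1, q1))]


# Hypothetical example: the reference image occupies a 1000 x 800 pixel area
# starting 100 pixels in from the top-left of a larger Commons photograph,
# with no rotation or distortion.
corners = (100, 100, 1100, 100, 100, 900, 1100, 900)
print(map_box((0.25, 0.5, 0.5, 0.75), corners))
```

In this undistorted case the mapped quadrilateral is simply the box shifted and scaled; with unequal corner offsets the same code yields a general quadrilateral.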

The 8-tuple could be given as the value of a new "mapping" qualifier on the principal depicts (P180) statement for the Commons image, that links it to the Wikidata item that the Commons file is an image of.

Correction: the bilinear interpolation formula above doesn't map straight lines to straight lines; so one should probably actually use the projective transformation formula instead,

(x', y', w') = H (p, q, 1), with the mapped point at (x'/w', y'/w'),

where H is a 3x3 matrix with entries h11 to h33, and h33 can be scaled to have the value 1 or zero, though numerically it can be better to leave the matrix unscaled, and apply the scaling to the new computed vector. Jheald (talk) 11:12, 10 August 2018 (UTC)
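A minimal Python sketch of the projective alternative, assuming the same 8-tuple of corner positions as before. It uses a standard closed form for mapping the unit square onto a quadrilateral and fixes h33 = 1; the function names and the example trapezoid are hypothetical, and a real tool might prefer a general linear solver and an unscaled matrix, as suggested above.

```python
def homography_from_corners(corners):
    """Build a 3x3 projective matrix H (with h33 = 1) from the 8-tuple.

    corners = (xa, ya, xb, yb, xc, yc, xd, yd): the images of the reference
    corners (0,0), (1,0), (0,1) and (1,1) respectively.
    """
    xa, ya, xb, yb, xc, yc, xd, yd = corners
    dx1, dy1 = xb - xd, yb - yd
    dx2, dy2 = xc - xd, yc - yd
    sx, sy = xa - xb + xd - xc, ya - yb + yd - yc
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        # Corners b, c and d are collinear: no valid quadrilateral.
        raise ValueError("degenerate quadrilateral")
    h31 = (sx * dy2 - dx2 * sy) / det
    h32 = (dx1 * sy - sx * dy1) / det
    return [
        [xb - xa + h31 * xb, xc - xa + h32 * xc, xa],
        [yb - ya + h31 * yb, yc - ya + h32 * yc, ya],
        [h31, h32, 1.0],
    ]


def apply_homography(H, p, q):
    """Map (p, q) through H and divide by the third component."""
    x, y, w = (row[0] * p + row[1] * q + row[2] for row in H)
    return (x / w, y / w)


# Hypothetical keystone-distorted placement: the reference image appears as a
# trapezoid that is 4 units wide at the top and 2 units wide at the bottom.
H = homography_from_corners((0, 0, 4, 0, 1, 2, 3, 2))
print(apply_homography(H, 0.5, 0.5))
```

For this trapezoid the centre of the reference image lands where the diagonals of the trapezoid cross, at about (2, 1.33), so the straight diagonal stays straight; the bilinear formula would place the same point at (2, 1).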

Some further comments:

(i) It would be good if a tool existed in the upload system, so that if an uploader had input that their image depicts (P180) a Wikidata item for a painting, the tool would automatically (eg via autocorrelation, or SIFT points etc) try to find the appropriate mapping of the corners of the reference image in the presented image. In difficult cases, the user might be prompted to drag and stretch an overlay of the reference image to approximately the right position, that the system could then try to auto-refine.

(ii) In principle, the transformation should be able to handle even quite extreme distortion -- for example the transformation of the anamorphic skull File:Holbein Skull.jpg in Holbein's "The Ambassadors" (Category).

(iii) Accurate mapping will depend on the reference image. If the reference image -- ie the value of P18 on the wikidata item -- is changed, then mappings may become incorrect (as may values for relative position within image (P2677) on the Wikidata item itself). Some robustness should be sought against this. For example, it might be an idea to record a qualifier "reference image" when a Commons image is linked to a Wikidata item for a painting, and to throw a constraint violation if this fails to match the P18 on the Wikidata item either then or at any subsequent point. Similarly, one might want to try to lock the P18 value if there are any P2677 statements on the Wikidata item; or require a tool to be used that would update P2677 statements if the P18 were changed.
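The consistency check suggested in (iii) might look roughly like the following Python sketch. The "reference image" qualifier does not exist yet, and `check_reference_image` is a hypothetical helper; the sketch only compares the file name recorded when the mapping was made with the current P18 value and reports a violation if they differ.

```python
def check_reference_image(recorded_reference, current_p18):
    """Return a list of constraint-violation messages (empty if consistent)."""
    if recorded_reference != current_p18:
        return [
            "reference image mismatch: mapping was made against "
            f"{recorded_reference!r}, but P18 is now {current_p18!r}"
        ]
    return []


recorded = "Mona Lisa, by Leonardo da Vinci, from C2RMF retouched.jpg"

# P18 unchanged: no violation.
print(check_reference_image(recorded, recorded))

# P18 replaced by a differently cropped file: the mapping can no longer be trusted.
print(check_reference_image(recorded, "La Joconde - Gioconda.jpg"))
```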

(iv) The above has been presented for the case where the Commons image is more extensive than the reference image on Wikidata. But the opposite is also frequent: where we have a Commons image for a particular detail of the image referenced by Wikidata. The same model could still be used to record distortion in the detailed mapping; although the corner points of the reference image would now be mapped to points outside the Commons image -- in particular, many would be mapped to negative coordinates. One might therefore consider whether any such tools might need to be adapted to consider the "detail" case, and whether any additional or different statements or qualifiers might be useful. Jheald (talk) 23:41, 9 August 2018 (UTC)

Additional comment on the last point: The projective transformation matrix formula is (usually) fairly readily invertible, so if the Commons image is of a detail of the reference image, it should be straightforward to ask the user to place the Commons image in the reference image, and proceed using exactly the same tool, rather than the reference image in the Commons image, and then just invert the transformation at the end. Jheald (talk) 16:58, 10 August 2018 (UTC)
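The inversion step can be sketched in Python as below, assuming the placement of the Commons detail inside the reference image is already available as a 3x3 projective matrix. The helper names and the example matrix are hypothetical; the inverse is computed with the ordinary adjugate formula for a 3x3 matrix.

```python
def invert_3x3(H):
    """Invert a 3x3 matrix using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is not invertible")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def apply_homography(H, p, q):
    """Map (p, q) through H and divide by the third component."""
    x, y, w = (row[0] * p + row[1] * q + row[2] for row in H)
    return (x / w, y / w)


# Hypothetical matrix: where the user placed the Commons detail image
# within the reference image.
detail_in_reference = [[4.0, 2.0, 0.0], [0.0, 4.0, 0.0], [0.0, 1.0, 1.0]]

# Inverting gives the mapping from the reference image to the Commons detail.
reference_in_detail = invert_3x3(detail_in_reference)

point = apply_homography(detail_in_reference, 0.5, 0.5)
print(apply_homography(reference_in_detail, *point))  # back to roughly (0.5, 0.5)
```

Because only ratios matter after the final division, the inverse could equally be left as the unscaled adjugate.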

Wikidata notability

There's been a certain indecision so far as to when works should have their own item on Wikidata. The situation for paintings seems settled enough -- it would appear that essentially any painting can have its own Wikidata item. But the situation for engravings, printed maps, etc has been less clear.

The ability for depicts (P180) to reference items on Wikidata, and therefore allow centralised write-once read-many-times inheritance of information puts this question into relief -- and perhaps suggests an answer.

If Commons has more than one image of the same 2D work (eg an engraving, manuscript work, printed map, etc), then that would seem a reasonable fact to justify an item being allowed on Wikidata, so that data common to both images could be shared in a single place. (A "structural need", in the language of d:WD:N §3 ?) So eg if a plate had been reprinted, so that Commons had scans of it from more than one book, it would qualify; also if Commons had multiple scans or images of the same plate.

This is necessary, I think, because (at least with P180 initially) there appears to be no provision for Commons pages generally to be able to inherit or present information from CommonsData items for other Commons images, only from Wikidata items that the Commons image has a depicts (P180) link to.

Possibly this might change, if CommonsData got properties akin to based on (P144), or modified version of (P5059), but pointing to other CommonsData items -- in such cases one might want to inherit information directly from those CommonsData items. That might come, for particular kinds of information, in particular circumstances. But, at least initially, it would seem likely that for information to be shared between multiple Commons pages, there would need to be a Wikidata item for it to be stored on.

If a set of engravings is well-enough known, so that several had images on Commons, arguably the balance of convenience might also be to allow a Wikidata item for each of the engravings in the set, so that they could all be created as a complete group in a systematic process at the outset, rather than some items being created up front, some items being created later once Commons did have multiple images (significantly more tiresome than creating everything at once), and some without Wikidata items at all, creating a mish-mash of some works with items present and some without. Rather than that, I think there is a strong case that if a number of items are going to be created for works in a set, it makes sense to be allowed to be systematic, and for items for all of the set to be allowed to be created.

But it would be good, I think, to have a proper discussion on this, and develop proper guidance. Jheald (talk) 12:49, 10 August 2018 (UTC)

and need to think about prints of paintings. do they both depict the painting subject? does the engraving depict the painting? do we have a property "derivative"? Slowking4 § Sander.v.Ginkel's revenge 16:52, 11 August 2018 (UTC)

Inheriting information from multiple Wikidata items, and from Wikidata items for sets of works

I find slides 32 and 35 a little confusing.

Commons pages can inherit information from Wikidata items for a number of different pictures -- the example given using File:De kunstgalerij van Jan Gildemeester Jansz Rijksmuseum SK-A-4100.jpeg is a good one, although in that case it would inherit directly from The Art Gallery of Jan Gildemeester Jansz (Q17337965). But I think it's a bit confusing to show an icon of the Mona Lisa inheriting from three icons of the Mona Lisa. An image of the Mona Lisa would surely only inherit from one Wikidata item for the Mona Lisa. So I think this slide would be improved by thinking what are typical cases where a CommonsData image item might reference multiple Wikidata picture items to draw from. One might be a photograph of an art gallery, including multiple pictures on the wall or an arrangement of works together. Are there others? It might be useful to think about that. But the slide could depict this scenario more intuitively, by more clearly depicting on the Commons side an icon for an image of a group of works, and on the Wikidata side different icons for each of the works (eg perhaps a portrait icon, a landscape icon, a left-right inversion of the landscape icon), to show how the Commons image uses P180 to link to each of these.

Or have I misunderstood the meaning of this slide?

Image series

But the question of image series is interesting, particularly in the context of prints and engravings, where (over time) an engraving may have been issued in multiple states, often distinguishable from each other only by very very subtle differences. Where should information be located in this sort of case? For paintings, it's easy enough to assert that every painting (almost) passes Wikidata:Notability. But given that that's not so clear for engravings, it is even less clear that it is the case for different states of an engraving. Usually the great majority of what different states depict will be identical to other states (and in the same place), so it would make sense for it to be described and located on the master item for all the states of the engraving.

If we have Wikidata items for individual states of the engraving, it probably makes sense for them to be stated to be edition or translation of (P629) the master item (or some very similar property). If such a relationship is present, should the Commons page read across the P629 link to read all of the depicts (P180) information on the master item ?

In practice, we probably won't have Wikidata items for many individual states of an engraving, because the metadata we have access to may simply not identify which state a particular print is an example of. So in that case one could only relate the Commons image to the Wikidata engraving master item. In such cases, do we need to be able to say that a Commons image does not depict a particular item, if it is generally present in the engraving, but in this particular state, that we have an image of, it has been erased or replaced? Even if information on the different states of an engraving is available out there, I would imagine that Wikidata editors (with a few exceptions) will be quite slow to create them, preferring to prioritise covering more engravings with basic items, rather than putting in too much time creating items to be able to record differences between states that may be almost invisible. So for engravings, unlike for paintings, I would not in general expect a different item for each different version, nor counsel people to go out of their way to create such an item. Jheald (talk) 15:26, 10 August 2018 (UTC)

there is scholarship about print states in catalogs. and this is different from "yellow milkmaid" problem. could have separate item for each state, if in catalog, and same item / category if not. i.e. let the citations drive the data. Slowking4 § Sander.v.Ginkel's revenge 16:55, 11 August 2018 (UTC)


Why is this in a PDF? Seems very un-wiki. - Jmabel ! talk 20:43, 10 August 2018 (UTC)

see also Commons:Project scope#PDF and DjVu formats. Slowking4 § Sander.v.Ginkel's revenge 16:58, 11 August 2018 (UTC)
@Jmabel: it is in PDF because free formats like .odp are disabled on Commons. Keegan (WMF) (talk) 17:23, 13 August 2018 (UTC)
But why isn't it implemented as (for example) a series of separate SVGs in a gallery, with the text parts available as wikitext? - Jmabel ! talk 20:49, 13 August 2018 (UTC)
This presentation had to be created in a way that could be shared to be re-presented with ease both within the Wikimedia Foundation and outside with the Commons community, GLAM partners, affiliates, and anyone interested in SDC in general. A slide presentation is the most efficient and effective way to do this and provide reuse to others who may want to present the material as well. In short, a judgement call as Commons doesn't allow .odp and the .pdf seems to me to fulfill its unfortunate encumbered job here. As you've seen the previous consultations, you'll know that wikitext and visuals are my preferred method of getting things in front of the community - you'll see them again with the next one in a few weeks. This case is an exception. Keegan (WMF) (talk) 18:24, 14 August 2018 (UTC)

PDF is fine (for an intermediate document meant to elicit discussion) and certainly better than an external Google Docs. However, feel free to ask for ODP uploads to be enabled. There was consensus a while ago already anyway, as well as clearance from WMF security. --Nemo 11:40, 16 August 2018 (UTC)

Community processes

Slides 26 to 28, presenting possible community processes, are interesting.

In respect of "proposing a new Wikidata property" (26), I wouldn't expect anything much different from the d:Wikidata:Property proposal/Creative work board, albeit possibly with a section specific to CommonsData items. It's not particularly user-friendly, but it seems to mostly get the job done. From the slides I'm not 100% clear whether the team are proposing anything much different to this?

In respect of "proposing that a property's rules be changed to accommodate a Commons use case" (28), the advice to open a thread on the talk page is basically sound, though it may well be that nobody would see it or respond to it there. The user should probably be advised to also flag the suggestion at d:Wikidata:Project Chat, and (presumably) a Commons-based forum similar to Wikidata Project Chat, but specifically for discussing CommonsData, with a link to the detailed discussion they were opening on the talk page.

As for "Proposing that a property be added to the depicts 'allowed qualifier constraint'" (27), this seems odd. Firstly, for a broad discussion like this, the right venue to open discussion would probably be the CommonsData equivalent of Project Chat. (The "designated forum where discussions of the constraints happen" probably does not exist).

Secondly, it's interesting that this is flagged up as such a big issue at all. On Wikidata, the prevailing behaviour is probably that if a property that can be used as a qualifier seems like it would be a useful qualifier to use on a particular property, then people just go ahead and use it. If doing so starts producing constraint warnings that the qualifier is not included in the list of allowed qualifiers, then the Wikidata user might just ignore the warnings, or, if the qualifier seems patently useful, may simply go ahead and add it to the list, on a Bold-Revert-Discuss (en:WP:BRD) basis -- if anyone questions it, discussion can proceed on the talk page; but otherwise, if the edit sticks, then so be it.

As a result, it's perhaps worth noting that there is already rather a longer list of qualifiers (query) in use on depicts (P180) than those listed against the allowed qualifiers constraint (Q21510851). Some of these look a bit curious -- it might be worth looking further into some specific examples of their use, to see actually how they are being used, and whether that really makes sense -- but some seem very understandable, eg point in time (P585) to indicate that something is being depicted as it was at a particular point in time.

The suggestion that it would be for a "community ambassador" from Commons to make the change (or to ask for the change to be made on Wikidata) seems rather odd, when the prevailing model has been that anyone can make such a change at will, and then (if necessary) thrash out the rights and wrongs of it on the talk page. I suppose what the team has in mind may be similar to edit-protecting the d:Property:P180, so that only a Wikidata admin could edit it (who might perhaps also be a Commons admin, and thus de facto the Commons "community ambassador").

Whilst such page-locking might (conceivably) become necessary from time to time, it's interesting that the team seem to think that it should be the case by default. At the moment I don't think anyone would particularly think of protecting a property page for that sort of reason on Wikidata, since all the "allowed qualifier" list affects is whether a discreet warning gets shown if particular qualifiers are added, warnings which many Wikidata editors seem fairly immune to anyway. Does the team think that page-locking d:Property:P180 is indeed something that will be necessary, given the high profile of P180 as the new leading property on CommonsData; or would it be enough to recommend that users seeking to add to the allowed list raise it on the talk page, flag the discussion at CommonsData Project Chat, and then make the change if nobody objects within a couple of days? Jheald (talk) 20:52, 10 August 2018 (UTC)

Update: extended query for qualifiers used with depicts (P180), giving example values and qualifier-values to give the context of each use. Jheald (talk) 11:20, 11 August 2018 (UTC)
Suggestions, mentions or discussions around community process are brought up because they are things that have been suggested to us by other community members. If things are flagged as big issues here it's not necessarily that they are big issues from us, but they are from your fellow community members. For example, I read that it's your opinion that a "community ambassador"-type role strikes you as odd when these are all wikis and anyone can edit, but it's important to remember that not everyone feels comfortable editing everywhere. Sometimes people prefer one environment, and working with a new one (Wikidata) is a negative to them and they would rather someone else step up and do the work. That's a reasonable position to have, as yours is.
The ultimate point is that all of the community process is owned by the community. If there's no need for an ambassador and people are willing to participate across all the wikis in making decisions, great! If not, hopefully there is a process in place to account for all voices that wish to participate in a forum that works for them. And again, these ideas and suggestions are from and for the community to decide. Keegan (WMF) (talk) 18:20, 14 August 2018 (UTC)

Commons talk:Structured data

There was a message at the village pump but not on the main talk page, if I see correctly. If people are expected to watch N subpages in addition to the basepage talk, that might be worth pointing out at the top of that talk page. --Nemo 11:38, 16 August 2018 (UTC)

Good point, Nemo. I put a note at the bottom of the talk page and added to the header at the top of the talk page. Thank you for the suggestion. Keegan (WMF) (talk) 18:16, 16 August 2018 (UTC)

Distinguishing genuine documentary depiction from fake

Consider these two images. Besides the hand-tinting, which I presume we will cover some way other than "depicts," note that the ad on the side of the Corona Building has been altered to advertise "Joseph Goldie, distributor for Pommery Champagne". Will we have a way to distinguish that from honest documentary depiction? - Jmabel ! talk 17:31, 8 September 2018 (UTC)

Inadequacy of "wears" P3828

I wanted to bring up something we ran into at Wikidata Lab in Brazil – the inadequacy of wears (P3828) for depicting what a person might "wear." In American English, you can say someone wears a dress, a shirt, a watch or a hat. But in many languages, this particular term would be used for clothes only, and not accessories or other things. Joalpe pointed this out for Portuguese where they would employ the word "use" instead. Similarly, in Chinese they would use the word "chuan" for clothes and "dai" for hat and watch.

So we have a problem here – this does not map cleanly. Instead, I've been experimenting with person -> shown with features (P1354) -> dress or watch.

I'm not sure how to solve this long term, but this is a pretty big deal we should discuss. -- Fuzheado (talk) 13:34, 15 October 2018 (UTC)

Properties should represent a concept, not the meaning of a word in a particular language. Regards, Yann (talk) 15:39, 15 October 2018 (UTC)
Agree, but right now there is a disconnect in how this is being used in labels and descriptions. Perhaps changing this to "has on their body" is the solution, but that's not what the property proposal converged on. In fact, the discussion was brief and among a small circle, so calling folks back in from that original discussion may be useful. @Pharos, Thierry Caro, ChristianKl, Pigsonthewing, PKM: -- Fuzheado (talk) 16:25, 15 October 2018 (UTC)
Clearly something needs to change, but we need advice from speakers of the affected languages, to see if there are umbrella terms (words or phrases) that can be used. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:41, 15 October 2018 (UTC)
In French, porter can be used for a dress, a shirt, a watch or a hat. Regards, Yann (talk) 17:13, 15 October 2018 (UTC)