Commons talk:Structured data/Modeling/Author


These earlier notes and resources may be inspiring:

Multichill (talk) 16:44, 7 September 2019 (UTC)

Authorship notes from Wikimania 2019

Copy here for future reference

Author (P50) can be used if the author has a Wikidata item. If the author does not have a Wikidata item, there are a couple of other possibilities to identify the author:

  • Author name string (P2093) can be used if the author doesn't have a Wikidata item
  • Wikimedia username (P4174) can be used if the author has a Wikimedia username --> COMMENT: May be best indicated using "author" = "somevalue" with qualifier "Wikimedia username" = Username .
  • Conventions need to be established on how to designate anonymous and unknown authors
  • Should we use the Anonymous (Q4233718) and/or Unknown (Q24238356) items? --> COMMENT: There is quite a developed practice on Wikidata now for paintings of unknown/anonymous/pseudonymous authorship
  • Should there be a way to set unknown value from the Add statement interface? --> COMMENT: "Unknown" in the Wikidata / SDC UI should be renamed "somevalue", as per the underlying software, because this is what the value actually means.

Author properties

  • If the author has an item, most relevant data about the author should be pulled automatically from the author item
  • If the author doesn't have an item, we should create qualifiers under the Author name string or Wikimedia username value, e.g. Date of death (P570), Official website (P856), Flickr user ID (P3267), etc. --> COMMENT: For consistency, use "somevalue" with qualifier "stated as" rather than "author name string"?

Numerous Wikidata properties are available for author IDs in various databases, but we should probably pull this information automatically from the Author item in most cases. The role of each author could be specified as a qualifier to the Author or Author name string value using the Subject has role (P2868) property. For example, photographer, painter, architect, sculptor, etc. A new property is probably needed for Author attribution (which accepts a string). This will likely go under the licensing data, however, rather than the authorship data. There are only 13 attribution templates per

  • Creator templates : Probably should all be managed through Wikidata Author items, but we need a way to check if all the data from the templates is on Wikidata. Eventually, these will be replaced entirely
    --> COMMENT: Conversely, display extended Creator style information in a field on the file description page, if we have an author statement with a Q-item. Ultimately this approach could provide all the visible functionality of a creator template, without needing any creator templates

Uploader to be treated separately from authorship.

End of copy Multichill (talk) 16:47, 7 September 2019 (UTC)

Getting authorship started


Author is one of the core fields in our current templates. This was discussed quite extensively during Wikimania (notes in previous section). With a file on Commons, multiple people might be involved with multiple roles: The painter of the original work, the photographer, the person who uploaded it, etc. We discussed two possible approaches:

  • Have a property for each role
  • Use one generic property and a bunch of qualifiers

Consensus seemed to be towards the second approach. Some remarks:

As mentioned above, we'll have a lot of cases where the person doesn't have a Wikidata item. We don't have to create items for all these persons; in fact, I'm very much against that. We can just use "somevalue". If you look at this edit and what messages are used, you'll notice that MediaWiki:wikibase-snakview-snaktypeselector-somevalue is used. We should probably update that from "unknown value" to something like "some value without a Wikidata item" or "no Wikidata item available".

So we get the possible qualifiers when creator (P170) is set to "somevalue":

  • object has role (P3831) when we have multiple persons in different roles. We probably need to make a list of common roles and how to use them.
  • author name string (P2093) to put in the name of the author. For users here that can be the username or the real name
  • Wikimedia username (P4174) to link it to the username here
  • URL (P2699) (or new property "Author url") to link to the author in a consistent way. That can be a link to a userpage here or some external website like Flickr. We could also just use
  • Other person identifiers, like official website (P856) and Flickr user ID (P3267). Imho these shouldn't replace the new "author url" property because that offers a consistent way to construct links for re-use
  • Any other properties that would be acceptable on persons on Wikidata. This is probably more rare.
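A minimal sketch of how such a statement could look in Wikibase-style JSON, assuming the qualifier list above. The helper names, the user "Example User" and the user page URL are hypothetical and only for illustration; photographer (Q33231) is used as a sample role. This is not a settled model, just one way to picture a "somevalue" creator with qualifiers.

```python
import json


def string_snak(prop, value):
    """Build a value snak for string-like datatypes (string, external-id, url)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": "string"},
    }


def item_snak(prop, qid):
    """Build a value snak that points at a Wikidata item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "id": qid},
            "type": "wikibase-entityid",
        },
    }


def somevalue_creator(role_qid, name, username=None, url=None):
    """Hypothetical helper: creator (P170) = somevalue, described by qualifiers."""
    qualifiers = {
        "P3831": [item_snak("P3831", role_qid)],  # object has role
        "P2093": [string_snak("P2093", name)],    # author name string
    }
    if username is not None:
        qualifiers["P4174"] = [string_snak("P4174", username)]  # Wikimedia username
    if url is not None:
        qualifiers["P2699"] = [string_snak("P2699", url)]       # URL
    return {
        "type": "statement",
        "rank": "normal",
        # A somevalue snak carries no datavalue: the creator exists but has no item.
        "mainsnak": {"snaktype": "somevalue", "property": "P170"},
        "qualifiers": qualifiers,
    }


# Hypothetical example user and URL, not taken from the discussion.
statement = somevalue_creator(
    "Q33231",  # photographer
    "Example User",
    username="Example User",
    url="https://commons.wikimedia.org/wiki/User:Example_User",
)
print(json.dumps(statement, indent=2))
```

Only the name and role are required in this sketch; the username and URL qualifiers are optional, which matches the idea that not every qualifier has to be filled in for every file.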

So as an example, one of my painting uploads:

What do you think? Multichill (talk) 17:38, 7 September 2019 (UTC)

I am in favor of a more generic property too, but what I see here is 4 qualifiers. People won't be adding this by hand. So the question is whether we can create it by bot. If so, I am in favor of that; if not, let's keep it as simple as possible and use "author" and "author (text)" only.

I don't know how you would indicate the roles of those who do not have an item on Wikidata, such as the person who digitized the work or the person who made a crop, which is in some cases indicated in the author field of the description template. Juandev (talk) 15:13, 13 September 2019 (UTC)
See the example where I am the author and I don't have a Wikidata item. Multichill (talk) 09:26, 14 September 2019 (UTC)
Well, object has role (P3831) has an item data type. So if somebody just made a crop, how exactly would you indicate it? Juandev (talk) 06:30, 27 September 2019 (UTC)
Find a suitable role on Wikidata or create one? Probably something like editor (Q1607826). Multichill (talk) 17:58, 1 October 2019 (UTC)

Alternative modelling

I think this method has too many qualifiers. Here is what I would prefer:

  • creator (P170) if the creator is notable for Wikidata and has an item
  • author name string (P2093) if the creator is not notable for Wikidata
  • a new property "Creator (Wikimedia-user)" with a Wikimedia-user datatype if the creator has a Wikimedia account and is not notable

In addition, there should be an "Uploading user" property with the Wikimedia-user datatype.

Changes on files should have the same attributes with qualifiers. --GPSLeo (talk) 16:05, 13 September 2019 (UTC)
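To make the proposal above concrete, here is a rough sketch of the selection rule it implies. The "Creator (Wikimedia-user)" property does not exist, so the constant below is a placeholder, and the `Creator` class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Placeholder for the proposed "Creator (Wikimedia-user)" property.
# No such property exists; this identifier is an assumption for illustration.
PROPOSED_CREATOR_USER = "P-proposed-creator-wikimedia-user"


@dataclass
class Creator:
    name: str
    wikidata_item: Optional[str] = None       # e.g. "Q5598" if notable
    wikimedia_username: Optional[str] = None  # account name, if any


def pick_creator_statement(creator: Creator) -> Tuple[str, str]:
    """Return (property, value) following the three cases in the proposal."""
    if creator.wikidata_item:
        # Notable creator with an item: creator (P170)
        return ("P170", creator.wikidata_item)
    if creator.wikimedia_username:
        # Not notable, but has a Wikimedia account: proposed new property
        return (PROPOSED_CREATOR_USER, creator.wikimedia_username)
    # Not notable and no account: author name string (P2093)
    return ("P2093", creator.name)


print(pick_creator_statement(Creator("Rembrandt", wikidata_item="Q5598")))
print(pick_creator_statement(Creator("Example User", wikimedia_username="Example User")))
print(pick_creator_statement(Creator("Jane Doe")))
```

Each file gets exactly one top-level statement per creator under this rule, which is the simplicity the proposal is after; the cost is that cases such as Flickr accounts fall back to a plain name string.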

That only covers a small subset of cases. Take for example Flickr, that wouldn't be covered by this.
I agree that quite a few qualifiers are possible. From a data model point of view that's no problem; from an interface point of view some time should be spent on making it easier and more convenient to enter this. Multichill (talk) 08:59, 14 September 2019 (UTC)
I think these things could be done as a source. The way I would prefer would be to create a new Wikibase instance with items for every creator with a file or file change on Commons. There we could link the user account here, a Flickr account or just the Wikidata item of the author. --GPSLeo (talk) 09:15, 14 September 2019 (UTC)
I was afraid you were going to say that. That's a firm no. We're not going to create an item on Wikidata for every person who ever uploaded a file here or on Flickr. Those items are very much out of scope on Wikidata. This has been discussed before. Trying to go down that road is just a waste of effort and you'll hit a dead end. Multichill (talk) 09:24, 14 September 2019 (UTC)
Well, if filling in all the lines were optional, why not. Those who would like to add as much information as possible can, and those who would like to keep it short could do so. Juandev (talk) 05:53, 17 September 2019 (UTC)
I do not want Wikidata-Items for every Commons creator. I want a separate database just for this, like the user-namespace. But not only for creators with a Wikimedia account. --GPSLeo (talk) 16:52, 22 September 2019 (UTC)
And why do we need a database for Commons contributors? Juandev (talk) 06:41, 27 September 2019 (UTC)
So that we have one place to link the Commons user account, Flickr, website, name, e-mail address and all the important information that is in templates now and would otherwise have to be repeated on every image as separate statements. This way only one statement is needed to link all that information. --GPSLeo (talk) 08:06, 27 September 2019 (UTC)

Adding some authors

This page is not really getting a lot of useful feedback. Other people might be busy with something else or just haven't noticed. I'm adding some authorship information to some images now. This might encourage more people to give feedback. Multichill (talk) 18:01, 1 October 2019 (UTC)

Be careful. The API seems to work as on Wikidata, with full functionality, but the GUI does not. The property does not get displayed if it contains non-item values. --GPSLeo (talk) 18:25, 1 October 2019 (UTC)

Datatype for Commons photographers

Moved from Commons talk:Structured data

A question that keeps coming up on Wikidata is: which property Commons should use for photographers (e.g. some string property). I had thought that this was being addressed by the project with a new datatype, but it seems that it is still open.

A possible approach that might not have been checked yet, could be to create a datatype that links to Commons user pages.

You might know that, in addition to the datatypes for files (d:Help:Data_type#Commons_media), Wikibase/Wikidata has a few datatypes for other Commons namespaces. These are:

For Commons user namespace, a new one could be defined fairly easily, as it would probably re-use the code of the above. Jura1 (talk) 12:56, 13 September 2019 (UTC)

Look at #Let's do some modeling!. I think we should discuss it all together in one place. At least we, who identify ourselves as Commonists. Juandev (talk) 15:21, 13 September 2019 (UTC)
  • I think there was already some prior discussion/proposal(s) not referenced there. Jura1 (talk) 15:40, 13 September 2019 (UTC)

End of move. Multichill (talk) 08:50, 14 September 2019 (UTC)

We sure had previous discussions about this. At least a couple of years ago at the first kick off of structured data. Now you're probably looking for phab:T127929. It's quite a long discussion and it doesn't look like we figured out how the data type should work exactly. So here we just picked it apart in different qualifiers to see if we can get that to work. Maybe later on based on our experience the new data type can be implemented and replace the current structure. Waiting for this new data type will just get us stuck on implementing authorship. Multichill (talk) 09:06, 14 September 2019 (UTC)
I removed the reference to phab:T127929 as this proposal is somewhat different and likely simple to implement. Jura1 (talk) 10:28, 14 September 2019 (UTC)
  • User pages and namespaces are probably to be discarded, as the use of the user namespace can vary a lot (no user pages, various redirects, etc.). Something with the account name instead, maybe with a compulsory qualifier "stated as" for the user name used/wanted. Christian Ferrer (talk) 13:12, 14 September 2019 (UTC)
  • This is a tricky one. Originally, there was talk about implementing Wikibase "virtual statements" for things like user accounts, but that never took off. Personally, I'm in favor of something similar to how Wikidata handles P856 ("official website"). It uses regexes to look at the domains in the entered URL, and automatically shifts the input to an appropriate property if necessary (ex: values get automatically converted to Twitter Username [P2002] statements, values get turned into Instagram username [P2003] statements, etc.). Since Commons has media from Flickr, YouTube, and Commons itself, finding a way to easily support all of that in one place seems prudent. It's ultimately up to you guys though :) RIsler (WMF) (talk) 19:00, 23 September 2019 (UTC)
    • @RIsler (WMF): Something like this? What would be the appropriate ID for Commons users? Jura1 (talk) 09:45, 26 September 2019 (UTC)
That could work. I see two possible scenarios here: 1.) We just have one field for Author/Photographer URL and it's just a simple URL with no fancy formatting. 2.) We allow for multiple URLs/usernames to account for cases where someone could have a Flickr page, AND an official website, AND whatever else. This would probably work more like the conditional property setting features of P856 (official website), but maybe with those values as qualifiers of a top level value. Option 1 is certainly easiest to implement. Option 2 may give the most flexibility for the data but involves a lot more effort. RIsler (WMF) (talk) 19:43, 26 September 2019 (UTC)
  • @RIsler (WMF): to me, option (2) looks much like having items. A simple approach could be to create these at Wikidata (what some don't want) .. an alternative could be entities in creator or some other namespace. I guess I will check back in a year or so to see what was finally chosen. Jura1 (talk) 12:51, 29 September 2019 (UTC)
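The regex-based routing described above for P856 could be sketched roughly as follows. The URL patterns are assumptions written for this illustration, not the expressions Wikidata actually uses, and the function name is hypothetical. Unrecognised URLs simply stay as official website (P856) values.

```python
import re

# (pattern, target property) pairs; the patterns are illustrative assumptions.
URL_RULES = [
    # Twitter username (P2002)
    (re.compile(r"^https?://(?:www\.)?twitter\.com/(?P<id>[A-Za-z0-9_]+)/?$"), "P2002"),
    # Instagram username (P2003)
    (re.compile(r"^https?://(?:www\.)?instagram\.com/(?P<id>[A-Za-z0-9_.]+)/?$"), "P2003"),
    # Flickr user ID (P3267); the path segment may be an ID or an alias,
    # which this sketch does not try to tell apart.
    (re.compile(r"^https?://(?:www\.)?flickr\.com/(?:photos|people)/(?P<id>[^/]+)/?$"), "P3267"),
]


def route_author_url(url):
    """Return (property, value): a specific ID property if a rule matches,
    otherwise official website (P856) with the URL unchanged."""
    for pattern, prop in URL_RULES:
        match = pattern.match(url)
        if match:
            return prop, match.group("id")
    return "P856", url


for example in (
    "https://twitter.com/example",
    "https://www.flickr.com/photos/12345678@N00/",
    "https://example.org/",
):
    print(example, "->", route_author_url(example))
```

This corresponds to option 1 plus automatic shifting: a single input field, with the rule table deciding which property the value ends up in. Supporting several accounts per author (option 2) would mean calling this once per URL and storing the results as separate qualifiers.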

Pure text

I am sorry, I am not so familiar with Wikidata. Can I normally query pure text values and use them as conditions, as I can for item-like values? Juandev (talk) 05:56, 17 September 2019 (UTC)