Commons talk:Structured data/Modeling/Location

From Wikimedia Commons, the free media repository

How to transfer information from template to structured data

There are more than 13 million files in Category:Media with locations, so we need to think about how to transfer the information now given in the wikitext templates into structured data. A bot with the standard limit of one edit per 5 seconds would need 785 days; at one edit per second it would need 157 days. I think we should ask for a way to run a script that does this directly on the server, or something similar. --GPSLeo (talk) 07:42, 11 October 2019 (UTC)
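The day counts above follow from simple division, assuming one edit per file and a bot that never pauses. Below is a minimal Python sketch. The file count of 13,570,000 is an assumption, back-derived from the quoted 785/157-day figures. Exactly 13 million files at 5 seconds per edit would come to about 752 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bot_run_days(num_files: int, seconds_per_edit: float) -> float:
    """Days needed to edit every file once, one edit per file, with no downtime."""
    return num_files * seconds_per_edit / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed file count, back-derived from the 785/157-day estimates.
    files = 13_570_000
    print(round(bot_run_days(files, 5)))  # 785
    print(round(bot_run_days(files, 1)))  # 157
```

Several bots working in parallel would divide these durations roughly by the number of bots, provided the database keeps up.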

I wouldn't worry too much about this. It's a big task, but we managed to do it on Wikidata too. Multiple bots can work in parallel as long as the database can keep up. Multichill (talk) 10:34, 10 November 2019 (UTC)
  • For viewing the files on query server maps, that would be most helpful.
Maybe one should prioritize files that already have P180. Jura1 (talk) 06:04, 13 December 2019 (UTC)

Types of locations

I think a basic distinction (enforced by separate prompts in the user interface) should be:

  • Depicted place (depicted location, location depicted) - the place represented in the painting or photograph
  • Location of object - where the painting or photograph is physically located (location of museum, archive, etc.)

- PKM (talk) 21:41, 29 October 2019 (UTC)

@PKM: We actually document (at least) three things here on Commons:
  1. Location depicted
  2. Location of the point of view
  3. Location of the object
Currently that's in a nice category and template mix. Let's break it down:
What do you think of this approach? Did I miss different types of locations? Multichill (talk) 11:51, 10 November 2019 (UTC)
@Multichill: As far as I have been thinking, you are capturing it all in your nice summary. I also agree on the need for a new property for heading. I believe this approach will be both quite useful and easy enough to grasp. Ainali (talk) 12:12, 10 November 2019 (UTC)
@Multichill: Sounds good to me. - PKM (talk) 19:33, 10 November 2019 (UTC)
When we add a property for heading, we should also include a property for height, for example for drone photography. Is that elevation above sea level (P2044)? --Hannolans (talk) 00:49, 18 December 2019 (UTC)
@Ainali and PKM: picking up where we left off. Coordinates seem to be doing just fine. We have a clear one-to-one relationship:
Jarek is tweaking the tracking categories. This looks like a pretty clean data migration.
Location on the other hand is a bit more complicated. I noticed someone has been mass importing location (P276) statements (42,001 at the moment). The location of the point of view (P7108) still has much lower usage (1,766).
Maybe we should only use location of creation (P1071) and location of the point of view (P7108) for photographs?
That would be in line with how we model artworks. There we also use location of creation (P1071) to indicate where a work was made and location (P276) to record where the work is now (and has been). This overlaps with location of the point of view (P7108), but that property feels better in context when you're actually looking from somewhere, for example on File:Wien, Stephansdom, Blick vom Südturm -- 2018 -- 3274-6.jpg.
To take another example, on File:Sydney (AU), Opera House -- 2019 -- 2280.jpg it would just be location of creation (P1071) -> Sydney. Multichill (talk) 17:17, 16 February 2020 (UTC)
I just noticed File:De Moulin Rouge in Parijs bij avond, Bestanddeelnr 254-5695.jpg becoming a featured picture, good one to test:
I'm just not 100% sure which qualifier to use; that's what this conversation is about. Multichill (talk) 17:53, 16 February 2020 (UTC)

Heading property

@Multichill, Hannolans, Jura1, PKM, and Ainali: I created a proposal for a heading property at d:Wikidata:Property_proposal/Commons#Viewpoint_heading, to be a companion to coordinates of the point of view (P1259). Please comment, vote or correct the proposal if needed. --Jarekt (talk) 17:26, 20 December 2019 (UTC)

@Jarekt: Thanks! Updated it a bit. Multichill (talk) 17:37, 20 December 2019 (UTC)
@Multichill, Hannolans, Jura1, PKM, and Ainali: the infrastructure part of heading is working now, see this example. The user interface part is working, and with phab:T227116 it will get even better. Multichill (talk) 16:33, 16 February 2020 (UTC)

Scraping geocoordinates

I was testing some SQL queries for scraping geolocations from files and outputting them as QuickStatements. My best version can be found here. It is a good start, but I still need to get one result per file and perhaps grab the camera heading as well. --Jarekt (talk) 20:58, 20 December 2019 (UTC)
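Jarekt's query is only linked here, not shown. The Python sketch below is a hedged illustration of the second half of that pipeline. It takes (page id, latitude, longitude) rows, assumed to come from such a query, and turns them into QuickStatements lines for each file's MediaInfo entity ("M" followed by the page id). It keeps only the first coordinate pair per file. Two details are assumptions and should be checked against the QuickStatements documentation: the mapping to coordinates of the point of view (P1259), and the tab-separated `@lat/lon` value syntax.

```python
from typing import Iterable, Tuple

Row = Tuple[int, float, float]  # (page_id, latitude, longitude), assumed query output


def to_quickstatements(rows: Iterable[Row], prop: str = "P1259") -> str:
    """Emit one QuickStatements line per file; later rows for the same file are skipped."""
    seen = set()
    lines = []
    for page_id, lat, lon in rows:
        if page_id in seen:
            continue  # keep only the first coordinate pair per file
        seen.add(page_id)
        # MediaInfo entity ids on Commons are "M" + page id.
        lines.append(f"M{page_id}\t{prop}\t@{lat:.6f}/{lon:.6f}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [(123, 48.2084, 16.3731), (123, 1.0, 2.0), (456, -33.8568, 151.2153)]
    print(to_quickstatements(rows))
```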

@Jarekt: Currently I'm just parsing the templates like in this example. At least for the first pass this seems to work alright. My focus is to do this in bulk: update a lot of data in a lot of files, one edit per file. The category Category:Pages with local coordinates and matching SDC coordinates actually has more files, but it looks like an edit using wbeditentity doesn't trigger a page update. Created phab:T245349 for that. Multichill (talk) 16:39, 16 February 2020 (UTC)
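The parsing code is only linked here. The Python sketch below shows one way to do the same thing. It assumes only the decimal form of {{Location}} (`{{Location|lat|lon|...}}`), extracts the camera position from wikitext, and builds the `data` JSON for a single `wbeditentity` call, so all coordinates for a file land in one edit. The regular expression, the helper names and the choice of coordinates of the point of view (P1259) are illustrative assumptions. The degree-minute-second form and named parameters are deliberately not handled.

```python
import json
import re
from typing import Optional

# Decimal form only: {{Location|52.381|4.637}} or {{Location|52.381|4.637|heading:90}}.
# The lookahead rejects a numeric third parameter, i.e. the degree-minute-second form.
LOCATION_RE = re.compile(
    r"\{\{\s*Location\s*\|\s*(-?\d+(?:\.\d+)?)\s*\|\s*(-?\d+(?:\.\d+)?)\s*(?:\}\}|\|(?!\s*-?\d))",
    re.IGNORECASE,
)

EARTH = "http://www.wikidata.org/entity/Q2"
DEFAULT_PRECISION = 1 / 3_600_000  # 1/1000 of an arc second, expressed in degrees


def coordinate_claim(prop: str, lat: float, lon: float,
                     precision: float = DEFAULT_PRECISION) -> dict:
    """Build a Wikibase statement whose main snak is a globe coordinate."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "type": "globecoordinate",
                "value": {
                    "latitude": lat,
                    "longitude": lon,
                    "altitude": None,
                    "precision": precision,
                    "globe": EARTH,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def wbeditentity_data(wikitext: str) -> Optional[str]:
    """Return JSON for wbeditentity's data parameter, or None if no usable template is found."""
    match = LOCATION_RE.search(wikitext)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    return json.dumps({"claims": [coordinate_claim("P1259", lat, lon)]})
```

The resulting string would be sent as the `data` parameter of `action=wbeditentity`, with `id` set to the file's M-id.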
One edit per file is great. I was thinking about doing a lot of smaller edits with QuickStatements, working with one property at a time, but QS does not seem to work reliably on SDC. Are you the only person adding SDC or are there more? Is there a way to track which other bots (or people) add SDC? --Jarekt (talk) 20:53, 16 February 2020 (UTC)
@Jarekt: My focus at the moment is on Wiki Loves Monuments and Geograph, so if you filter these out and sort by last edit, you get the other edits. Multichill (talk) 17:30, 18 February 2020 (UTC)

Privacy and geo information

I am wondering whether a GDPR assessment has been made and published somewhere in relation to location data. Converting EXIF information from files donated by living people into a linked open database needs a privacy check, just like face recognition does, because with that information we can trace exactly when a person was where. --Hannolans (talk) 00:16, 22 December 2019 (UTC)

There is no change in the usage of location data. There is just a new data model and a new GUI. There are exceptions for taking pictures in public places, and if the place is not public, permissions were already needed. The location data with coordinates changes nothing there, because a picture, unless it is of an object, nearly always shows the place where it was taken. --GPSLeo (talk) 20:53, 10 January 2020 (UTC)
Well, we are extracting EXIF information and making it available as linked open data. Extracting information is data processing and falls under the General Data Protection Regulation. --Hannolans (talk) 16:39, 7 April 2020 (UTC)

Draft mockups for geo-coordinates

The team is making progress wrapping up geo-coordinates support, and would like your feedback (if you have any) on the proposed designs. Please check out the Phabricator task for mockups, rationale, and other input. Thanks! Keegan (WMF) (talk) 20:02, 10 January 2020 (UTC)

I regularly use location of the point of view (P7108) for my photographs to name the street/square/other building/mountain etc. it was taken from. So far I have never used location of creation (P1071). Isn't this property redundant for photographs? A photograph is always created where the photographer is standing. Maybe location of creation (P1071) is actually the suitable property, but then the other one would be dispensable. --Leit (talk) 09:23, 7 April 2020 (UTC)

I agree. location of creation (P1071) is in my understanding relevant to the depicted object, like a sculpture. Raymond 09:43, 7 April 2020 (UTC)
In my opinion there is a difference: location of creation (P1071) is more a location, an address, where the photograph was taken, while location of the point of view (P7108) can be a location, but also a hill, a tower, an aircraft or a mobile stage. --XRay talk 09:53, 7 April 2020 (UTC)
See Commons:Structured data/Modeling/Location too: location of creation (P1071) is "where the photo was made" and location of the point of view (P7108) is "where the photographer is standing". --XRay talk 09:57, 7 April 2020 (UTC)
If I take one of your recently uploaded images, File:Münster, Prinzipalmarkt 22 -- 2017 -- 9774.jpg, you stated for location of creation (P1071) the square the photo was taken from, just as I would have done, only using the other property. A hill or a tower also has an address that is stated in Wikidata. I guess a mobile stage (e.g. drone photography) never has an address. --Leit (talk) 10:12, 7 April 2020 (UTC)
"Where the photo was made" is in my view the same as "where the photographer is standing", even if it is an aerial shot. --Leit (talk) 10:15, 7 April 2020 (UTC)
Maybe there was already a discussion. I've found this hint: "Add the Wikidata item for the most specific location. It can be a general location like Haarlem (Q9920) or specific Kerkstraat (Q17286558). Just use the most specific location you're sure about and for which an item exists. location of the point of view (P7108) is more specific, but doesn't replace location of creation (P1071)" (Source: Commons:Wiki Loves Monuments/Structured data) --XRay talk 11:42, 7 April 2020 (UTC)
@Multichill and GPSLeo: FYI: Maybe you're interested because you are running bots adding structured data. --XRay talk 11:56, 7 April 2020 (UTC)
So one way to use location of creation (P1071) would be when you don't know the specific location, meaning that for my own files I would rarely use it, because I usually know the specific location. Besides this, the given example Kerkstraat is already the most specific location. So location of the point of view (P7108) can be more specific, but is not always, and possibly more often is not. On the other hand, sometimes there are only very general locations like North Sea (Q1693), for a photo taken from a boat, where this general location is already the most specific one possible. So the explanation for WLM participants only seems to add to the confusion. It is doubtful whether anything would truly be lost if both properties were merged (not on Wikidata, but for use on SDC). Location of creation could even be kept as the main label or a synonym, just as coordinates of the point of view is the same as coordinates of the creation. I know there is no way to standardize the property use, but I'm just thinking about which Commons recommendations actually make sense. --Leit (talk) 13:46, 7 April 2020 (UTC)
Just FYI: searching for "file: haswbstatement:P1071" gives 534,798 files, "file: haswbstatement:P7108" gives 2,365 files. --XRay talk 13:06, 8 April 2020 (UTC)
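Counts like these can also be fetched programmatically. The sketch below assumes that the standard MediaWiki search API on Commons (`list=search` with `srinfo=totalhits`, namespace 6 for files) accepts the same `haswbstatement:` keyword as Special:Search. Network access is left out so the example stays self-contained.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def haswbstatement_count_url(prop: str) -> str:
    """URL of a search API request whose response carries the total hit count."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{prop}",
        "srnamespace": 6,  # File: namespace, like the "file:" prefix in Special:Search
        "srinfo": "totalhits",
        "srlimit": 1,  # only the count is needed, not the results themselves
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def total_hits(response: dict) -> int:
    """Extract the hit count from a decoded JSON response."""
    return response["query"]["searchinfo"]["totalhits"]
```

Any HTTP client can fetch the URL. Its decoded JSON body is what `total_hits` expects.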
@Leit, Raymond, and XRay: Thanks for the ping, catching up on those. The property location of creation (P1071) can be used for indicating the location where the photo was taken (not the depicted location), and it ranges from a large region like Netherlands (Q55) to a (very) specific point like Wilhelminatoren (Q3215092). location of the point of view (P7108) should probably only be used if the point is well known and not a region; it would be weird to have Netherlands (Q55) in location of the point of view (P7108). So I think every photo should have location of creation (P1071), and some photos can have location of the point of view (P7108) too. So yes, these two overlap. Multichill (talk) 16:29, 14 April 2020 (UTC)

Just to add to this: recording location (P8546) was recently approved, but it is mostly useful for audio or video where filming location (P915) and recorded at studio or venue (P483) (now its subproperties) are too specific. Ainali (talk) 11:27, 22 August 2020 (UTC)

Aerial photographs

If I use location of the point of view (P7108), what is the suitable statement for aerial photographs? I doubt that it is useful to name the specific aircraft (paraglider, hot air balloon etc.) the photo was taken from. In most cases the file description only states aerial photo/aerial shot. Of course, Wikidata has the item aerial shot in cimematography (Q4688031). Shouldn't a Wikidata property example for media (P6685) be added there so that it is recognizable as an SDC item? --Leit (talk) 09:55, 7 April 2020 (UTC)

I used genre (P136) -> aerial photography (Q191839) in the past. aerial shot in cimematography (Q4688031) is new to me and I do not see the difference from aerial photography (Q191839). Maybe my fault? I did not set location of the point of view (P7108) (for an image taken by a drone or from a plane) because I have no idea about a suitable value. Raymond 11:15, 7 April 2020 (UTC)
aerial photography (Q191839) means the whole profession of taking images from the air, while aerial shot in cimematography (Q4688031) means the specific photo. I hadn't known genre (P136) on SDC before. airplane (Q197) or unmanned aerial vehicle (Q484000) would probably be a location of the point of view (P7108), and aircraft (Q11436) in case the specific vehicle is unknown. I hadn't used this before either, but maybe that could be an option. --Leit (talk) 11:34, 7 April 2020 (UTC)
I'm simply using instance of (P31), but genre (P136) sounds good too. --XRay talk 11:50, 7 April 2020 (UTC)
It looks like different ways to the same goal. We should accept different ways and respect the way others add statements. (Exception: when something is really wrong.) --XRay talk 11:51, 7 April 2020 (UTC)
I agree. Raymond 11:56, 7 April 2020 (UTC)
Yes, of course. There would then only be conflicts when several users edit the same files of other uploaders who do not yet use SDC systematically. I was surprised to see that every Wikidata property can actually be used here (I assumed that only a limited number would be authorized). --Leit (talk) 12:55, 7 April 2020 (UTC)
You've forgotten the qualifiers. ;-) --XRay talk 12:58, 7 April 2020 (UTC)

Events?

Hello,

We have many files taken at events − for example:

Instinctively, this seems to be a fit for location (“where was it taken? at the Woodstock festival”), hence me raising this here.

But it’s actually more complicated: sometimes the event implies the location − for example 2018 FIFA World Cup opening ceremony (Q54968737) took place in Luzhniki Stadium (Q202163). But events like festivals, art biennials, demonstrations, may span various locations (like the Gay pride example).

How do we want to model events? (happy to move the question to another subpage if there is a more relevant place)

Jean-Fred (talk) 15:41, 27 August 2020 (UTC)

I'm using significant event (P793) for events. --XRay talk 15:46, 27 August 2020 (UTC)
Hmmm, I would have associated significant event (P793) with events associated with the file itself, such as cropping (Q785116), upload (Q7126699) or image stitching (Q1364242) (some of these fit better in fabrication method (P2079), but you get the idea). Jean-Fred (talk) 15:54, 27 August 2020 (UTC)
Well, with 134,399 uses of significant event (P793), this seems to be the de facto standard. Jean-Fred (talk) 16:01, 27 August 2020 (UTC)
@Jean-Frédéric Wait. A count is not proof that it is used for events. It could be 130,000 crops for all we know without further analysis. Ainali (talk) 19:41, 15 February 2022 (UTC)
@Ainali: The SPARQL query that lists all events is right under this :-þ (although it times out for me now). I had audited it back then, and just now I re-audited (after removing the labels) all the values used on more than 10K pictures: they are all actual events (sports or space exploration, it seems). Jean-Fred (talk) 20:48, 15 February 2022 (UTC)
SELECT ?significant_event ?significant_eventLabel (COUNT(?item) as ?count) (SAMPLE(?item) as ?sample) WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item wdt:P793 ?significant_event.
} GROUP BY ?significant_event ?significant_eventLabel


If there is consensus for this, we should document the practice at Commons:Structured data/Properties table. Jean-Fred (talk) 16:03, 27 August 2020 (UTC)
I was thinking about a property somewhere between location of creation (P1071) and significant event (P793), but couldn't find one.
As you point out, significant event (P793) feels a bit the other way around. I don't have a clear alternative and it's already used a lot, so let's go for significant event (P793) and see how that works out. We can always switch to something else if it turns out not to be the best way to do it. Multichill (talk) 19:23, 27 August 2020 (UTC)
If we are missing a property for the job, then maybe we should create a new one, maybe "created during" or "created at". --Jarekt (talk) 01:51, 29 August 2020 (UTC)
Yes, I really like this suggestion! Ainali (talk) 19:42, 15 February 2022 (UTC)
Sounds like a plan. Jean-Fred (talk) 20:48, 15 February 2022 (UTC)
I'll create a property proposal today and link it here when it's live. Ainali (talk) 09:47, 19 February 2022 (UTC)
@Jarekt, Jean-Frédéric, Multichill: Here is the proposal: d:Wikidata:Property_proposal/created_during. Ainali (talk) 20:32, 19 February 2022 (UTC)

Artwork template

There is a discussion regarding the impact of the "Artwork" template on the population of Structured data tables here. Martinvl (talk) 22:27, 23 September 2020 (UTC)

Location twice

A bot (Schlurcherbot) just started adding coordinate location (P625) even when coordinates of the point of view (P1259) is already present. It's not good to have both. I thought P1259 was confirmed. (See File:Haltern am See, Sythen, Wassermühle -- 2015 -- 4883.jpg, but I reverted the modification.) --XRay talk 05:50, 6 October 2020 (UTC)

Oops. Maybe my mistake. There is an additional object location. --XRay talk 07:03, 6 October 2020 (UTC)
Yes, some files have both {{Location}} and {{Object location}}. In that case, having both coordinate location (P625) and coordinates of the point of view (P1259) seems to make sense. --El Grafo (talk) 11:01, 6 October 2020 (UTC)

New property for object location

See d:Wikidata:Property proposal/Coordinates of depicted place. Multichill (talk) 16:09, 6 December 2020 (UTC)

The property coordinates of depicted place (P9149) has been created. Multichill (talk) 15:31, 14 February 2021 (UTC)
There is a Wikidata parameter in {{Object location}}. How should this parameter be respected? And what should be done if there is more than one {{Object location}} template (with different Wikidata parameters)? --XRay 💬 07:15, 15 February 2021 (UTC)

"globe" parameter in geohack

I don't see an SDC statement corresponding to the "globe" parameter in the geohack links. With that parameter you can specify the celestial body the coordinates apply to (e.g. "earth" (default), "moon", "mars", "titan", "io", etc.). This is an obstacle to coordinate extraction from structured data, as we do have a (small) set of images not located on Earth. --Dschwen (talk) 16:05, 29 January 2021 (UTC)
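The underlying Wikibase globe-coordinate value does carry a `globe` field holding an item URI, with Earth (Q2) as the default. One option, then, would be to translate the geohack parameter into that field during extraction. Whether the Commons interface exposes this field is a separate question. The hedged sketch below covers only a few bodies whose item ids are well established. Anything else is rejected rather than guessed.

```python
from typing import Optional

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"

# Deliberately short table; extend it with the item ids of other bodies
# (titan, io, ...) after checking them on Wikidata.
GLOBE_ITEMS = {
    "earth": "Q2",
    "moon": "Q405",
    "mars": "Q111",
}


def globe_uri(geohack_globe: Optional[str]) -> str:
    """Map a geohack "globe" value to the URI used in a globecoordinate value."""
    name = (geohack_globe or "earth").strip().lower()  # geohack defaults to earth
    if name not in GLOBE_ITEMS:
        raise ValueError(f"no Wikidata item mapped for globe {geohack_globe!r}")
    return WIKIDATA_ENTITY + GLOBE_ITEMS[name]
```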

Location rounded

I'm looking for the accuracy of the data, especially as set by the template {{Location rounded}}. I don't know how to model this. I can't find a qualifier for recording the value set by "Location rounded". How can this be modeled? --XRay 💬 09:36, 14 January 2022 (UTC)

@XRay: coordinates of the point of view (P1259) already has a way to specify coordinate precision (defaults to "1/1000 of an arc second"). It doesn't allow anything other than degree-based values, though, so you might have to do the math. Or are you looking for a way to store the information "this is imprecise on purpose"? In that case: no idea. We might also want something for Template:Location withheld and Template:Location estimated. El Grafo (talk) 09:40, 7 July 2022 (UTC)
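Doing the math is straightforward once you know how the coordinates were rounded. The minimal sketch below assumes the rounding is described either as a number of decimal places or as an angle in arc seconds. How {{Location rounded}} itself expresses the rounding is not covered here.

```python
def precision_from_decimals(decimals: int) -> float:
    """Rounding to n decimal places of a degree gives a precision of 10**-n degrees."""
    return 10.0 ** -decimals


def precision_from_arcseconds(arcseconds: float) -> float:
    """There are 3600 arc seconds in one degree."""
    return arcseconds / 3600.0
```

For example, coordinates rounded to two decimal places get a precision of 0.01 degrees, which corresponds to roughly 1.1 km in the north-south direction.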
I set the precision as you described. However, it is not obvious to others how accurate the coordinates are and whether they were rounded on purpose. You can only see it indirectly in the map section. I do not find this ideal. --XRay 💬 12:09, 7 July 2022 (UTC)
@XRay and El Grafo: We have the qualifier sourcing circumstances (P1480) for this. Looking at the possible targets, near (Q21818619) seems closest, but maybe it is better to create a new item "rounded" with the description "value has been rounded up and some precision has been lost". Multichill (talk) 09:20, 17 July 2022 (UTC)

Adding locations as properties

Just as a proposal: for writing SPARQL queries, it would be useful if photos also had higher-level location data, initially derived from category information and/or geotags. From a human perspective, you could use it to search for all photos from a certain area, which is not currently possible. For example, it is not possible to find all photos from Finland or Estonia using categories, because of category loops and leakage into general topics, so data normalization is needed.

Properties could be:

The problem with this is, of course, that these properties would duplicate information in the geotags, P17 is redundant if P131 is defined, and there is the cost of keeping the data up to date. However, the information is mostly static and values can be validated by software, so it is not too big a problem. --Zache (talk) 19:21, 6 July 2022 (UTC)

@Zache: location of creation (P1071) should be used to describe where a photo was taken. The target should be the most specific location. For example File:Haarlem, Grote Kerk.jpg has Grote Markt (Q1083850). You can travel up the tree to find Haarlem (Q9920)/North Holland (Q701)/Netherlands (Q55).
We're actually already doing that. See File:Entrance Gates at Bagden Hall, Wakefield Road, Scissett, Denby Dale, Near Huddersfield - geograph.org.uk - 3645075.jpg, which has the property set to produce:
"Denby Dale (Denby Dale→Kirklees→West Yorkshire→Yorkshire and the Humber→England→United Kingdom)" which ends up in the search engine so you can already search for it.
So no, you shouldn't be duplicating data. Multichill (talk) 09:13, 17 July 2022 (UTC)
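Travelling up the tree is a transitive walk over located in the administrative territorial entity (P131). On the query service it is usually written as the property path `wdt:P131*`. The Python sketch below applies the same idea to a small hand-written parent map for the Haarlem example. The map is illustrative only and not fetched from Wikidata. Real items can also have several P131 values, which this sketch ignores.

```python
from typing import Dict, List

# Hypothetical, hand-written P131 links for the example above.
PARENT_OF: Dict[str, str] = {
    "Q1083850": "Q9920",  # Grote Markt -> Haarlem
    "Q9920": "Q701",      # Haarlem -> North Holland
    "Q701": "Q55",        # North Holland -> Netherlands
}


def ancestors(item: str, parent_of: Dict[str, str]) -> List[str]:
    """Return the item followed by each enclosing entity, stopping on loops."""
    chain = [item]
    seen = {item}
    while item in parent_of:
        item = parent_of[item]
        if item in seen:
            break  # guard against cycles in the data
        seen.add(item)
        chain.append(item)
    return chain
```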
You suggest using only the most specific one. On Wikidata there is a guideline that items with a located in the administrative territorial entity (P131) statement should also have a country (P17) statement. This speeds up querying by avoiding the need for a recursive search at this level. GPSLeo (talk) 15:45, 28 January 2023 (UTC)
@Multichill I mainly meant SPARQL use cases and filtering tools such as PetScan. --Zache (talk) 17:43, 28 January 2023 (UTC)