Commons talk:Structured data

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.


Structured data and Wiki Loves Monuments[edit]

I've been adding structured data to files uploaded as part of Wiki Loves Monuments. Over the years about 2.4 million files have been contributed to Wikimedia Commons. About 2.1 million of these files now have a statement. On Commons:Wiki Loves Monuments/Structured data I documented what kind of data is added to these files. Recently my robots have been focused on adding basic, relatively easy-to-extract data like source of file (P7482), creator (P170), copyright status (P6216), copyright license (P275), inception (P571), coordinates of the point of view (P1259) & coordinate location (P625).

I'm probably going to focus a bit more on depicts (P180) and location of creation (P1071) now (see also User:ErfgoedBot/Depicts monuments.js). I use the monuments template to get the identifier and based on the identifier I find the relevant item. So for example File:Haarlem - Nieuwe Gracht 62.JPG has {{Rijksmonument|19594}} and that gives me Q17254716 (Q17254716). This I've already done for most of the Netherlands, but I'm open to suggestions for other countries to do too. Are there any countries with good coverage on Wikidata and a decent number of photos here that I should work on? For adding the location of creation (P1071) I can use the item too. I have three options:

  1. Adding the same as depicts (Q17254716 (Q17254716)). That would only be correct in this case if the photo had been taken inside
  2. Adding the street which is in located on street (P669) (Nieuwe Gracht (Q17195901)). This happens to be correct for this photo, but often this won't be correct
  3. Adding the municipality which is in located in the administrative territorial entity (P131) (Haarlem (Q9920)). This should almost always be correct, though we do lose a bit of detail.

Given the numbers (the Netherlands alone is over 400,000 photos), manual review would take forever. I'm leaning towards adding the municipality because it's almost always correct. Another option is to add both the street and the municipality and leave it up to the user to remove one of the two. Removing is much faster than having to look up things to add, but it's still a lot of effort. Any opinions? Multichill (talk) 21:24, 25 February 2020 (UTC)
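The template-to-item mapping described above could be sketched roughly as follows. This is my own illustration, not necessarily how ErfgoedBot actually works: the regex, and the use of Rijksmonument ID (P359) with located on street (P669) and located in the administrative territorial entity (P131) for options 2 and 3, are assumptions based on the example in the post.

```python
import re

def extract_rijksmonument_id(wikitext):
    """Pull the identifier out of a {{Rijksmonument|...}} template, if present."""
    m = re.search(r"\{\{Rijksmonument\|(\d+)\}\}", wikitext)
    return m.group(1) if m else None

def item_lookup_query(monument_id):
    """Build a SPARQL query that finds the Wikidata item carrying this
    Rijksmonument ID (P359), plus its street (P669) and municipality (P131)
    for the location-of-creation options discussed above."""
    return (
        "SELECT ?item ?street ?municipality WHERE {\n"
        f'  ?item wdt:P359 "{monument_id}" .\n'
        "  OPTIONAL { ?item wdt:P669 ?street . }\n"
        "  OPTIONAL { ?item wdt:P131 ?municipality . }\n"
        "}"
    )

print(extract_rijksmonument_id("{{Rijksmonument|19594}}"))  # → 19594
```

For the example file above, the extracted identifier "19594" would then resolve to Q17254716 via the P359 lookup.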

Multichill, Why location of creation (P1071) instead of the regular location (P276) property? Is that related to properties of the photograph vs. properties of the photographed object, or is it something else? I would vote against adding location properties which duplicate located on street (P669) or located in the administrative territorial entity (P131) properties. I think those are more specific and clear. I can see a need for location of creation (P1071) / location (P276) if it is more specific than other location info. I wonder if there is some way of specifying the item ID of the monument, the way we use digital representation of (P6243) for artworks. Depicts statements are not that great for that, since if there is a cat or car in a photo then we might be depicting a cat or a specific car model as well. Some of those location statements should already be on Wikidata. Would we duplicate those or rely on the Wikidata copy? Sorry, I seem to have more questions than answers. --Jarekt (talk) 16:22, 26 February 2020 (UTC)
@Jarekt: location of creation (P1071) is the right property to indicate where a work was made. location (P276) indicates where a work is at some point in time. See File:De Moulin Rouge in Parijs bij avond, Bestanddeelnr 254-5695.jpg for an example where I used the different location properties.
I'm under the impression that multiple concepts get mixed up in the questions. Please have a look at Commons talk:Structured data/Modeling/Location#Types of locations. I hope that makes it clearer. Multichill (talk) 17:27, 27 February 2020 (UTC)

Structured data for WLM in the UK[edit]

Based on your criteria (good coverage on Wikidata and a decent amount of photos) you could try working on the UK photos - though bear in mind that the Wikidata entries come from four separate official listing databases (England, Scotland, Wales, Northern Ireland), which use different listed building/scheduled monument IDs. MichaelMaggs (talk) 19:14, 26 February 2020 (UTC)
@MichaelMaggs: As long as each has a unique property / template pair, that shouldn't be a problem. For each source, can you provide:
  • property - Property id on Wikidata
  • template - The template here on Commons
  • designation - The heritage designation (P1435) to filter by on Wikidata (optional)
The rest of the fields listed at User:ErfgoedBot/Depicts monuments.js I can figure out myself based on this info. Multichill (talk) 17:27, 27 February 2020 (UTC)
Thanks Multichill. Here are the UK WLM Campaign -> Wikidata mappings that are currently known to me. They should, I think, work, but I believe there was some updating and fixing last year that I wasn't involved with.
For quick reference, you will find the UK winning images at Commons:Wiki Loves Monuments 2019 in the United Kingdom/Winners.
As I've decided to step down from organising the UK contest, I won't be in a position to follow up or help out further with this, I'm afraid. It's likely that the contest will run again this year, as normal, either with a new lead volunteer or perhaps one of the staff at Wikimedia UK. If you need more details User:Nev1 at WMUK should be able to put you in contact with the right person. All the best, MichaelMaggs (talk) 15:32, 6 March 2020 (UTC)
@MichaelMaggs: thanks for the pointers. I added Wales and Northern Ireland. I already had England. Scotland I can't handle right now because it uses multiple templates based on {{Historic Scotland listing}}. That's something I should look into supporting at some point. Multichill (talk) 19:28, 6 March 2020 (UTC)
@Multichill, MichaelMaggs: I'm so sorry I missed this conversation. Is there anything needed at this stage? Richard Nevell (WMUK) (talk) 11:13, 24 April 2020 (UTC)

Structured data for WLM in the Czech Republic[edit]

Great work @Multichill:! Try the Czech Republic, there should be good coverage with a decent amount of pictures. --Juandev (talk) 07:30, 5 March 2020 (UTC)

Thank you. I agree about the good coverage and the number of images; that's why the Czech Republic is on the list. Multichill (talk) 19:31, 6 March 2020 (UTC)

On SPARQL[edit]

@Jheald, Jarekt, Multichill, Jean-Frédéric, Jura1: plus all others interested in the SDC query service.

Disclaimer: my personal knowledge and understanding of this topic is pretty limited, so bear with me here as we work through this.

As you may be aware, the Wikidata Query Service (WDQS) is not run by the Structured Data team at the Wikimedia Foundation. The WDQS team underwent some changes towards the end of last year, and they're still working to get back up to speed with Wikidata and looking towards the future of WDQS support. During this time, Ramsey has still been working to get the SDCQS up and running. Unfortunately, due to the aforementioned issues, progress has been slow (as you're certainly aware). Ramsey's still looking into ways to move things along, and something that might be helpful is if we can hand the WDQS team some scoping for the basic features needed here; the "minimum viable product", as it's known in the business.

I think that there are two main areas around surfacing data that need to be addressed from the beginning. Please correct me if I'm wrong, or if this is missing a point or two or three:

  • Maintenance and administration use-cases
  • Exposing data connections for other tools to build on

Support for Commons queries will be built up gradually as resourcing allows the project to scale up. Assuming these points, what's the basic support needed here? Would a simpler system that provides search on key-value pairs suffice for these use cases for now while the services are being scaled? Feel free to provide examples, if you have them. And please ask questions; I'll do my best to address them or get Ramsey to answer where possible. Keegan (WMF) (talk) 20:50, 13 March 2020 (UTC)

I think the priorities would be that
  1. the endpoint runs again
  2. and gets fed regularly from live data (maybe not as quickly as Wikidata; I think live updates were never the case).
  3. Also, output as "image grid" didn't work, as no link to the file was included (previous comments)
  4. Federation with Wikidata would be good to have too (I don't think that was set up either, but I might be wrong), i.e. run queries on it that combine data with data from Wikidata.
  5. Federation on Wikidata should work too, i.e. run queries on Wikidata that include data from Commons Structured Data.
  6. The mwapi service for requests to the MediaWiki API should be up too.
  7. Some users use another interface than the web interface; maybe having that up too would be good for them.
The above is (in my view) in order of priority. Supposedly it will be replaced with some other URL. The advantage of having Blazegraph running is that any maintenance query or output could be handled. Jura1 (talk) 10:25, 14 March 2020 (UTC)
I agree with many points Jura1 raised and would like to add some other ones:
  • We need to be able to run a lot of constraint related queries, most of them are the same as the constraints on Wikidata.
  • Federation with Wikidata is a must. If I want to check the copyright status of a photo of a sculpture, the copyright for the photo will be on Commons while information on the sculpture will be on Wikidata. Also many constraint queries might require access to Wikidata. Another example query would look up all the Wikidata items used in some property that are redirects.
  • display of results as image grid or on a map would be great
  • the queries connecting to Wikidata should be able to look up items which link to a given file through the image (P18) property (or other similar ones)
  • Another capability would be the ability to limit the search to files in some category or with some template, ideally via federation with the SQL server, or something like in this query.
--Jarekt (talk) 03:50, 15 March 2020 (UTC)
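The federation and image (P18) use cases above might look something like this sketch. Everything here is hypothetical: no Commons SPARQL endpoint existed at the time, how file entities would be represented across the two graphs is assumed, and the depicts subject (house cat, Q146) is just an example.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def federated_p18_query():
    """Build a federated query of the kind described above: from a future
    Commons endpoint, find files depicting a subject and federate out to
    Wikidata for items that use those files via image (P18). Whether the
    ?file URIs actually line up across the two services is exactly the sort
    of detail the federation work would need to settle."""
    return f"""
SELECT ?file ?item WHERE {{
  ?file wdt:P180 wd:Q146 .           # Commons files whose depicts (P180) is house cat (Q146)
  SERVICE <{WIKIDATA_ENDPOINT}> {{   # federate out to the Wikidata Query Service
    ?item wdt:P18 ?file .            # Wikidata items that use the file as image (P18)
  }}
}}
"""

print(federated_p18_query())
```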
@Keegan (WMF): the MVP is a working SPARQL endpoint. There's not much you can strip off to make it more minimal. The prototype was nice. From my point of view you need:
  1. Production infrastructure to replace it
  2. Regular RDF dumps which you need to feed to the query engine (bootstrap)
  3. Incremental RDF feed to keep the data up to date
That's probably the most minimal you can go. If you try to strip down any of these points you either get really poor performance or outdated data. If you're already doing this, it's a minimal step to enable all the bells and whistles we have on WDQS. Actively disabling these is probably even more work than just leaving them enabled. Just put up a disclaimer that some things might still be broken. Multichill (talk) 14:18, 15 March 2020 (UTC)
+1 to what most people have said here. Also, the current way of using 'haswbstatement' in the search engine is already a pretty good 'simpler system' that just does key/value-pairs. Husky (talk to me) 00:24, 16 March 2020 (UTC)

I think this thread is prompted by this recent comment by User:Gehel on phabricator ticket T221921:

Some of the use cases described here are already supported by search (wbstatement keywords, etc...). We are not going to work on a new SPARQL endpoint before we have a scaling strategy for the current WDQS. It looks like the remaining use cases described here might be better served not by a SPARQL endpoint, but by a more specific service.

In my view the comment above is based on a misperception. WDQS is currently handling about 5 million queries a day [1], and struggling to keep up with managing them together with data updates. I do not believe there will be anything like the same massive external demand for a Commons SPARQL Query Service (CQS) in the short term, because it does not serve the same generic content; nor IMO would a naive SPARQL endpoint be appropriate to support primary end-user search and discovery, such as the holy grail of faceted search -- it wouldn't have either the scaling or the responsiveness, IMO.
What CQS is IMO invaluable and irreplaceable to support is a fairly small subset of Commons and SDC power users, who will use it to understand
(1) how properties are being used, in particular how different properties interact and are used together, to grow and refine the data model in the right way. At the moment we are flying blind and it is massively holding back the critical & urgent task of understanding and refining the data modelling.
(2) queries to drive maintenance of properties and statements that are not being used appropriately. Usually identifying the bad statements requires finding particular combinations of statements, which SPARQL is perfect for.
(3) queries to identify gaps in data that can be filled for particular subsets -- to be compared with other sources (eg categories, descriptions), that power users can then use to fill statements in particular sub-areas in a targeted way (cf Wikidata: identify gaps, fill with QuickStatements; or Wikidata: extract data from a source, compare with existing data over some specific group of items, add missing data -- *key* workflows for Wikidata work).
(4) prototype proofs-of-concept for faceted search approaches and other tools and demos (cf Crotos). As demonstrations of what may become possible, not for volume use. (And if they did get over-exposed, it would be potentially possible that such demos could have their use throttled - eg by requiring an API key & limiting it).
In the early years of Wikidata, there was a system called WDQ by Magnus that allowed retrieval of particular combinations of property-value pairs, similar in some ways to what Keegan suggests in his intro above. (cf [2]). WDQ got Wikidata off the ground (and perhaps was more resource-efficient). But I believe such a stopgap would be a waste-of-time dead end for CQS, because:
(1) SPARQL is far more elegant, easy to read, and easy to understand -- and is *what the community knows*. It's crazy to split between 1 system for Wikidata and another for CQS.
(2) A SPARQL solution is available out-of-the-box for Wikibase.
(3) Many many many queries will rely on information both from Commons and from Wikidata. In SPARQL this is a built-in feature, using federation. Any other stopgap would have to develop such linkage from the ground up.
In my view a SPARQL CQS system is desperately needed now, or even better last year, and putting roll-out on hold is insane for the project. Jheald (talk) 12:03, 16 March 2020 (UTC)
+1 on the scaling issue. Wikidata/WDQS has some specific issues, not least continuous import of academic paper items, often huge; and stars & whatever other catalogues are being raided. I think there's perhaps much less scope for this on Commons. The JSON -> RDF export is known to be less efficient than it might be, and this is particularly associated with large JSONs - the academic papers with very many authors and/or citations; the chess players with 1,000s of ELOs; cities with hundreds of population statements. I see less scope for such hugeness in Commons structured data; and it's probable that in the timescale Commons structured data ramps up, WMDE will deliver JSON-RDF improvements. Finally, I think there will be less demand for reports from a Commons structured data query service than for WDQS. Obvs, all famous last words.
I'm basically where Jheald is on points (1) - (3) immediately above, endorse his "insane" and prepend "absolutely expletive". --Tagishsimon (talk) 17:18, 16 March 2020 (UTC)
Investing in any system which is not capable of verifying that property constraints are followed, would be a waste of time. --Jarekt (talk) 19:52, 16 March 2020 (UTC)

Thanks for everyone's feedback so far. We are fully aware of the significance of the query service. Our desire is to provide something useful for the hard-working volunteers contributing the data, and we acknowledge this is taking a very long time and it is frustrating for us as well.

We're discussing this internally and exploring a variety of possible solutions. We hope to provide some ideas for moving forward soon.

In the meantime, work on remaining SDC front-end features will continue. The actual Structured Data team that is assigned to solely work on this project is mostly a front-end team (which is why much of the recent work has been front-end). The backend infrastructure work is distributed among several specialized teams that split their time and resources across multiple critical projects. We'll keep you updated as things progress. Thanks again. RIsler (WMF) (talk) 20:31, 19 March 2020 (UTC)


Hello all,

I'm sorry about the delay, extenuating circumstances pushed back some of the discussions needed to move forward. The good news is that the discussions have been had, and the importance of SPARQL has been conveyed and accepted.

Copying over what's been put in the Phabricator task:

  • The work to create a SPARQL endpoint for Commons has been re-prioritized [moved up in importance]. Our teams will be working on it over the next few months and the search team is currently estimating the work involved.
  • The first release will be a beta endpoint that will be updated via weekly dumps. Caveats will include limited performance, expected downtimes, and no interface, naming, or backward compatibility stability guarantees.
  • We do plan to move this to production, but we don't have a timeline on that yet.
  • The SPARQL endpoint for Commons will be restricted via a light and unobtrusive form of authentication, so that we can contact abusive bots / users and block them selectively (as a last resort) when needed. More details on this to come.
  • We want to emphasize that while we do expect a SPARQL endpoint to be part of a medium to long term solution, it will only be part of that solution. Even once the SPARQL endpoint is production-ready, it will still have limitations in terms of timeouts, expensive queries, and federation. Some use cases will need to be migrated, over time, to better solutions — once those solutions exist.

Two additional points not on Phabricator:

  • CBogen, the team's new program manager, and I plan on providing updates on this at least every two weeks on the task and here.
  • Constraints for SDC have been deployed, and should be functional when the endpoint is stood up (@Jheald:).

Thanks all, we'll keep you posted. Keegan (WMF) (talk) 15:32, 30 April 2020 (UTC)

Structured data about Tabular data?[edit]

As far as I can see, the Structured data efforts have a strong focus on things in the File namespace, but is there also some tiny corner concerned with COM:Tabular Data? I think I had seen this discussed a while back but could not find anything relevant right now, so posting anew. -- Daniel Mietchen (talk) 02:37, 17 March 2020 (UTC)

@Daniel Mietchen: I don’t see any reason to store anything about tabular data pages in Wikibase. Traditional file description pages use wikitext, which is easy to use for humans, but hard to process for machines. Tabular data is structured in itself, so the page content itself can be easily read by machines, there’s no need to store anything separately in Wikibase. —Tacsipacsi (talk) 00:30, 18 March 2020 (UTC)
There have been some ideas about making it possible in WDQS to query those datasets. I think that would be useful, but more likely to happen from the WDQS side than the Commons Structured Data side. —TheDJ (talkcontribs) 09:33, 18 March 2020 (UTC)
@TheDJ: Thanks, that makes sense. But probably we should have a working SPARQL endpoint for SDC before working on it… —Tacsipacsi (talk) 00:53, 19 March 2020 (UTC)
  • Yes, I think it should be added as well. It seems odd that metadata about these would be at Wikidata while for images it's here. Jura1 (talk) 18:21, 27 March 2020 (UTC)

"Depicts tradition"[edit]

Regarding this edit (,_Korean_folk_dance_22.jpg&diff=407971934&oldid=394068302): didn't we have an understanding that until further notice, depicts statements are supposed to be confined to the relatively concrete and clear? Depicting tradition (Q82821) seems to me to be nothing of the sort. - Jmabel ! talk 00:34, 30 March 2020 (UTC)

It's kind of bizarre: I think since the tool is suggesting the concept, people are shortcutting to that as "good enough" without figuring out the right thing. I have made some progress on culling tradition (occurrence, mode of transportation, architecture), and a number of the other vague concepts, through PetScan + QuickStatements. Some of the clusters of really bad "tags" in depicts are starting to look much healthier -- but it took a fair amount of work to make sure that we provide accurate depicts statements. Sadads (talk) 10:10, 31 March 2020 (UTC)
Right. The question is why the tool is suggesting the sort of tags we agreed not to use for now. Those should simply be off limits. - Jmabel ! talk 16:15, 31 March 2020 (UTC)
I'm working on getting the tool's blacklist posted this week, with the list comes the opportunity to add more properties to it as the community requests them (using the talk page or some other light-weight process). Keegan (WMF) (talk) 16:33, 31 March 2020 (UTC)

Using "color" for dominant color in an image?[edit]

So this is a bit of a modeling question: when we have dominant colors in an image, i.e. any of these identified by the Computer-Aided Tagging tool as red, should that be stored as color (P462)? I think it would make sense to leave this facet around for dominant color, just like we would on Wikidata, or for a depicted item here. Being able to query by dominant colors would allow for some really neat things, like generating Category:Photomosaics. Sadads (talk) 10:14, 31 March 2020 (UTC)

Computer-aided tagging blacklist posted[edit]

I've published the blacklist page, with the initial included properties: Commons:Structured_data/Computer-aided_tagging/Blacklist.

Requests/suggestions can be made on the talk page using whatever kind of process the community would like. The team will patch in new additions as they come up. Keegan (WMF) (talk) 17:39, 1 April 2020 (UTC)

Bug in haswbstatement search[edit]

I have found a couple of concepts that are currently tagged in files, like Q1802779 (Q1802779), that have become redirects on Wikidata. However, because of the lack of a Q number in the interface, if you try to search for "Land vehicle" it misses those items. We probably need some type of maintenance report so that these can be fixed in the future, and/or a change in the way search indexes (so that it searches all the redirects as well). Sadads (talk) 23:43, 2 April 2020 (UTC)
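The maintenance report suggested above could be driven by a query like the following sketch: given Q-ids that occur in Commons statements, ask the Wikidata Query Service which of them are redirects (redirected items are exposed there via owl:sameAs). The batching and how the Q-ids are harvested from Commons are left out here.

```python
def redirect_report_query(qids):
    """Build a SPARQL query (for WDQS) reporting which of the given items
    are redirects; ?target is the item each redirect now points to."""
    values = " ".join(f"wd:{q}" for q in qids)
    return (
        "SELECT ?item ?target WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item owl:sameAs ?target .\n"  # redirects appear as owl:sameAs in WDQS
        "}"
    )

print(redirect_report_query(["Q1802779"]))
```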

Thanks, I'll pass it along and get a bug report up as needed. Keegan (WMF) (talk) 21:31, 3 April 2020 (UTC)

haswbstatement search via API[edit]

Hi, I checked many places .... I want to do this request:

via an API, i.e. I would like to have a json.

In wikidata I do:

curl ''

so I would expect:

curl ''

but this is not working ... I checked for a while and I really cannot find a solution. Any ideas?

@DD063520: The default namespace of list=search is always 0, i. e. the main namespace (Gallery on Commons). This is unlike Special:Search, which has a configurable and usually more useful default. This works: --Lucas Werkmeister (talk) 00:43, 9 April 2020 (UTC)
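The fix Lucas describes (explicitly requesting the File namespace, 6, since list=search defaults to namespace 0) can be written out like this. The statement searched for is my own example, since the URLs in the thread were lost; swap in whatever haswbstatement value you need.

```python
from urllib.parse import urlencode

# Build the search API request: list=search defaults to namespace 0 (Gallery
# on Commons), so the File namespace (6) must be asked for explicitly.
params = {
    "action": "query",
    "list": "search",
    "srsearch": "haswbstatement:P180=Q146",  # files whose depicts (P180) is house cat (Q146)
    "srnamespace": 6,                        # File: namespace
    "format": "json",
}
url = "https://commons.wikimedia.org/w/api.php?" + urlencode(params)
print(url)
```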
@Lucas Werkmeister: Thank you!!!!!!!!! Perfect!
@Lucas Werkmeister: Sorry for bothering again .... For some specific images we would like to extract the structured data section ... any format is fine (if it is RDF even better), is there an API for that too? I couldn't find it .... — Preceding unsigned comment added by DD063520 (talk • contribs) 11:52, 10 April 2020 (UTC)
@DD063520: You can feed the page IDs from the search into either action=wbgetentities (JSON only, but allows you to get several entities per request) or Special:EntityData (any format), e. g. Special:EntityData/M15925090.ttl. (Side note: pings only work if you add a signature ~~~~ in the same edit.) --Lucas Werkmeister (talk) 13:32, 10 April 2020 (UTC)
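Restating Lucas's two routes as concrete URLs (using the M15925090 example from his reply; an M-id is "M" followed by the file's page ID):

```python
page_id = 15925090  # page ID from the search results; the entity ID is "M" + page ID

# Route 1: JSON via wbgetentities, which can fetch several entities per
# request (join multiple ids with "|").
json_url = (
    "https://commons.wikimedia.org/w/api.php"
    f"?action=wbgetentities&ids=M{page_id}&format=json"
)

# Route 2: any serialization via Special:EntityData, e.g. Turtle (RDF):
ttl_url = f"https://commons.wikimedia.org/wiki/Special:EntityData/M{page_id}.ttl"

print(ttl_url)
```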
@Lucas Werkmeister:, wonderful! : ) — Preceding unsigned comment added by DD063520 (talk • contribs) 07:41, 16 April 2020 (UTC)


File format (P2701)[edit]

Is P2701 going to be added to images by bots or is it considered redundant? 1234qwer1234qwer4 (talk) 18:30, 6 April 2020 (UTC)

Not on top of my list, but given that we have Category:Images by file format, we'll probably add it at some point. Probably, like the category, first for the less-used formats. It should probably be done in a single edit per file together with a lot of other metadata. Multichill (talk) 19:06, 6 April 2020 (UTC)

P180 depicts Flores hawk-eagle[edit]

My watchlist is full of structured data updates, all of which claim to be Flores hawk-eagles. As an example, for this edit the edit summary says "Created claim: depicts (P180): railway (Q22667)", but on my watchlist it shows up as "Created claim: depicts Flores hawk-eagle (P180): railway (Q22667) Tag: Computer-Aided Tagging". What's going on? -mattbuck (Talk) 16:53, 21 April 2020 (UTC)

@Mattbuck: Someone briefly changed the English label for depicts (P180): d:Special:Diff/1161800693. --bjh21 (talk) 17:13, 21 April 2020 (UTC)
That explains it, thanks Bjh21. -mattbuck (Talk) 17:17, 21 April 2020 (UTC)

Abuse filter for labels[edit]

IMO modifying a label (or description) should show a hint if more than 50% of the characters would be removed or if emojis are added. --XRay talk 09:47, 25 April 2020 (UTC)

Redirect and structured data[edit]

This does not seem to make sense: . Bug? --2A02:810D:6C0:2FB0:5AF:A0FD:D85D:5063 09:03, 26 April 2020 (UTC)

  • It would help a lot if you would say what about this does not seem to make sense. - Jmabel ! talk 17:58, 26 April 2020 (UTC)
See headline. Structured data for a redirect does not make sense IMHO --2A02:810D:6C0:2FB0:4088:F34E:3A4:AE61 18:21, 26 April 2020 (UTC)
Answered at Commons:Forum#Strukturierte_Daten_bei_Weiterleitung. Multichill (talk) 18:33, 26 April 2020 (UTC)
Can't find information there. --2A02:810D:6C0:2FB0:B8BD:D76B:D60C:7311 19:10, 17 May 2020 (UTC)

stability of proposal of SD?[edit]

Same set of images, same category, different proposed SDs.

(I cannot see in the history what the bot proposed -- another flaw.) I assume that users do not add missing SDs, but only delete unsuitable SDs. So why does the first object use the WD item associated with the Commons category (which seems to be ok), and the latter doesn't? Is the algorithm stable? Or does it produce arbitrary results? --Herzi Pinki (talk) 12:34, 28 April 2020 (UTC)

Hello @Herzi Pinki:. The Suggested Tag feature does not use categories to suggest depicts statements. Categories are displayed within the tool as a guide to help users understand the context and content of the image so they can choose tags accordingly, but the suggestions come from a Machine Vision analysis tool that looks at the content of the image itself. If you'd ever like to see a log of what tags were suggested for an image, append ?action=info to any File page URL and scroll to the bottom of the page to see Suggested Labels. RIsler (WMF) (talk) 23:40, 28 April 2020 (UTC)
@RIsler (WMF): Thanks for the info about action=info. So the assignment of SD Q37897818 (Q37897818) in the first image, which is by far the one best describing the subject (which is underneath the soil), was done by the user, not by the automated tool.
BTW, the link on the info page yields a bad request. --Herzi Pinki (talk) 05:40, 29 April 2020 (UTC)

automatic update of SDs[edit]

If the mechanism proposes SDs based on categories and the categories turn out to be wrong or are changed for other reasons, who is in charge of also fixing the associated SDs in sync? --Herzi Pinki (talk) 12:36, 28 April 2020 (UTC)

As mentioned in the reply above, categories are currently not used to suggest structured data, they're only displayed for reference. RIsler (WMF) (talk) 23:42, 28 April 2020 (UTC)

duplicate entries[edit]

see File:Hügelgräberfeld_Eggforst_01.jpg, motif grave field (Q2593777) is a duplicate. Does this make sense? What is the point of adding a motif twice? (E.g. if an image shows three buildings, should there be three different motif entries building (Q41176)?) Shouldn't there be a check to add each motif only once? Can some bot take care of the cleanup? best --Herzi Pinki (talk) 12:57, 28 April 2020 (UTC)

  • In case multiple same objects are shown, the “quantity” property should be used. 1234qwer1234qwer4 (talk) 14:38, 28 April 2020 (UTC)
    • ... unless the objects need different qualifiers. :-) --Marsupium (talk) 15:17, 28 April 2020 (UTC)

So this is not the case here. I created a change request: can the software prevent situations like this, and can some bot clean up the mess? best --Herzi Pinki (talk) 15:37, 28 April 2020 (UTC)

Issues with SDC mass uploads[edit]

@SandraF (WMF), Keegan (WMF): I do not know if this is on anybody's radar, but the phabricator:T246746 and phabricator:T245349 issues are big handicaps for working with SDC. I am using the QuickStatements tool for a lot of SDC statements, logged in as user:JarektBot, an account I use so that my edits are marked as bot edits and do not flood people's watchlists. Unfortunately, due to the issues reported in phabricator:T246746/phabricator:T67494, my bot edits are not marked as such. This might not be an SDC issue, but it is an annoyance to many and I am trying to avoid annoying people with my SDC edits. Phabricator:T245349 / phabricator:T237991 is also a big problem. Since phabricator:T221921 is still unsolved, we are relying on maintenance categories to know which files still need statements, see Category:Structured Data on Commons tracking categories, often added with help of Module:SDC tracking. Unfortunately, adding an SDC statement does not trigger a page refresh and does not remove files from the category. The only solution is to run "touch" / purge operations on those files, which is much slower than adding statements. Any chance one of you can reprioritize some tasks to get those done? --Jarekt (talk) 17:46, 28 April 2020 (UTC)
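The "touch"/purge workaround described above corresponds to the MediaWiki action=purge API. A minimal sketch of the request it would send (the file name is just an example; a real run would POST this with an authenticated session, since purge requires POST, and would batch titles):

```python
from urllib.parse import urlencode

# Purge file pages so tracking categories re-evaluate after SDC statements
# are added. forcelinkupdate re-runs the link/category tables, not just the
# parser cache, which is what makes the file drop out of the category.
params = {
    "action": "purge",
    "titles": "File:Example.jpg",  # up to 50 titles, joined with "|"
    "forcelinkupdate": 1,
    "format": "json",
}
body = urlencode(params)
endpoint = "https://commons.wikimedia.org/w/api.php"
print(body)
```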

@Jarekt: I'll check with Ramsey and see where we are with these tasks. Keegan (WMF) (talk) 18:14, 28 April 2020 (UTC)
A ticket's been made to investigate the cache issues, the bot edits require a further look into existing tickets. There should be further updates in the tickets on Phabricator in the near future. Keegan (WMF) (talk) 16:36, 4 May 2020 (UTC)
Thank you. Unmarked bot edits to SDC seem to be an irritant to a lot of people. I will try to complete the task of adding OTRS IDs to SDC, but I will not start any more tasks until the issue is resolved or I master some other way of mass editing. --Jarekt (talk) 17:09, 4 May 2020 (UTC)

Constraint violations database reports[edit]

@CBogen (WMF), Keegan (WMF), Jheald: I do not know if this was discussed anywhere else, but I was wondering about the ways we are, or are going to be, tracking SDC constraint violations. I know we are waiting for phabricator:T230314 and SPARQL database queries, but I was wondering about other parts of the system. Each property has a page on Wikidata, like creator (P170), and that page is a central point for all the links to pages related to the property: that is where we store constraints and that is where we have a link to d:Wikidata:Database reports/Constraint violations/P170 (this link is actually on the talk page). So here are some questions:

  • Are we going to have some constraints relevant only to SDC which are different from Wikidata constraints? If so how are we going to model that?
  • Some types of constraints do not require SPARQL database query (as explained in phabricator:T230314 by User:Lucas Werkmeister). I am not sure if any of them apply to SDC, but if so are there any pages on Commons or Wikidata showing SDC constraint violations for such cases?
  • Maybe we need some system of pages on Commons which list properties related to Commons, where we can discuss SDC issues related to those properties and which can serve as a hub for links to constraint violation database reports, etc. I do not know if Wikidata property pages allow sitelinks, but if they do, then we could connect them. --Jarekt (talk) 17:54, 4 May 2020 (UTC)
FWIW, I know little to nothing about the mechanics of this topic itself. I'll see what I can find out if there's a question related directly to the developers. Keegan (WMF) (talk) 16:40, 11 May 2020 (UTC)
Keegan, thanks for replying. I guess what is confusing is that on Wikidata the constraint violations infrastructure is a bit of a patchwork of MW software, user-controlled bots, and occasional extensions and JavaScripts. As a result, it is really hard to transplant it to Commons. Who does what is not easily transparent, either. So I was trying to start a discussion about how we are going to bootstrap such infrastructure. I do not know a whole lot about it either, but I am trying to wrap my head around some of those issues. If I had a single question for the developer team, it would be to document what exactly will be provided by MW software once SPARQL queries are live and phabricator:T230314 is finalized. That way, we will know what work has to be done by the community if we want constraint violations infrastructure similar to the one we are used to on Wikidata. --Jarekt (talk) 17:00, 11 May 2020 (UTC)
Sounds good, I'll make sure that's shared and happens. Keegan (WMF) (talk) 17:46, 12 May 2020 (UTC)

By the way, I’m not sure if Ivan A. Krestinin’s C++ bot (which updates the constraint violation pages) actually uses Wikidata Query Service. Unfortunately the source code is still not public (as far as I know). —Tacsipacsi (talk) 01:22, 5 May 2020 (UTC)

Help needed from javascript speakers[edit]

See MediaWiki_talk:Gadget-PermissionOTRS.js#Add_P6305_SDC_statement. --Jarekt (talk) 14:28, 8 May 2020 (UTC)

Wikidata Wochenende taking place online[edit]

Hello all,

A few months ago, I told you about the Wikidata Wochenende (de), a weekend dedicated to working on Wikidata-related projects where we would love to have Commons editors, for example working on Wikidata-powered templates or Structured Data. The event, initially planned in Ulm, will take place entirely remotely on June 12-14. It's going to be a mix of hackathon and workshops, where people can connect, work on their projects, and learn from others.

If you're speaking German and interested in attending, please register in the next few weeks. Cheers, Lea Lacroix (WMDE) (talk) 08:37, 19 May 2020 (UTC)

Structured Search[edit]

Hey everyone, I've released a tool that some of you might find of use. It's called Structured Search and provides another user interface to the Commons search engine. I developed this tool for two reasons:

  • To provide a friendlier user interface to show the richness and beauty of all the wonderful free content that is available here.
  • To showcase the possibilities of Structured Data on Commons.

I've made it so that it's easy to search for other images that have structured data: try clicking on any image on the first page that you get, then look for 'depicts' statements in the image detail pane. There are also options to search for categories and to export queries to PetScan. For those of you who want to access the tool from a regular Commons search results page, I've made a little userscript.

Check it out here. Husky (talk to me) 20:52, 24 May 2020 (UTC)