Commons:SPARQL query service


This page needs to be updated with how Commons uses the service.

Link: Wikimedia Commons Query Service

Updates

30 September 2020
Feb 3 2021
  • Our host wcqs-beta-01.eqiad.wmflabs is running low on disk space because of the size of its Blazegraph journal. To free up space we need to take the service down, delete the journal, and re-import the data from the latest dump. The service interruption will begin on Feb 4 at 18:30 UTC and continue until the data reload is complete, approximately 2.5 days.
Feb 8 2021
Mar 14 2021

We'll be performing brief maintenance of the Wikimedia Commons Query Service (https://wcqs-beta.wmflabs.org/) beginning at 2021-03-15 Mon 16:00 UTC. We expect service availability to be restored quickly, within about 30 minutes.

Release notes

This is a beta SPARQL endpoint exposing the Structured Data on Commons (SDoC) dataset. This endpoint can federate with WDQS. More work is needed as we iterate on the service, but feel free to begin using the endpoint. Known limitations are listed below:

  • The service is a beta endpoint that is updated via weekly dumps. Caveats include limited performance, expected downtimes, and no guarantees of interface, naming, or backward-compatibility stability.
  • The service is hosted on Wikimedia Cloud Services, with limited resources and limited monitoring. This means there may be random unplanned downtime.
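
As an illustration of the federation with WDQS mentioned above, here is a minimal Python sketch that only builds a federated query string; it does not contact any server. The helper name build_federated_query is hypothetical, and the sketch assumes that "depicts" statements on files are exposed with the Wikidata direct-property URI for P180. Full URIs are written out rather than prefixes. The WDQS endpoint URL is the public one at query.wikidata.org.

```python
import re

# Public WDQS SPARQL endpoint, used as the target of the SERVICE clause.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: "depicts" (P180) statements on files use this Wikidata URI.
DEPICTS = "http://www.wikidata.org/prop/direct/P180"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_federated_query(item_id: str, language: str = "en", limit: int = 10) -> str:
    """Return a SPARQL query string (hypothetical helper, not an official API).

    The outer pattern is meant to run on WCQS and match files depicting the
    item; the SERVICE block asks WDQS for the item's label.
    """
    if not re.fullmatch(r"Q[1-9]\d*", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", language):
        raise ValueError(f"unexpected language code: {language!r}")
    item_uri = f"http://www.wikidata.org/entity/{item_id}"
    return (
        "SELECT ?file ?label WHERE {\n"
        f"  ?file <{DEPICTS}> <{item_uri}> .\n"
        f"  SERVICE <{WDQS_ENDPOINT}> {{\n"
        f"    <{item_uri}> <{LABEL}> ?label .\n"
        f'    FILTER(LANG(?label) = "{language}")\n'
        "  }\n"
        f"}} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_federated_query("Q146"))
```

The resulting string can be pasted into the WCQS GUI after logging in. Keeping the LIMIT small is a cautious choice given the limited resources of the beta host.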

The data will be reloaded weekly on Mondays from dumps taken on Sunday. The dumps can be seen at https://dumps.wikimedia.org/other/wikibase/commonswiki/. The service will be down during data reload. With the current amount of SDoC data, downtime will last approximately 4 hours, but this may increase as SDoC data grows.
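
To estimate how fresh the loaded data is under this schedule, the date arithmetic can be sketched in Python as follows. The function name latest_loaded_dump_date is hypothetical. The sketch assumes the nominal schedule (a dump taken every Sunday, loaded the following Monday) and ignores the reload downtime and any dump problems.

```python
from datetime import date, timedelta


def latest_loaded_dump_date(today: date) -> date:
    """Return the date of the Sunday dump the service should be serving.

    Assumption: the Monday reload for the current week has already run
    (or is running) if today is Monday or later in the week.
    """
    # date.weekday() is 0 for Monday, so this steps back to the latest Monday.
    last_monday = today - timedelta(days=today.weekday())
    # The dump loaded on that Monday was taken the day before, on Sunday.
    return last_monday - timedelta(days=1)


if __name__ == "__main__":
    print(latest_loaded_dump_date(date.today()))
```

On a Sunday the function returns the previous week's dump date, because that day's dump is only loaded on the following Monday.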

  • Due to an issue with the dump format, the data currently only dates back to July 5th. We’re working on getting more up-to-date data and hope to have a solution soon. (T258507 and T258474)
  • The MediaInfo concept URIs (e.g. http://commons.wikimedia.org/entity/M37200540) are currently HTTP; we may change these to HTTPS in the near future. Please comment on T258590 if you have concerns about this change.
  • Please note that to log out of the service correctly, you need to use the logout link in WCQS; logging out of Wikimedia Commons alone does not log you out of WCQS. This limitation will be lifted once we move to production.
  • Please use the SPARQL template. Note that while there is currently a bug that doesn’t allow us to change the “Try it!” link endpoint, the examples will be displayed correctly on the WCQS GUI.
  • WCQS is a work in progress and some bugs are to be expected, especially related to generalizing WDQS to fit SDoC data. For example, current bugs include:
      • URI prefixes specific for SDoC data don’t yet work - you need to use full URIs if you want to query using them. Relations and Q items are defined by
      • Autocomplete for SDoC items doesn’t work - without prefixes they’d be unusable anyway, but additional work will be required after we inject SDoC URI prefixes into WCQS GUI.
  • If you find any additional bugs or issues, please report them via Phabricator with the tag wikidata-query-service.
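
Because MediaInfo entities currently have to be referenced by full URI, and the scheme may change from HTTP to HTTPS, client code may want to keep the URI handling in one place. The following Python sketch shows one way to do that. All function names are hypothetical, and the URI shape is taken from the example given in the list above.

```python
import re

# Current (HTTP) base of MediaInfo concept URIs, per the example above.
MEDIAINFO_BASE = "http://commons.wikimedia.org/entity/"

# Accept both http and https so that parsing survives a scheme change.
_MEDIAINFO_URI = re.compile(r"https?://commons\.wikimedia\.org/entity/(M[1-9]\d*)")


def mediainfo_uri(mid: str) -> str:
    """Build the full concept URI for a MediaInfo ID such as 'M37200540'."""
    if not re.fullmatch(r"M[1-9]\d*", mid):
        raise ValueError(f"not a MediaInfo ID: {mid!r}")
    return MEDIAINFO_BASE + mid


def mediainfo_id(uri: str) -> str:
    """Extract the MediaInfo ID from a concept URI with either scheme."""
    match = _MEDIAINFO_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a MediaInfo concept URI: {uri!r}")
    return match.group(1)


def as_sparql_term(uri: str) -> str:
    """Wrap a full URI in angle brackets for use in a SPARQL query."""
    return f"<{uri}>"


if __name__ == "__main__":
    uri = mediainfo_uri("M37200540")
    print(as_sparql_term(uri))
    print(mediainfo_id("https://commons.wikimedia.org/entity/M37200540"))
```

If the concept URIs do move to HTTPS, only MEDIAINFO_BASE would need to change in this sketch; the parser already accepts both schemes.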

Future plans

We do plan to move the service to production, but we don’t have a timeline on that yet. We want to emphasize that while we do expect a SPARQL endpoint to be part of a medium to long-term solution, it will only be part of that solution. Even once the service is production-ready, it will still have limitations in terms of timeouts, expensive queries, and federation. Some use cases will need to be migrated, over time, to better solutions - once those solutions exist.