User:Fæ/OWID

From Wikimedia Commons, the free media repository

Our World In Data

Global map illustrating the Government Transparency Index, OWID

Introduction


This is a mass batch upload of the Our World In Data (OWID) datasets in JSON format, and of the SVG charts that support the site http://OurWorldInData.org. The source for the datasets is on GitHub. At the time of this project there are 977 datasets, and more are added each day.

As categories cannot be added to pages in the Data namespace, use this search to list the uploaded data files, or see the index of Data talk pages at Category:Our World In Data.

To list the 2,698 uploaded SVG files, use this search or browse the parent category.

At present, 514 of the 977 datasets could be reformatted to fit usefully within the limited JSON specification of the Wikimedia Commons Data namespace.

Configuration


Dataset file names are taken from the OWID directory name on GitHub, with the unique numeric identifier ("id") given by OWID in the JSON dataset appended, in the format:

Data:<title> (OWID <id>).tab
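A minimal sketch of this naming rule. The function name owid_data_title is hypothetical and is not taken from the actual upload script; it simply fills the pattern with the GitHub directory name and the OWID "id".

```python
def owid_data_title(directory_name: str, owid_id: int) -> str:
    """Build a Commons Data: page title from an OWID directory name and id.

    Hypothetical helper following the pattern Data:<title> (OWID <id>).tab.
    """
    return f"Data:{directory_name} (OWID {owid_id}).tab"


print(owid_data_title("Active tuberculosis - IHME (2017)", 4214))
# → Data:Active tuberculosis - IHME (2017) (OWID 4214).tab
```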

Unfortunately, there is no easy way to link the generated SVG charts back to OWID or other data, so the naming is based only on the OWID chart title:

File:<title>, OWID.svg

When chart titles are identical but the SVG charts have different links, a distinguishing number is added based on their position on the charts web page.

File:<title>, <sequence number>, OWID.svg
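The two SVG naming patterns can be sketched as follows. This assumes the charts are read as (title, link) pairs in the order they appear on the charts web page, and that the sequence number is the 1-based position of a chart among those sharing its title; the exact numbering used by the upload may differ. The function name and the sample titles and links are hypothetical.

```python
from collections import defaultdict


def svg_file_names(charts):
    """Map each chart link to a Commons file name.

    charts: (title, link) pairs in page order (assumed input shape).
    A title used by a single link gets "File:<title>, OWID.svg"; a title shared
    by several distinct links gets "File:<title>, <n>, OWID.svg", where n is
    the link's 1-based position among charts with that title.
    """
    links_by_title = defaultdict(list)
    for title, link in charts:
        if link not in links_by_title[title]:
            links_by_title[title].append(link)

    names = {}
    for title, link in charts:
        links = links_by_title[title]
        if len(links) == 1:
            names[link] = f"File:{title}, OWID.svg"
        else:
            names[link] = f"File:{title}, {links.index(link) + 1}, OWID.svg"
    return names


names = svg_file_names([("Chart A", "link-1"), ("Chart A", "link-2"), ("Chart B", "link-3")])
```

Here "link-1" and "link-2" share a title and so receive sequence numbers 1 and 2, while "link-3" keeps the plain form.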

Technical notes


The SVG files are generated with an OWID logo in the top right corner. In line with {{Watermark}}, and because the logos may appear too promotional for Wikipedia articles, they are trimmed from each SVG when it is cached on the local machine, using a regex text substitution that removes the "<g>" container for the logo.
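A sketch of such a regex substitution. The real pattern is not given on this page, so this assumes, hypothetically, that the logo's "<g>" element carries class="logo" and contains no nested "<g>" elements; the non-greedy match stops at the first closing "</g>", so nested groups would need a proper XML parser instead.

```python
import re

# Hypothetical marker: assumes the logo group is <g ... class="logo" ...> and
# has no nested <g> children. DOTALL lets ".*?" span line breaks in the SVG.
LOGO_GROUP = re.compile(r'<g\b[^>]*\bclass="logo"[^>]*>.*?</g>', re.DOTALL)


def strip_logo(svg_text: str) -> str:
    """Remove the logo's <g> container from SVG source held as a string."""
    return LOGO_GROUP.sub("", svg_text)


svg = '<svg><g class="chart"><rect/></g><g class="logo"><path d="M0 0"/></g></svg>'
print(strip_logo(svg))
# → <svg><g class="chart"><rect/></g></svg>
```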

SVG workarounds


Blacklist


As the charts contain medical language, some cannot be uploaded via the API, because the automatic title blacklist blocks certain titles.

WARNING: API error titleblacklist-forbidden: The title "File:Number of people with autism, OWID.svg" has been banned from creation. It matches the following blacklist entry: ".*(?:best|top|with|through|perfect|having|ideal|using|beneficial|effective) autism.*"
...
WARNING: API error titleblacklist-forbidden: The title "File:Share of the population with autism, OWID.svg" has been banned from creation. It matches the following blacklist entry: ".*(?:best|top|with|through|perfect|having|ideal|using|beneficial|effective) autism.*"

Only 3 charts appear to have been banned by file name; these remain skipped.
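To skip such titles before making an API call, a batch could test each planned title locally. This sketch uses only the one blacklist entry quoted in the warnings above; the full blacklist is not reproduced, and the list name BLACKLIST is hypothetical. It also assumes the entry is matched case-insensitively, so re.IGNORECASE is set.

```python
import re

# Only the entry quoted in the API warnings; the real blacklist is much longer.
# Assumption: matching is case-insensitive.
BLACKLIST = [
    re.compile(
        r".*(?:best|top|with|through|perfect|having|ideal|using|beneficial|effective) autism.*",
        re.IGNORECASE,
    ),
]


def is_blacklisted(title: str) -> bool:
    """True if the whole title matches any locally known blacklist entry."""
    return any(pattern.fullmatch(title) for pattern in BLACKLIST)


print(is_blacklisted("File:Number of people with autism, OWID.svg"))
# → True
```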

Dataset workarounds and bugs

  • The description must be under a certain string length; in the absence of a specification, it has been arbitrarily cut off at 241 characters.
  • The OWID fields name, title and id cannot be imported, as Commons does not allow extra fields outside of its specification (a Wikimedia spec?).
  • Sources must be a single string, whereas OWID provides them as a set.
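The three workarounds can be combined when assembling a Commons tabular data object. This is a sketch, not the upload script: the function name, the "; " separator for joining sources, and the (name, type, title) tuple format for fields are all assumptions. The top-level keys follow mw:Help:Tabular Data, and the licence identifier is left as a required parameter because it must be one that Commons accepts.

```python
# Arbitrary cut-off used on this page, not an official Commons limit.
MAX_DESCRIPTION = 241


def to_commons_tab(description, sources, fields, rows, license_id):
    """Assemble a Commons tabular data (.tab) object as a Python dict.

    description: free text, truncated to MAX_DESCRIPTION characters.
    sources: iterable of source strings (OWID provides a set).
    fields: (name, type, English title) tuples, e.g. ("year", "number", "Year").
    rows: list of lists, one value per field.
    license_id: a licence identifier that Commons accepts.
    OWID's own name/title/id keys are deliberately not copied, since extra
    keys outside the specification are rejected.
    """
    return {
        "license": license_id,
        "description": {"en": description[:MAX_DESCRIPTION]},
        # Commons expects one string; sort the set so the output is stable.
        "sources": "; ".join(sorted(sources)),
        "schema": {
            "fields": [
                {"name": name, "type": ftype, "title": {"en": title}}
                for name, ftype, title in fields
            ]
        },
        "data": rows,
    }
```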

JSON syntax error

11 Access to financial account or services (%) - World Bank (2014) 
 data:Access to financial account or services (%) - World Bank (2014) (OWID 496).tab 
WARNING: API error json-error-syntax: Syntax error
 ERROR 496 Edit to page [[Data:Access to financial account or services (%) - World Bank (2014) (OWID 496).tab]] failed:
json-error-syntax: Syntax error [help:See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.] 

 12 Active tuberculosis - IHME (2017) 
 data:Active tuberculosis - IHME (2017) (OWID 4214).tab 
WARNING: API error jsonconfig-err-array-count: List "data[1120]" has 6 values, but must have 4 values, the same number of values as "schema/fields"
 ERROR 4214 Edit to page [[Data:Active tuberculosis - IHME (2017) (OW...
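Both failures in this log could be caught locally before upload. The sketch below checks that every row has as many values as schema/fields, which is what jsonconfig-err-array-count reports, and that the object serialises as strict JSON. The cause of the OWID 496 syntax error is not identified here; one plausible candidate, offered only as an assumption, is that Python's json module writes NaN and Infinity by default, and these are not valid JSON. The function name check_tab is hypothetical.

```python
import json


def check_tab(tab):
    """Return a list of local problems that would make the API reject `tab`.

    Not the server's full validation; only two checks:
    1. each row has exactly len(schema/fields) values;
    2. json.dumps succeeds with allow_nan=False (strict JSON, no NaN/Infinity).
    """
    problems = []
    width = len(tab["schema"]["fields"])
    for i, row in enumerate(tab["data"]):
        if len(row) != width:
            problems.append(f"data[{i}] has {len(row)} values, expected {width}")
    try:
        json.dumps(tab, allow_nan=False)
    except ValueError as exc:  # raised for nan, inf and -inf
        problems.append(f"not strict JSON: {exc}")
    return problems
```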

Blank numbers


Fixed by replacing blank numbers in the data with null (a JSON null, not the string "null"). Refer to mw:Help:Tabular Data.
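A minimal sketch of this fix, assuming (hypothetically) that blank cells arrive as empty or whitespace-only strings. Python's None is serialised by json.dumps as a bare null, not as a quoted string.

```python
import json


def blanks_to_null(row):
    """Replace empty or whitespace-only string cells with None (JSON null)."""
    return [
        None if isinstance(cell, str) and cell.strip() == "" else cell
        for cell in row
    ]


print(json.dumps(blanks_to_null([2014, "", 3.5])))
# → [2014, null, 3.5]
```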

contenttoobig

44 Annual share of CO2 emissions (OWID based on GCP, 2017) 
12/44 data:Annual share of CO2 emissions (OWID based on GCP, 2017) (OWID 3116).tab 
WARNING: API error contenttoobig: The content you supplied exceeds the article size limit of 2048 kilobytes.
Source: https://github.com/owid/owid-datasets/tree/master/datasets/Annual%20share%20of%20CO2%20emissions%20(OWID%20based%20on%20GCP%2C%202017)
This may not be fixable, as the 2 MB limit is by design.
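Oversized datasets could at least be detected before an upload attempt. This sketch measures the UTF-8 byte length of the compact JSON text and compares it with 2048 kilobytes, assumed here to mean 2048 × 1024 bytes. The server may measure a differently formatted copy of the content, so treat this only as a rough pre-check; the function names are hypothetical.

```python
import json

# "2048 kilobytes" from the error message, assumed to be 2048 * 1024 bytes.
MAX_BYTES = 2048 * 1024


def payload_size(tab) -> int:
    """UTF-8 byte length of the compact JSON serialisation of `tab`."""
    text = json.dumps(tab, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def fits_limit(tab) -> bool:
    """Rough local check against the article size limit."""
    return payload_size(tab) <= MAX_BYTES
```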