Historically, information about media files on Wikimedia sites has been stored in wikitext. Because Wikimedia Commons lacks a database of structured data, its media are difficult to search, its file information is confusing to users, and new features are impractical to develop. Today, many media repositories use highly structured, machine-readable data, including many media databases in the cultural sector such as Europeana and RKDimages. Meanwhile, Wikimedia Commons relies on a patchwork of plain-text data embedded in thousands of overlapping templates and on a set of categories, mostly in English, that are often incompatible with other sites or tools.
For a long time, many members of the Wikimedia Commons community have asked for features that require a database-like structure and that would let them describe the media on Commons more fully.
For instance, the community has been discussing multilingual categories on Commons for a long time. Without multilingual categories, it is difficult, if not impossible, for non-English-speaking volunteers and end users to tag and find media on Commons.
Some earlier discussions and thoughts about structured data on Wikimedia Commons and on multilingual categories can be found at:
- 2004 discussion
- a 2008 blog post
- the 2009 GLAM-WIKI Recommendations
- another in 2010
- a question on Quora
- the 2015 Community Wishlist
- more recent discussions
In 2012, Wikidata - Wikimedia's free knowledge base and a sister project of Wikipedia - was founded and built upon Wikibase software, which stores versioned structured data in a central repository. Wikibase, supported by Wikimedia Deutschland (WMDE), offers a practical way to maintain structured data in MediaWiki, the software that powers all Wikimedia projects. It is widely considered a useful tool for supporting the growth of the free knowledge movement. Since the inception of Wikidata and Wikibase, many Wikimedia community members have proposed using this mechanism to store and retrieve media metadata on Wikimedia Commons.
In 2013, the Wikimedia Foundation's multimedia team hosted a number of roundtable discussions with community members, asking what the team should focus on in the coming years. In each roundtable, the top request from participants was to implement structured data on Commons, even when that topic was not on the agenda to begin with. Some community members pointed out the difficulty of searching on Commons; others pointed to the lack of multilingual categories. Many suggested that categories could be complemented with more granular topics linked to Wikidata's multilingual knowledge base.
In 2014, the Wikimedia Foundation started to explore the concept of structured data to address these concerns. The team identified most of the core architectural features needed to improve Wikimedia Commons, and discovered that these features would fit well in the roadmap of Wikibase (described above). At that point, the project was slowed and delayed until Wikibase offered more robust infrastructure. That milestone was reached in 2016 with an initial demonstration of how structured data on Commons could work: a first demo of so-called mediainfo entities, a new entity type.
Developments from 2016
In October 2016, the Wikimedia Foundation (WMF) and WMDE announced a funding agreement that would provide multi-year support for Wikidata, including backend support for integrating Wikidata into Wikimedia Commons. This funding agreement was supplemented in late 2016 by a $3 million external grant from the Alfred P. Sloan Foundation, which makes it possible to develop structured data functionality in Wikimedia Commons over an accelerated three-year period (2017-2019).