Commons:Structured data/Project glossary

From Wikimedia Commons, the free media repository


This page is a work in progress page, not an article or policy, and may be incomplete and/or unreliable.
Please offer suggestions on the talk page.



This glossary defines concepts that are often used when talking about the project Structured Data on Wikimedia Commons. See Commons:Structured data/About for a general introduction about the project.

Glossary

API. An API (Application Programming Interface) is a set of functions, procedures and tools that makes it possible to build applications (software) that use the content and/or features of another service. For instance, the API of Wikimedia Commons provides the media and data in Wikimedia Commons in a way that makes them easily re-usable by external software. Every MediaWiki wiki (such as any Wikipedia, or Wikimedia Commons) offers an API. Wikibase, the software behind Wikidata, offers an API too, but with more detail. Therefore, once Wikimedia Commons is enhanced with structured data via the Wikibase software, its API will become much more advanced. For a general introduction to the Wikibase API, see https://www.mediawiki.org/wiki/Wikibase/API
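As a rough illustration of what "using an API" looks like in practice, the sketch below builds two request URLs for the Commons API: one for basic file information through the standard MediaWiki `query` action, and one for a structured data entity through the Wikibase `wbgetentities` action. The file title and entity ID are placeholders chosen for this example, and the code only constructs the URLs without sending any request.

```python
from urllib.parse import urlencode

# API endpoint of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_imageinfo_url(file_title: str) -> str:
    """Return a URL asking the MediaWiki API for basic information about a file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "url|size|mime",
        "titles": file_title,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def build_entity_url(entity_id: str) -> str:
    """Return a URL asking the Wikibase API for the structured data of one entity."""
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return f"{COMMONS_API}?{urlencode(params)}"


# "File:Example.jpg" and "M12345" are placeholder identifiers, not real references.
imageinfo_url = build_imageinfo_url("File:Example.jpg")
entity_url = build_entity_url("M12345")
print(imageinfo_url)
print(entity_url)
```

An external application would fetch these URLs and read the JSON response; the second request is the kind of richer access that Wikibase adds on top of the general MediaWiki API.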

Curating, or curation, in the narrow sense means selecting and organizing pieces of artwork and cultural heritage for an exhibition. The term can also be used more broadly, to describe any activity that involves selecting, organizing and presenting information. In the context of Structured Data on Commons, curating means all the activities performed by Commons contributors to organize and present the media files on Wikimedia Commons: performing administrative actions, adding more and better metadata to the files, grouping them in categories, creating galleries, and so on.

Data model. A data model organizes elements of data and standardizes how they relate to each other and to the real world. When we use this term in the context of the project Structured Data on Wikimedia Commons, we mean the basic building blocks that constitute a MediaInfo item. At the time of writing this glossary (October 2017), this data model is not yet finalized, but it will probably be comparable to the basic building blocks of a Wikidata item:

[Figure: the data model of a Wikidata item (Datamodel in Wikidata.svg)]

On top of this basic data model, the Wikimedia Commons and Wikidata communities can work together to build an ever-growing and ever-improving ontology to describe media files.

For a (draft) introduction to the basic data model of Wikibase - the software behind Wikidata and Structured Commons - see https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer
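To make the idea of "basic building blocks" more concrete, here is a minimal sketch of a Wikibase-style entity in Python. It is a deliberate simplification and not the actual Wikibase or MediaInfo schema: the class names, the field names and the entity ID `M12345` are assumptions made for this example. `P180` (depicts) and `Q146` (house cat) are existing Wikidata identifiers, used here only as sample values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an entity: a property, a value, and optional context."""

    property_id: str  # e.g. "P180" ("depicts" on Wikidata)
    value: str  # simplified here to an item ID or a plain literal
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[dict[str, str]] = field(default_factory=list)


@dataclass
class Entity:
    """Simplified entity with multilingual terms and a list of statements."""

    entity_id: str
    labels: dict[str, str] = field(default_factory=dict)  # language code -> label
    descriptions: dict[str, str] = field(default_factory=dict)
    statements: list[Statement] = field(default_factory=list)

    def values_for(self, property_id: str) -> list[str]:
        """Return the values of all statements that use the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Hypothetical media entity: a photo that depicts a house cat.
media = Entity(
    entity_id="M12345",
    labels={"en": "A cat sleeping on a sofa", "nl": "Een kat slaapt op een bank"},
    statements=[Statement(property_id="P180", value="Q146")],
)
print(media.values_for("P180"))
```

The point of the sketch is the shape: language-keyed terms for humans, and property/value statements for machines, on which a community ontology can then be built.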

Federation. In a technical sense, a federated database system is a management system where multiple autonomous databases work together in a single, so-called federated, database. Wikibase Federation is implemented for Structured Data on Wikimedia Commons: it makes it possible to use entities (Items and Properties) defined on one Wikibase repository (i.e., Wikidata) on another Wikibase repository (i.e., Wikimedia Commons).

https://en.wikipedia.org/wiki/Federated_database_system https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#New_step_towards_structured_data_for_Commons_is_now_available:_federation
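The following toy sketch illustrates the idea of federation with two in-memory dictionaries standing in for two repositories. It is not how Wikibase implements federation; the dictionaries, the `describe` helper and the entity ID `M12345` are hypothetical. What it shows is that the statement is stored on one repository while the Property and Item it uses are defined on the other.

```python
# Hypothetical stand-in for Wikidata: Properties and Items are defined here.
WIKIDATA = {
    "P180": {"label": "depicts"},
    "Q146": {"label": "house cat"},
}

# Hypothetical stand-in for Commons: the media entity lives here,
# but its statement only refers to IDs defined in the other repository.
COMMONS = {
    "M12345": {"statements": [("P180", "Q146")]},
}


def describe(media_id: str) -> list[str]:
    """Resolve each (property, value) pair against the remote repository."""
    lines = []
    for prop_id, item_id in COMMONS[media_id]["statements"]:
        prop_label = WIKIDATA[prop_id]["label"]
        item_label = WIKIDATA[item_id]["label"]
        lines.append(f"{prop_label}: {item_label}")
    return lines


description = describe("M12345")
print(description)
```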


GLAM stands for Galleries, Libraries, Archives and Museums - or cultural and knowledge institutions. Many cultural institutions upload media files about their collections to Wikimedia Commons. In the project Structured Data on Wikimedia Commons, we work together closely with GLAMs around the world to learn about, and incorporate, their requirements for structured metadata in a media repository.

Linked Open Data (often abbreviated as LOD) is structured open (freely licensed) data that has been published on the web in such a way that it can be easily interlinked with other datasets on the web, and can be queried. Linked Open Data is designed in such a way that it can be read by humans and machines. Structured data on Wikimedia Commons will be Linked Open Data as well.

https://en.wikipedia.org/wiki/Linked_data
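A common way to express Linked Open Data is as subject-predicate-object triples in which each part is a web address (URI), so that datasets can point at each other. The sketch below stores a few such triples as plain Python tuples and runs a very simple query over them. The media entity IDs are made up for illustration, and the URI prefixes are assumptions based on the patterns Wikidata and Commons use for entities; a real application would normally use an RDF library and a query language such as SPARQL instead.

```python
# URI prefixes (assumed patterns for Wikidata and Commons entities).
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"
MEDIA = "https://commons.wikimedia.org/entity/"

# Each triple reads: subject, predicate, object.
# M12345 and M67890 are hypothetical media entities.
triples = [
    (MEDIA + "M12345", WDT + "P180", WD + "Q146"),  # depicts: house cat
    (MEDIA + "M67890", WDT + "P180", WD + "Q144"),  # depicts: dog
]


def subjects_with(predicate: str, obj: str) -> list[str]:
    """Return all subjects that have the given predicate and object."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


cat_files = subjects_with(WDT + "P180", WD + "Q146")
print(cat_files)
```

Because the object of each triple is itself a URI on another site, a human or a program can follow it to learn more about the thing that is depicted.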

Machine-readable data is data that can easily, and consistently, be read by computers, and that can easily be processed via computer programming logic.

http://opendatahandbook.org/glossary/en/terms/machine-readable/
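The difference between machine-readable and free-form data can be seen in a small example. Below, the same piece of information (when a photo was taken) is given once as a free-text remark and once as a date in ISO 8601 format; only the second can be parsed reliably by a program. The field names and values are invented for this illustration.

```python
from datetime import date

# Free text: easy for a human to read, hard for a program to interpret.
free_text = "taken sometime in the spring of 2017, I think"

# Structured, machine-readable form of similar information.
structured = {"creation_date": "2017-04-15", "creator": "Example Photographer"}

# The structured value parses consistently into a date object.
creation = date.fromisoformat(structured["creation_date"])
print(creation.year, creation.month)

# The free-text remark cannot be parsed the same way.
try:
    date.fromisoformat(free_text)
    free_text_parsed = True
except ValueError:
    free_text_parsed = False
print(free_text_parsed)
```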

Metadata is data that provides information about other data. Metadata of a media file may include (among other things) information about its creator, copyright status, what is depicted in the file, and its creation date.

https://en.wikipedia.org/wiki/Metadata


MediaInfo is a new entity type for Wikibase that can handle structured metadata for multimedia files.

The extension hooks into a file description page and adds a link to a MediaInfo page storing supplemental metadata about the file. This may, for example, include the author, detailed license information, and the concepts that a picture actually depicts.

Further information: [MediaWiki extension WikibaseMediaInfo]

MediaWiki. Wikimedia Commons operates on MediaWiki, the same software that powers Wikipedia. MediaWiki was primarily developed for writing and hosting text. In the Structured Data on Commons project, MediaWiki is enhanced with the Wikibase extension, which allows for the integration of structured metadata into the file descriptions on Commons.

Multi-content revisions. So-called multi-content revisions form an important building block for structured data on Wikimedia Commons (and on other Wikimedia projects). Multi-content revisions are groundwork that makes information in MediaWiki wikis technically more straightforward to organize. It will become possible to split the current wikitext pages into separate documents (slots) with different functionality (such as infoboxes, categories, template documentation); these different slots can then be integrated into one page, sharing page-level functionality and one shared history. Specifically for structured data on Wikimedia Commons, multi-content revisions make it possible to store a structured data entity (an item, a property, a MediaInfo entity) and wikitext in the same page. Structured Commons is a major use case for multi-content revisions.

https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions https://phabricator.wikimedia.org/T107595
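The slot idea can be sketched in a few lines of Python. This is a conceptual model only, not MediaWiki's implementation: the `Page` and `Revision` classes and the `save` method are hypothetical, and the slot names `main` and `mediainfo` are used here as illustrative labels for the wikitext and the structured data. The sketch shows one page with one shared history, where each revision holds several slots and an edit to one slot carries the other slots over unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One entry in the page history, holding the content of every slot."""

    rev_id: int
    slots: dict  # slot name -> content


@dataclass
class Page:
    """A page whose revisions can contain several kinds of content."""

    title: str
    history: list = field(default_factory=list)

    def save(self, **changed_slots) -> Revision:
        """Create a new revision; slots that are not changed are carried over."""
        previous = self.history[-1].slots if self.history else {}
        revision = Revision(len(self.history) + 1, {**previous, **changed_slots})
        self.history.append(revision)
        return revision


page = Page("File:Example.jpg")  # placeholder title

# First revision: wikitext and structured data are stored side by side.
page.save(main="== Summary ==\nA cat on a sofa.", mediainfo={"P180": ["Q146"]})

# Second revision: only the wikitext changes; the structured data is kept.
latest = page.save(main="== Summary ==\nA cat sleeping on a sofa.")
print(latest.rev_id, sorted(latest.slots))
```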

Ontology. In this project, the term ontology is used in its meaning in the context of information science: an ontology is a common vocabulary for a domain of knowledge, used for organizing information in that domain (for instance the visual arts, or the description of books in a library). An ontology contains machine-readable definitions of the basic concepts in that domain and of the relationships that are possible between them. On Wikidata, volunteers in WikiProjects often agree upon ontologies for their own domain and describe these together. In the context of the Structured Commons project, the Commons and Wikidata communities work together to build an ever-growing ontology for describing media files on Wikimedia Commons. They do this on top of the basic data model that is provided via Wikibase in the MediaInfo entities.
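As a small illustration of what "machine-readable definitions of concepts and their relationships" can mean, the sketch below encodes a tiny, made-up class hierarchy for the visual arts and lets a program answer "is this a kind of that?" questions. The concept names and the single `subclass of` relationship are assumptions for this example; real ontologies on Wikidata are much richer and use Items and Properties rather than plain strings.

```python
# Hypothetical mini-ontology: each concept points to its broader concept.
SUBCLASS_OF = {
    "oil painting": "painting",
    "painting": "visual artwork",
    "photograph": "visual artwork",
    "visual artwork": "work of art",
}


def ancestors(concept: str) -> list[str]:
    """Return all broader concepts, from nearest to most general."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def is_a(concept: str, category: str) -> bool:
    """Check whether a concept equals, or is a subclass of, a category."""
    return concept == category or category in ancestors(concept)


print(ancestors("oil painting"))
print(is_a("photograph", "painting"))
```

With such relationships in place, a search for "visual artwork" could also find files described only as "oil painting", which is one practical benefit of agreeing on an ontology.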

Structured data is data with a high degree of structured organization. With more structure, it becomes easier to reuse data in other Wikimedia projects and by third parties; it also allows computers to process and 'understand' it.

Structured Data on Commons is a project that provides the technical infrastructure to complement the wikitext, templates and categories on Commons with structured data. Read more about Structured Data on Wikimedia Commons in the FAQ.


Wikibase is the software that powers Wikidata, and that will also enable structured data on Wikimedia Commons. It consists of a set of extensions to the MediaWiki software.

http://wikiba.se/


Wikidata is a Wikimedia project. It is a free, collaborative, multilingual database. Wikidata collects structured data to support Wikipedia, Wikimedia Commons, Wiktionary and the other wikis of the Wikimedia movement, and to make that data available to anyone in the world.

Further information: d:Wikidata:Introduction

Wikimedia Commons (or in brief 'Commons') is a Wikimedia project, a sister project of Wikipedia, and a collection of more than 40 million free media files. All the freely licensed photos, audio and video files, PDFs and other media on Wikipedia are stored on Commons.

Wikimedia Commons grows rapidly, by approximately 5 million new files per year.

Thousands of volunteers upload files to Commons and integrate these into Wikimedia projects, like Wikipedia, to illustrate the content there and to share that media with the public. Media files on Commons are typically:

  1. personal photography and media uploaded by individuals;
  2. freely licensed media files from external websites like Flickr, YouTube, open access journals, and other repositories;
  3. donations from institutions and organizations with substantial media collections, large and small, ranging from UNESCO, NASA and the British Library to small local cultural institutions.

Related Glossaries