Commons:Structured data/Get involved/Feedback requests/GLAM metadata and ontologies mapping
Uploading files from GLAM collection databases to Wikimedia Commons can be quite challenging. Some of the main challenges have been investigated during a research project started in July 2017. One of the findings: translating metadata from a GLAM collection database to the data structure on Wikimedia Commons ('data munging') is considered one of the blockers or hurdles in GLAM contributions to Commons. 
This process can probably be made more straightforward by:
- Making sure that the main metadata standards/schemes and ontologies used in the GLAM sector are adequately mapped to Wikidata, and/or to the (still to be developed) structured metadata devised by the Commons community, as soon as structured data on Commons is deployed in 2018.
- Making these metadata mappings available in the updated APIs of Structured Commons, and perhaps also in dedicated code libraries, so that developers of upload and data synchronization tools can use them.
- Integrating these metadata mappings into the most frequently used upload (and data synchronization) tools suitable for GLAM uploads.
Types of metadata schemas and ontologies for which mapping may be beneficial
- ontologies (example: CIDOC CRM, FRBR / example concept: a work) - a structured system that outlines how knowledge is organized in a specific domain
- metadata standards (example: MARC / example field: creator) - a list/set of 'fields' that a domain typically uses to describe its objects and media
- vocabularies (example: the Art and Architecture Thesaurus / example concept: brutalism) - a list of keywords that are commonly used in a domain, often as subjects or typologies of creative works
- authority files (example: Union List of Artist Names / example concept: Rembrandt van Rijn) - a list of names of people and/or organizations that are common to a certain domain
Some of these metadata schemas and vocabularies will most likely need to be mapped to Wikidata (e.g. related to concepts depicted in media files, to creators...), some to Wikimedia Commons (e.g. metadata fields related to media files specifically).
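As a rough illustration of what such a mapping could look like to a tool developer, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the source field names, the `FieldMapping` structure, the `translate_record` helper, and the Commons-side field identifier are hypothetical, and the choice of Wikidata properties (creator (P170), title (P1476), inventory number (P217)) is only one plausible mapping, not an agreed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One hypothetical mapping from a GLAM source field to a target."""
    source_field: str
    target: str      # "wikidata" or "commons"
    target_id: str   # property ID or (hypothetical) Commons field name


# Illustrative mappings only; a real mapping would be agreed on by the community.
EXAMPLE_MAPPINGS = {
    "creator": FieldMapping("creator", "wikidata", "P170"),
    "title": FieldMapping("title", "wikidata", "P1476"),
    "inventory_number": FieldMapping("inventory_number", "wikidata", "P217"),
    # Placeholder: structured data fields on Commons are still to be developed.
    "file_format": FieldMapping("file_format", "commons", "hypothetical-file-format"),
}


def translate_record(record, mappings):
    """Split a source record into mapped values (per target) and unmapped fields."""
    mapped = {"wikidata": {}, "commons": {}}
    unmapped = {}
    for field, value in record.items():
        mapping = mappings.get(field)
        if mapping is None:
            # Keep unmapped fields visible so nothing is silently dropped.
            unmapped[field] = value
        else:
            mapped[mapping.target][mapping.target_id] = value
    return mapped, unmapped
```

The sketch passes values through as plain strings. In practice, a value such as a creator name would still need to be matched to a Wikidata item, for example through an authority file.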
On Wikidata, quite a bit of work has already been undertaken to match (especially) vocabularies and some metadata schemes. See the work by
Quite a few broadly used GLAM vocabularies are in the process of being matched to Wikidata, with tools like Mix'n'match. A few examples:
- The Getty's Art and Architecture Thesaurus (AAT) in Mix'n'match - Wikidata property (Art & Architecture Thesaurus ID (P1014)) - SPARQL query for Wikidata items that are linked to the AAT
- The Getty's Union List of Artist Names (ULAN) in Mix'n'match - Wikidata property - SPARQL query for Wikidata items that are linked to ULAN
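The queries linked above are not reproduced here. As a minimal sketch of what a query of this kind might look like, the Python snippet below builds a SPARQL query for items that carry a given external-identifier property (P1014 for the AAT, as listed above) and a request URL for the public Wikidata Query Service endpoint. The function names and the result limit are assumptions for illustration; the snippet only builds the query and URL and does not send a request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_linked_items_query(property_id="P1014", limit=10):
    """Return a SPARQL query for items that have a value for property_id."""
    # Basic validation so arbitrary text cannot be spliced into the query.
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    return (
        "SELECT ?item ?externalId WHERE {\n"
        f"  ?item wdt:{property_id} ?externalId .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    query = build_linked_items_query("P1014", limit=5)
    print(query)
    print(build_request_url(query))
```

The same function can be reused for another vocabulary or authority file by passing that vocabulary's Wikidata property ID.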
Request for feedback
This proposal aims to check whether members of the Wikidata and Wikimedia Commons communities, and GLAM staff, consider this a valid issue to work on, and to take stock of who is interested in actively working on this topic. We warmly welcome feedback on this proposal. Some questions to guide your feedback:
- Do you think this is a worthy undertaking?
- If no: why not?
- If yes: would you volunteer to work on this? What are your main (sub-)interests?
- Any suggestions on how to best organize this work?
- In which way can the GLAM team of Structured Commons (most specifically Sandra) support this endeavour? What do you need from the team?
- Is the proposal itself accurately written and does it address the right issues? Which changes and updates would you propose?
This feedback request runs until Friday 4 May. Your feedback is very helpful in letting us all decide together on next steps.
- m:Research:Supporting Commons contribution by GLAM institutions/Preserving important metadata about media items
- m:Research:Supporting Commons contribution by GLAM institutions/Preparing media items for upload
Please add your feedback to the talk page!