Commons:Structured data/Get involved/Feedback requests/Ontology
As the implementation phase of the Structured Data on Commons project approaches, the Wikimedia Foundation’s Multimedia team needs to define rulesets and guidelines for determining where Commons data should go in this new system.
Structured Data on Commons (SDoC) introduces a three-part system of metadata storage:
- MediaWiki - the existing base system that relies upon semi-structured and unstructured data in wikitext
- Wikibase@Commons - a new instance of Wikibase (the same database-like software that powers Wikidata), used exclusively for Commons metadata
- Wikidata - the existing Wikimedia project for general data about the universe
Where a particular piece of metadata is stored depends on the nature of that metadata and which of the three platforms is best suited for it. This can be illustrated in the sample chart below:
Going forward, the Multimedia team needs basic guidelines for deciding “what” goes “where” among the three storage systems. Each system is listed below with the general points the team is considering, along with the questions team members will ask themselves when making these decisions once development begins.
Feedback on these proposed guidelines is appreciated. What do you think about the points provided? What guidance can Commons provide for making these decisions? Please answer these questions in the discussion section.
Sample guidelines for what gets stored in Wikibase@Commons
- Information that is specific to the media file
  - Does this information only apply to the media file in question?
- Textual descriptions that should be multilingual
  - Is this information text that describes the media file and ideally would be in multiple languages for everyone to read?
- “Pointers” to items in Wikidata (via Wikibase Federation)
  - Does the media have properties that are generic and have corresponding Q items or P properties in Wikidata?
- Information relating to organization of the file in Commons
  - For example, good/featured status, uploaded as part of a campaign, possibly expected usage (maintenance, navigation, etc.)
Sample guidelines for what gets stored in wikitext in MediaWiki
- Information used to tell bots and humans that something needs to be done to the file (templates)
  - Is this information solely for maintenance and/or curation purposes? (marking files for deletion, requests for rotation, etc.)
- Is the use case best served by attaching a Wiki-style category?
- Information specific to other MediaWiki systems
  - User names, etc.
Sample guidelines for items that refer to, or get stored in, Wikidata
The current development plan is for information stored on Wikidata to be accessible through Commons for editing and history review. This cannot be guaranteed at this time, but it is the operating assumption for this conversation.
- Generic information that applies to the universe at large
  - Properties (length, height, weight, material, etc.)
  - Locations (coordinates, cities, countries, etc.)
  - Classifications (species, etc.)
  - Licenses (CC-BY, GPL, etc.)
- References to notable entities that have Wikidata entries
  - Famous people (artists, politicians, etc.)
  - Works of art (The Thinker, The Scream, etc.)
  - Places (landmarks, buildings, museums, libraries, etc.)
  - Sites (Flickr, Facebook, Unsplash, etc.)
  - Organizations (companies, sports teams, etc.)
- Possibly: References to non-notable entities that have structural value for Wikibase on Commons
  - Non-notable Commons users who have uploaded files
  - Non-notable places where photographs were taken (shops, parks, venues, etc.)
  - Non-notable photographers whose work has been uploaded
Here’s a short decision tree to sum up the above sample guidelines:
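As a rough illustration, the questions from the sample guidelines can also be chained into a small function. This is a minimal Python sketch, not the team's actual decision tree: the order in which the questions are asked, the `Store` names, the `Answers` fields, and the `suggest_store` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Store(Enum):
    """The three storage systems described above (labels are illustrative)."""
    MEDIAWIKI = "wikitext in MediaWiki"
    WIKIBASE_COMMONS = "Wikibase@Commons"
    WIKIDATA = "Wikidata"


@dataclass
class Answers:
    """Yes/no answers to the guideline questions for one piece of metadata.

    Field names are hypothetical; each maps to one sample guideline.
    """
    maintenance_only: bool = False          # solely for maintenance/curation (templates)
    best_as_category: bool = False          # best served by a wiki-style category
    mediawiki_specific: bool = False        # specific to MediaWiki systems (user names, etc.)
    generic_or_notable: bool = False        # generic fact or notable entity with a Wikidata entry
    file_specific: bool = False             # applies only to this media file
    multilingual_description: bool = False  # descriptive text that should be multilingual
    commons_organization: bool = False      # featured status, campaign, expected usage


def suggest_store(a: Answers) -> Optional[Store]:
    """Suggest a storage system, or None if no sample guideline applies.

    Assumed order: MediaWiki questions first, then Wikidata, then Wikibase@Commons.
    """
    if a.maintenance_only or a.best_as_category or a.mediawiki_specific:
        return Store.MEDIAWIKI
    if a.generic_or_notable:
        # The fact itself lives in Wikidata; per the guidelines, the file's
        # record in Wikibase@Commons would hold a "pointer" to that item.
        return Store.WIKIDATA
    if a.file_specific or a.multilingual_description or a.commons_organization:
        return Store.WIKIBASE_COMMONS
    return None


print(suggest_store(Answers(maintenance_only=True)))          # a deletion request template
print(suggest_store(Answers(generic_or_notable=True)))        # a license such as CC-BY
print(suggest_store(Answers(multilingual_description=True)))  # a caption for the file
```

Returning `None` when no question is answered "yes" keeps undecided cases visible, so they can be raised for discussion instead of being silently assigned to a default store.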