Commons:Collaboration on content distribution
This page is intended as help for GLAM institutions wanting to make their material available, whether on Wikimedia projects such as Wikimedia Commons or on other platforms. It is primarily intended to help you with batch uploading to Wikimedia Commons and contains a walk-through of what needs to be done before, during and after the upload, and of who should do what. Please feel free to improve the information on the page where you feel it is needed. If you have any questions, you are welcome to ask them on the discussion page.
- 1 Background
- 2 Before
- 3 Who?
- 4 During
- 4.1 Identify existing resources
- 4.2 Analysis of metadata
- 4.3 Mapping the metadata
- 4.4 Prepare the files
- 4.5 Build infrastructure on Wikimedia Commons
- 4.6 Test upload
- 5 After
- 6 Related links
- 7 Notes
Background
Many GLAM institutions want help with making material from their archives and collections available. This walk-through is a response to that need. It describes what needs to be done and what can be done by you, by Wikimedia organizations such as Wikimedia Sverige, and by the Wikimedia volunteer community. This page presupposes that the reader is already familiar with why the material should be made available online under non-restrictive licenses.
Before
The bullet points below are the criteria that need to be fulfilled with a clear YES in order for the work to be able to proceed.
Material intended for distribution on Wikimedia projects must have a potential to fulfill a knowledge-sharing purpose:
- Can the material be used to illustrate, explain or in other ways be informative? More or less all curated material in museum collections is relevant. For more details, see Commons:Project scope.
Material intended to be distributed on the Wikimedia projects must have a free license. That means:
- The work is free to share and use without restrictions.
- It is permitted to make changes and improvements, and to distribute derivatives of the work.
- It is permitted to use for commercial purposes without restrictions.
To ensure this, the material needs to fulfill at least one of the following criteria:
- It is old enough to have been transferred to the Public Domain.
- It is created by you and explicitly stated to be using a free license.
- It was acquired or received under circumstances that guarantee that the material is free, or where you own the right to release it under a free license.
We recommend the Creative Commons Attribution 4.0 license (CC BY 4.0 for short). This is a very widely used license for images from museums and archives all over the world. It is one of the licenses that are free enough to be accepted on Wikipedia, Wikimedia Commons and our other platforms. Using Creative Commons licenses is recommended practice according to the European Commission.
The license recommended for the metadata of the image is Creative Commons CC0 1.0 Universal. The CC0 license is recommended by a large number of institutions. You can read more about why CC0 is suitable for metadata here.
Material that is intended to be distributed on the Wikimedia projects must be accompanied by enough metadata for it to be able to be described and used.
- Metadata about objects should be structured and, preferably, linkable.
- The copyright status of the metadata must be considered; the metadata also needs to be under a free license.
- Material (images, text files, sound clips or movies) that has embedded data (e.g. EXIF data) must not have license information in the embedded data that contradicts the metadata.
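The last point above can be checked before uploading. The following is a minimal sketch, assuming the catalogue license and the license read from the embedded data have already been collected into one record per file. The field names `license` and `embedded_license` and the sample records are hypothetical.

```python
def normalize_license(value):
    """Reduce a license string to a comparable form (lowercase, letters, digits and dots only)."""
    if value is None:
        return None
    return "".join(ch for ch in value.lower() if ch.isalnum() or ch == ".")


def find_license_conflicts(records):
    """Return the IDs of records whose catalogue license and embedded license disagree."""
    conflicts = []
    for record in records:
        catalogue = normalize_license(record.get("license"))
        embedded = normalize_license(record.get("embedded_license"))
        # A missing embedded value is not a contradiction; only real mismatches are flagged.
        if catalogue and embedded and catalogue != embedded:
            conflicts.append(record["id"])
    return conflicts


# Hypothetical sample records, for illustration only.
records = [
    {"id": "A1", "license": "CC BY 4.0", "embedded_license": "cc-by 4.0"},
    {"id": "A2", "license": "CC BY 4.0", "embedded_license": "All rights reserved"},
    {"id": "A3", "license": "CC0 1.0", "embedded_license": None},
]
print(find_license_conflicts(records))
```

The normalization here is deliberately crude. Real collections will probably need a lookup table of the license spellings that actually occur in the data.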
Quantity and quality
All material that is shared and made available is of interest, but larger, rarer and higher-quality collections are more interesting than others. There is in practice no limit on how much material, in number of files or in file size, the Wikimedia platforms have room for. The important things are that:
- the collection is big enough and coherent enough for it to be simple to automate the uploading, instead of doing it manually.
- the material you make available is of as good quality (resolution, size, format) as possible.
Who?
After you have answered YES to all of the above, what remains is to decide who is going to do the uploading. The short answer is you, but not alone.
- You know your material the best. You know what metadata is available and, more importantly, what it means.
- By being directly involved you will learn the tools and routines on the Wikimedia projects which in turn enables continuous or future uploading with greater ease.
Who can help?
We are glad to help institutions, organisations, corporations or other entities out with making their material available on the Wikimedia platforms. However, due to limitations in time and resources we might not always be able to help you out with every step in the process. It is more likely that we will be able to assist you if:
- the institution is primarily in need of training, but intends to do the actual uploading themselves afterwards.
- the material is of high quality or unique, i.e. not already available in a comparable or better version elsewhere online.
- the cooperation leads the way for other institutions in the area to take the step to make material available on Wikimedia platforms.
- the cooperation includes making material from a new field available, e.g. from a new kind of institution or material of a different kind.
In other cases we can most likely help you find contact persons in the Wikimedia movement.
There are plenty of knowledgeable and driven volunteers within the Wikimedia movement. For a project to be successful, you need to take an active part in it. Things often get much simpler if you find a contact person with a particular interest in your material. The volunteers help out as time permits, but often have other projects going on at the same time. If you need a more focused effort, you might consider hiring someone with a background in the Wikimedia community.
During
When the above is finished, it is time to move on to the work itself. Note that this part differs when it is a matter of publishing facts (see our information page on Wikidata) or original texts such as OCR-processed books on Wikisource, where additional steps might be necessary.
Identify existing resources
Is the material already available online?
If the material (or the metadata) is already available online, the Wikimedia projects should be able to link back to the original. To facilitate this linking, it is important to understand how these web addresses relate to the persistent IDs included in your metadata. A persistent ID is normally the ID you use to identify the item in your own collections, and it should be the same when used in different databases.
If the metadata is structured so that it connects to authority records, or if a mapping to the Wikimedia projects already exists, make sure these connections are included in the metadata to be exported.
Analysis of metadata
Before the metadata can be mapped to the Wikimedia projects, it needs to be fully analyzed. For this reason the metadata must be in a machine-readable format.
An analysis makes it possible to identify which parts of the metadata can be mapped automatically and which parts should be prioritized for manual mapping.
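As an illustration of such an analysis, the sketch below profiles a set of records: for each field it counts how many records fill it and how many distinct values it holds, and it lists the most frequent values of a chosen field. The field names and sample records are hypothetical. The rule of thumb that fields with few distinct values are good candidates for manual mapping is an assumption, not a fixed rule.

```python
from collections import Counter


def profile_fields(records):
    """Map each field to (number of records that fill it, number of distinct values)."""
    filled = Counter()
    distinct = {}
    for record in records:
        for field, value in record.items():
            if value in (None, ""):
                continue  # empty values do not count as filled
            filled[field] += 1
            distinct.setdefault(field, set()).add(value)
    return {field: (filled[field], len(distinct[field])) for field in filled}


def most_common_values(records, field, n):
    """Return the n most frequent non-empty values of a field with their counts."""
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common(n)


# Hypothetical sample records, for illustration only.
records = [
    {"id": "A1", "keyword": "church", "photographer": "Unknown", "note": ""},
    {"id": "A2", "keyword": "church", "photographer": "Unknown", "note": "glass plate"},
    {"id": "A3", "keyword": "bridge", "photographer": "", "note": ""},
]
print(profile_fields(records))
print(most_common_values(records, "keyword", 1))
```

A field such as `keyword`, with a handful of frequently repeated values, can be matched by hand once and then reused for every record. A free-text field such as `note` is more likely to be passed through unchanged.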
Mapping the metadata
The mapping should always be a cooperative effort between the data owner and the Wikimedia community, since each party has the best understanding of its own structure and data. By working together the parties learn from each other, which makes it possible for either of them to carry out further uploads on their own.
- Steps involved in the mapping process
- Analyze the data in order to decide which categories, templates, etc. are worth creating.
- Identify what template to use for the information structure, or alternatively, create a special template for the project.
- Identify whether the data can be formatted so that it is available to an international audience:
- Match keywords to (English) categories on Wikimedia Commons, or alternatively, create new categories.
- Map fields in metadata to fields in the chosen template
- Find existing templates for concepts and people
- Create a template which generates a back link to the original out of the persistent ID. This helps avoid future link rot.
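The field mapping and the back link can be sketched as follows. This is only an illustration under stated assumptions: the institution's field names (`title`, `year`, `photographer`) and the template name `ExampleMuseum link` are hypothetical, and the sketch targets the general-purpose `{{Information}}` template on Wikimedia Commons, whereas a real project may choose or create a different template.

```python
# Hypothetical mapping from the institution's field names to {{Information}} parameters.
FIELD_MAP = {
    "title": "description",
    "year": "date",
    "photographer": "author",
}

# Hypothetical name of a Commons template that turns a persistent ID into a back link.
BACKLINK_TEMPLATE = "ExampleMuseum link"


def build_information_wikitext(record):
    """Render one metadata record as wikitext for the {{Information}} template."""
    params = {}
    for source_field, template_field in FIELD_MAP.items():
        value = record.get(source_field)
        if value:
            params[template_field] = value
    # Store only the persistent ID; the template on Commons builds the actual URL,
    # so a changed web address is fixed in one place.
    params["source"] = "{{" + BACKLINK_TEMPLATE + "|" + record["id"] + "}}"
    lines = ["{{Information"]
    for name in ("description", "date", "source", "author"):
        lines.append(f"|{name} = {params.get(name, '')}")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical sample record, for illustration only.
record = {"id": "A1", "title": "Church in winter", "year": "1912", "photographer": ""}
print(build_information_wikitext(record))
```

Keeping the URL pattern inside a template on Wikimedia Commons, rather than writing full URLs into every file page, is what helps against link rot: if the institution's web addresses change, only the template needs to be updated.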
Prepare the files
Wikimedia Commons only allows material in open file formats. As a result, the material might first have to be converted to one of the allowed file formats. File types currently accepted are:
- Images: SVG (for vector graphics), PNG (lossless format), JPEG (for photographs), GIF (but PNG is preferred), TIFF (lossless format, but PNG is preferred), XCF (allows for layers and text).
- Audio: Ogg (using FLAC, Speex, Opus or Vorbis codecs), WebM (using Vorbis codec) or MIDI.
- Recommended format is Ogg Vorbis.
- Note that Ogg also can use the file extension .oga, where a stands for audio only.
- Video: WebM (VP8 codec for video and Vorbis for audio), Ogg Theora (Theora codec for video and Vorbis for sound).
- Note that Ogg also can use the file extension .ogv, where v stands for video.
- Animations: GIF, APNG and animated SVG.
- Scanned documents: DjVu and PDF. There is a guide that helps you choose between those formats.
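A quick way to find out which files need conversion is to sort them by file extension. The sketch below assumes a set of extensions derived from the format list above. The exact set is an assumption and should be checked against the current rules on Wikimedia Commons, and the file names are hypothetical.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats listed above.
ACCEPTED_EXTENSIONS = {
    ".svg", ".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".xcf",  # images
    ".ogg", ".oga", ".ogv", ".webm", ".mid", ".midi",                  # audio and video
    ".djvu", ".pdf",                                                   # scanned documents
}


def split_by_format(filenames):
    """Split file names into those ready for upload and those needing conversion."""
    ready, to_convert = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # compare case-insensitively
        if suffix in ACCEPTED_EXTENSIONS:
            ready.append(name)
        else:
            to_convert.append(name)
    return ready, to_convert


# Hypothetical file names, for illustration only.
ready, to_convert = split_by_format(["map.TIF", "speech.wma", "poster.bmp", "scan.pdf"])
print(ready)
print(to_convert)
```

An extension check says nothing about the codec inside a container such as Ogg or WebM, so audio and video files may still need to be inspected separately.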
Embedded data (e.g. EXIF)
It might be worth embedding certain metadata (e.g. license and copyright information) directly into the file as EXIF/IPTC or as XML (in XMP format). Doing so makes that information retrievable even if the file gets separated from the information shown alongside it on the Wikimedia platforms.
- We recommend that each file contains, as a minimum, information about the creator, the title and the copyright status.
- Creative Commons has a licensing tool that helps you download license information prefilled with your data as XMP, which can then be embedded into the image files using e.g. ExifTool.
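As a rough sketch of the embedding step, the code below builds an ExifTool command line that writes creator, title, rights statement and license URL as XMP tags. Note that this writes the tags directly instead of importing an XMP file downloaded from Creative Commons, which is a simplification chosen for this example. It assumes that ExifTool is installed and available as `exiftool`, and the file name and values are hypothetical.

```python
import subprocess


def build_exiftool_command(path, creator, title, rights, license_url):
    """Build an ExifTool argument list that writes basic XMP metadata to one file."""
    return [
        "exiftool",
        f"-XMP-dc:Creator={creator}",
        f"-XMP-dc:Title={title}",
        f"-XMP-dc:Rights={rights}",
        f"-XMP-cc:License={license_url}",
        path,
    ]


def embed_metadata(path, creator, title, rights, license_url):
    """Run ExifTool on the file; raises CalledProcessError if ExifTool reports failure."""
    command = build_exiftool_command(path, creator, title, rights, license_url)
    subprocess.run(command, check=True)


# Hypothetical values, for illustration only. The command is printed, not executed.
command = build_exiftool_command(
    "photo.jpg",
    creator="Example Museum",
    title="Church in winter",
    rights="CC BY 4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(command)
```

By default ExifTool keeps a copy of the unmodified file with `_original` appended to its name, which makes it easy to compare the result on a few test files before processing a whole batch.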
Build infrastructure on Wikimedia Commons
For the published material to be tied together and given a uniform appearance on Wikimedia Commons, certain pieces of infrastructure need to be built. Commonly occurring parts are:
- A project page with information about the material and its background.
- A place for online discussions about the material, preferably in direct relation to the project page.
- A template that clearly states on each page for an individual file that it belongs to the currently uploaded batch.
- A category gathering all material made available through the project.
- A page for reporting errors in the material or in the metadata.
- A template for the publishing institution with name, location, web page and other relevant information.
Test upload
When the infrastructure is in place a test upload of a limited amount of material is done. This gives other active users of the Wikimedia projects an opportunity to give feedback and suggest improvements to the integration of the metadata. This is an iterative process and will continue until the integration of metadata has been accepted by the community. Thereafter the rest of the material is uploaded.
After
The publishing work does not end when the upload is done. The increased visibility makes it more likely that errors in the material are discovered, and users may contribute improvements and additions to the material that are worth taking advantage of.
Factual errors and additions
Many errors are discovered and corrected directly within the Wikimedia projects. Others may require a discussion among experts in order to identify the source of the error. In the first case it is worth watching the material so that the corrections can be brought into your own metadata. In the second case it is worth entering into a dialogue with the person who noticed the error and then introducing any corrections on the Wikimedia projects.
Any errors that are discovered in the mapping should be addressed as soon as possible in the uploaded material and brought to attention before any further uploads are done.
Additional information and translations are often added to the description pages of the material. These can be of interest to bring back into your own metadata.
Publishing material triggers curiosity, interest, and further use and sharing. It is valuable if the publishing institution is present and active in discussions, so that it can take advantage of new information and answer questions.
The Wikimedia community has built several tools for getting statistics about page views and contributions. See for yourself how your material is being used!
Related links
- Commons:Varför Commons behöver media i hög upplösning ("Why Commons needs high-resolution media"; in English)
- Commons:Guide to batch uploading
- Commons:Guide to content partnerships