Commons:Digital Public Library of America
In 2020, the Digital Public Library of America (DPLA) embarked on a project to provide digital assets from DPLA's contributors to Wikimedia Commons. This project is funded by the Sloan Foundation and supported by the Wikimedia Foundation. See the announcement. In 2023, the Sloan Foundation awarded DPLA a grant to continue and expand this work.
The Digital Public Library of America is the national aggregator for digital heritage collections in the United States. It currently encompasses over 36 million records, collected from large partner institutions such as the National Archives and Records Administration, the Smithsonian Institution, ARTstor, and HathiTrust, as well as thousands of smaller contributing institutions whose data is collected by DPLA's network of regional service hubs.
DPLA's Wikimedia project aims to build a single pipeline through which DPLA's many diverse data providers can contribute digital assets to Wikimedia Commons. Materials are compatible only if they carry a copyright license or rights statement that marks them as public domain or otherwise acceptable for use on Wikimedia Commons, and if their media are accessible in a machine-readable way (such as through IIIF). A special point of emphasis for the project is DPLA's Black Women and the Suffrage Movement collection. The project is being led by Dominic Byrd-McDevitt, a DPLA Data Fellow.
During the first year of the project, DPLA worked with seven partners, representing more than 200 institutions, to add 1.4 million files including documents, maps, and photos to Wikimedia Commons.
In February 2020, DPLA bot was approved to begin uploading media files from DPLA contributors. It is a Pywikibot-based bot that uses the DPLA API. All of the uploaded files can be found in Category:Media contributed by the Digital Public Library of America. As of 21:38, March 3, 2024, 3,798,767 files had been uploaded to this tracking category.
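The file count above is a snapshot. As a minimal sketch using only the Python standard library, the current count can be read from the public MediaWiki Action API on Commons (`prop=categoryinfo`) instead of from the bot itself. The function names and the User-Agent value are illustrative assumptions, and the count covers files placed directly in the tracking category, not in any subcategories.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public MediaWiki Action API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"
TRACKING_CATEGORY = "Category:Media contributed by the Digital Public Library of America"


def category_info_url(category: str = TRACKING_CATEGORY) -> str:
    """Build a prop=categoryinfo query URL for one category page."""
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": category,
        "format": "json",
        "formatversion": "2",  # "pages" comes back as a list, not a dict keyed by page ID
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def file_count(response_text: str) -> int:
    """Read the number of files directly in the category from the JSON response."""
    page = json.loads(response_text)["query"]["pages"][0]
    return page["categoryinfo"]["files"]


def fetch_file_count(user_agent: str) -> int:
    """Query Commons live; user_agent should identify your tool and a contact."""
    request = Request(category_info_url(), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return file_count(response.read().decode("utf-8"))
```

Calling `fetch_file_count("ExampleTool/0.1 (you@example.org)")` (a hypothetical tool name and contact) would return the live figure. Wikimedia asks API clients to identify themselves with a descriptive User-Agent, which is why the sketch requires one.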
In 2022, DPLA formed a Wikimedia Working Group to support and further DPLA's Wikimedia Project. The group works to improve the project's capacity and sustainability by supporting project participants, creating and driving new cross-network collaborations, and improving documentation. Read the announcement and view the inaugural members.
Are you affiliated with a U.S. institution interested in seeing a copy of your public domain and/or openly-licensed assets deposited in Wikimedia Commons? Here are some things to consider:
- Is your institution already contributing records to DPLA?
- Do you have institutional support/buy-in for participation?
- Do those records include standardized rights statements in the form of URIs from RightsStatements.org or Creative Commons (used to identify those items in the public domain or with open licensing)?
- Does your digital collection system allow ready access to full-size media, with or without IIIF metadata?
If you aren't sure about these questions, or would like to discuss further, please reach out to your service hub coordinator(s). Alternatively, contact DPLA directly at dominic@dp.la, or post on the talk page.
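The two technical questions in the checklist (a standardized rights URI, and access to full-size media) can be expressed as a simple pre-screening function. This is a minimal Python sketch, not DPLA's actual logic: the `Record` fields are hypothetical, and the allow-list is an illustrative assumption covering common public domain marks and free licenses. The authoritative set of accepted rights values is the one DPLA specifies for hubs.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative allow-list only (an assumption, not DPLA's official list).
ALLOWED_RIGHTS_PREFIXES = (
    "http://rightsstatements.org/vocab/NoC-US/",
    "http://creativecommons.org/publicdomain/mark/",
    "http://creativecommons.org/publicdomain/zero/",
    "http://creativecommons.org/licenses/by/",
    "http://creativecommons.org/licenses/by-sa/",
)


@dataclass
class Record:
    """Hypothetical, simplified view of one hub record."""
    rights_uri: Optional[str]
    iiif_manifest: Optional[str] = None
    media_master: Optional[str] = None


def normalize_uri(uri: str) -> str:
    """Treat the https and http forms of the same rights URI as equivalent."""
    uri = uri.strip()
    if uri.startswith("https://"):
        uri = "http://" + uri[len("https://"):]
    return uri


def is_commons_eligible(record: Record) -> bool:
    """True if the record has an allowed rights URI and a route to full-size media."""
    if not record.rights_uri:
        return False
    rights_ok = normalize_uri(record.rights_uri).startswith(ALLOWED_RIGHTS_PREFIXES)
    media_ok = bool(record.iiif_manifest or record.media_master)
    return rights_ok and media_ok
```

Under these assumptions, a record with an In Copyright (`InC`) statement or a non-commercial Creative Commons license fails the rights check, and a record with neither a IIIF manifest nor a direct media URL fails the media check.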
Why should my institution participate?
This project is best suited to those institutions interested in widespread sharing and re-use of their public domain and openly licensed digital assets. Benefits of participation include:
- Digital objects shared on Wikimedia Commons are made available for re-use to the general public, including a large and active community of Wikipedia/Wikimedia editors. Perhaps the most visible form of this re-use is the inclusion of contributed image files in Wikipedia articles. Wikipedia is one of the most visited sites on the web.
- Participating institutions are well-positioned to develop organizational initiatives and public programming related to their Wikimedia Commons content (e.g., Wikipedia edit-a-thons).
- Materials contributed to the Commons are brought together with topically similar materials from a wide variety of sources outside the world of galleries, libraries, archives, and museums, thereby opening a greater depth of source material to researchers and the public.
- Materials contributed to the Commons are enriched with structured data linked to Wikidata, a free collaborative knowledge base used across the web. Wikidata is used by search engines to better understand content on the web and provide meaningful results to users. Additionally, Wikidata values are multilingual, greatly extending the reach of cultural heritage materials originally described in a single language.
What contributing looks like
When institutions participate in DPLA's Wikimedia Commons pipeline project, they send a full copy (i.e., rather than just a thumbnail image) of their public domain or openly licensed digital objects, along with the associated metadata (records), to the Commons. In most cases, a "full copy" refers to the fullest "access," "service," or "web" version of a digital object available through the institution's home repository or digital collection system. For example, this might be the largest JPEG version of a digitized photograph that the institution makes available to the public through its digital collections. DPLA harvests (copies) these files automatically, and the associated metadata is synchronized from one harvest to the next, accounting for any changes made to a record in the interim. A file harvested via the pipeline always includes a full set of metadata, including (1) the name of the owning institution; (2) a copyright statement or license; and (3) a link back to the digital object and its record in the home system. In this manner, provenance and ownership (i.e., of the original document) are clearly documented and maintained in the Commons. Descriptive metadata contributed to the DPLA portal is also included in the version hosted on Wikimedia Commons.
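The harvest-to-harvest synchronization described above can be pictured as a comparison between the metadata saved from the previous harvest and the metadata in the current one. The Python sketch below is an illustration under stated assumptions, not the pipeline's actual implementation: the record IDs, the field names `institution`, `rights`, and `is_shown_at`, and the use of a SHA-256 fingerprint are all hypothetical. It sorts records into new uploads, metadata updates, unchanged files, and records skipped because one of the three required provenance fields is missing.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical names for the three fields every pipeline upload must carry.
REQUIRED_FIELDS = ("institution", "rights", "is_shown_at")


def missing_required(record: dict) -> List[str]:
    """Return the required provenance fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def fingerprint(record: dict) -> str:
    """Stable hash of a record's metadata, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(previous: Dict[str, str], current: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare this harvest with fingerprints saved from the last one.

    previous: record ID -> fingerprint from the prior harvest
    current:  record ID -> metadata from this harvest
    """
    plan: Dict[str, List[str]] = {"upload": [], "update": [], "unchanged": [], "skip": []}
    for record_id, record in sorted(current.items()):
        if missing_required(record):
            plan["skip"].append(record_id)  # never publish without provenance
        elif record_id not in previous:
            plan["upload"].append(record_id)  # new file for Commons
        elif previous[record_id] != fingerprint(record):
            plan["update"].append(record_id)  # metadata changed since last harvest
        else:
            plan["unchanged"].append(record_id)
    return plan
```

Hashing a canonical JSON serialization (keys sorted) means that reordering fields between harvests does not register as a change, while any edit to a value does.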
Here is an example of a file and its associated metadata: "Babe", Walbridge Park elephant, Toledo, Ohio.
For a full explanation of DPLA's data ingestion procedures as they relate to Wikimedia content, see https://github.com/dpla/ingestion3#wikimedia. In practice, DPLA handles most of these procedures as part of its onboarding and ingest process.
To get started, hubs will need to do the following:
1. Schedule an initial conversation with DPLA staff to discuss considerations for your hub and institutional providers.
2. Provide DPLA staff with a list of the contributing institutions (aka providers) from your hub interested in sending content to the Commons.
3. Ensure that records slated for contribution include one of the following rights values:
4. Ensure that records slated for contribution include an "iiif" manifest value (preferred) or a "mediaMaster" value that provides access to a full-size version of the image file. There is no single prescribed method for including these values. Service hubs provide their data to DPLA using a variety of standards (Dublin Core, MODS, etc.) and formats (XML, JSON, etc.), which will determine how these values are included in their regular DPLA ingests. In some cases, no changes to metadata are required; DPLA may be able to use existing hub metadata to identify or derive the location of the full digital object. Hubs are advised to talk to DPLA staff before making significant changes to their data structure.
- Here are examples in a MODS record:
- Here is an example from a record using Dublin Core and Europeana Data Model (EDM) elements:
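Whatever metadata standard a hub uses, step 4 amounts to giving DPLA one route to full-size media, with IIIF preferred. As a rough illustration (not DPLA's code), the Python sketch below reads one image URL per canvas from a IIIF Presentation manifest, handling both the 2.x layout (`sequences`, `canvases`, `images`) and the 3.0 layout (`items`, `body`), and falls back to mediaMaster URLs when no manifest is usable. The function names, and treating mediaMaster as a list of URLs, are assumptions.

```python
from typing import Any, List, Optional


def _first(value: Any) -> Any:
    """IIIF allows some properties to be one object or a list; take the first."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def image_urls_from_manifest(manifest: dict) -> List[str]:
    """Collect one image URL per canvas from a IIIF Presentation manifest."""
    urls: List[str] = []
    if "sequences" in manifest:  # Presentation API 2.x
        sequence = _first(manifest["sequences"]) or {}
        for canvas in sequence.get("canvases", []):
            image = _first(canvas.get("images", []))
            if not image:
                continue
            resource = image.get("resource", {})
            service = _first(resource.get("service"))
            if isinstance(service, dict) and "@id" in service:
                # Image API 2.x syntax: full region, full size, no rotation.
                urls.append(service["@id"].rstrip("/") + "/full/full/0/default.jpg")
            elif "@id" in resource:
                urls.append(resource["@id"])
    else:  # Presentation API 3.0
        for canvas in manifest.get("items", []):
            page = _first(canvas.get("items", []))
            annotation = _first(page.get("items", [])) if page else None
            body = _first(annotation.get("body")) if annotation else None
            if isinstance(body, dict) and "id" in body:
                urls.append(body["id"])
    return urls


def resolve_media(manifest: Optional[dict], media_master: Optional[List[str]]) -> List[str]:
    """Prefer URLs from a IIIF manifest; otherwise fall back to mediaMaster URLs."""
    if manifest:
        urls = image_urls_from_manifest(manifest)
        if urls:
            return urls
    return list(media_master or [])
```

When a 2.x image resource advertises an Image API service, the sketch requests `full/full/0/default.jpg` from it, which is the Image API 2.x request for the full region at full size; otherwise it uses the resource URL as given.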
In the initial phase of the project, we most need help matching entities in DPLA's data to Wikidata IDs, so that we can use Institution templates and add structured data. Please use the lists of entities below to help with this task:
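Volunteers doing this matching may find it useful to query Wikidata programmatically. This minimal Python sketch uses the public `wbsearchentities` module of the Wikidata API: it builds a search URL for a name taken from DPLA's data and keeps only results whose label matches exactly after case and whitespace normalization. The normalization rule and function names are assumptions, not part of DPLA's workflow.

```python
import json
from typing import List
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(name: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities query for a name taken from DPLA metadata."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def normalize(label: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not block a match."""
    return " ".join(label.casefold().split())


def exact_label_matches(name: str, response_text: str) -> List[str]:
    """Return the Q-IDs whose label equals the name after normalization."""
    target = normalize(name)
    results = json.loads(response_text).get("search", [])
    return [r["id"] for r in results if normalize(r.get("label", "")) == target]
```

A single Q-ID back is a candidate to confirm by hand; zero or several results mean the entity needs manual review. `wbsearchentities` also matches aliases, which this sketch deliberately ignores when comparing labels.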
If you are interested in helping this project, please feel free to sign up below.