Commons:Digital Public Library of America/2022 SDAW project
This project concept was developed in collaboration with the SDAW and GLAM teams at the Wikimedia Foundation. DPLA is being contracted to do the following work under WMF's SDAW, starting in June 2022. This page is currently lacking some interwiki linking and context, as it is being moved from a Google Doc. For more info on past DPLA projects referenced here, and future updates on this work, see /Timeline.
In 2021-22, the Wikimedia Foundation funded DPLA to develop automatic updates of its contributing institutions’ media files on Wikimedia Commons, with structured data.
As part of this project, DPLA created over 27 million structured data statements for 2,334,599 items, including statements about copyright status, copyright license, RightsStatements.org statement, creators, subjects, identifiers, contributing institutions, description, title, and collection.
DPLA also helped to specify a new ‘references’ feature for structured data on Commons and redesigned the file info box for DPLA items on Wikimedia Commons to draw from structured data statements rather than duplicative wikitext.
Finally, DPLA successfully prototyped image citations for Wikipedia that draw from structured data statements and shared the implementation with the community. This work included coordination with volunteers who helped implement structured data querying in the Commons interface, so that the prototype could be shown in a sandbox there.
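To make the idea of a citation drawn from structured data statements concrete, here is a minimal Python sketch. It is not the prototype's implementation: the function name, the field names (`creator`, `title`, `collection`, `institution`, `rights`), and the output format are all assumptions chosen for illustration.

```python
def format_image_citation(statements: dict) -> str:
    """Join whichever statement values are present into a one-line citation.

    `statements` is a hypothetical, simplified view of a file's structured
    data: plain strings keyed by illustrative field names.
    """
    order = ("creator", "title", "collection", "institution", "rights")
    # Keep only fields that exist and are non-empty, in a fixed order.
    parts = [statements[key].strip() for key in order
             if statements.get(key, "").strip()]
    return (". ".join(parts) + ".") if parts else ""


citation = format_image_citation({
    "creator": "Jane Doe",
    "title": "Harbor view",
    "institution": "Example Library",
})
print(citation)
```

Missing fields are skipped rather than filled with placeholders, so a sparsely described item still yields a usable, if shorter, citation.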
Developing the collaboration
This record-breaking contribution of media files to Commons will have the most impact when the images are put in new contexts on Wikipedia and accessed by a large global audience. The next phase of DPLA’s engagement with structured data on Commons will drive the reuse of described and attributed images on Wikipedia.
Moving the citations pilot forward will also make the citation work accessible to editors in the Wikipedia interface and bring citation capability to actual articles, which in turn will help make the case for querying structured data on Commons (SDC) from Wikipedia.
Project summary
To make images more discoverable in Commons media search, DPLA will undertake the following pilots with contributing institutions and the Wiki community:
- DPLA will develop a path for one or more contributing institutions or hubs to share with DPLA and Wikimedia more of their descriptive metadata that can be reconciled with Wikidata entities. The richer descriptive structured data will allow more items to appear in search results on Commons.
- To address visibility for records that have irreconcilable descriptive metadata, DPLA will develop a tool for suggesting depicts statements based on metadata subjects and encourage DPLA’s community to use the tool to improve search on Commons.
- To leverage DPLA’s detailed modeling of sources in structured data records, DPLA will coordinate with the community to create a Wikipedia citation gadget. While this gadget will follow the data modeling used by DPLA for its contributing institutions, DPLA will document the gadget sufficiently to allow other institutions to customize it should they wish to undertake similar work.
- DPLA will create process documentation and conduct outreach to other national-scale aggregation projects to share DPLA’s successes and learnings and to advocate for more contributions to Commons globally.
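As a rough illustration of the depicts-suggestion pilot described above, the following Python sketch maps metadata subject strings to candidate "depicts" (P180) statements. It is not DPLA's tool: the lookup table, the `DepictsSuggestion` class, and the function names are hypothetical, and a real tool would rely on reconciliation results and human review rather than a hard-coded table.

```python
from dataclasses import dataclass

# Hypothetical lookup from normalized subject headings to Wikidata QIDs.
# Only two well-known entities are listed, purely for illustration.
SUBJECT_TO_QID = {
    "dogs": "Q144",   # dog
    "cats": "Q146",   # house cat
}


@dataclass(frozen=True)
class DepictsSuggestion:
    subject: str         # original metadata subject string
    qid: str             # candidate Wikidata entity
    prop: str = "P180"   # "depicts" property used on Commons


def normalize(subject: str) -> str:
    """Lowercase and collapse whitespace so near-identical headings match."""
    return " ".join(subject.lower().split())


def suggest_depicts(subjects):
    """Return one suggestion per distinct matched entity, in input order."""
    seen = set()
    suggestions = []
    for subject in subjects:
        qid = SUBJECT_TO_QID.get(normalize(subject))
        if qid and qid not in seen:
            seen.add(qid)
            suggestions.append(DepictsSuggestion(subject, qid))
    return suggestions


for s in suggest_depicts(["Dogs", "  dogs ", "Unknown topic", "Cats"]):
    print(s.subject, "->", s.prop, s.qid)
```

Subjects with no match are simply dropped; the output is a list of suggestions for a person to accept or reject, not statements to be written automatically.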
Outcomes
- At least 1 million DPLA-contributed images updated with subject, creator, or other reconciled entities.
- DPLA will launch a tool for adding “depicts” statements and evangelize it to our network.
- 35,000 images on Wikipedia with structured data image citations
- Pipeline, subject-depicts tool, and citation gadget fully documented on wiki
- Citation template and/or gadget shared at 3 Wikimedia conferences or meetings, reaching >100 people
- DPLA will conduct outreach to other regional and national aggregators to share our success and learnings and to encourage them to engage in similar programs.
Risks + Assumptions
- DPLA has not yet deployed user-facing software to facilitate edits on Wikipedia. The gadget and subject-depicts tool would be new ground for us.
- A few of these outcomes rely on volunteer labor that DPLA cannot itself guarantee will be available or willing.
- Some editors may be philosophically opposed to an organization conducting these types of activities on Wikipedia platforms and may work to undo the edits we facilitate.
Project team
- Dominic Byrd-McDevitt
As one of the longest-standing leaders in the GLAM-Wiki movement, Dominic has nearly a decade of experience managing GLAM-Wiki programs in cultural institutions. He served as Wikipedian in Residence for the Smithsonian Institution, and also for years at the US National Archives (NARA). Highlights of his NARA tenure include pioneering approaches for bulk upload to Wikimedia Commons, successfully executing one of the largest bulk uploads prior to DPLA; developing the GLAM-Wiki Boot Camp model of capacity-building workshops for Wikimedians with Wikimedia DC; assisting in organizing the 2015 WikiConference North America, which was held at the US National Archives; and contributing over 1 million edits to Wikidata in the course of adding NARA archival metadata to items.
- Michael Della Bitta
Michael oversees the technical aspects of the Wikimedia program and takes part in defining its strategy. The DPLA Tech Team, which Michael manages, provides the technical support necessary for the DPLA Wikimedia program through its staff of software developers. This includes maintenance of the digital asset pipeline to Wikimedia Commons and any bug fixing or feature development it requires, as well as technical needs related to metadata ingestion from partners, analytics, and reporting.
Funding ask
- $21,250: Software development
- $10,625: Data processing
- $10,625: Community coordination and outreach
- $7,500: Indirect
DPLA will continue operations of the upload and metadata synchronization pipelines using other funds.
Timeline
1 year: June 2022 through May 2023
- Q1 (June - August)
  - Reconcile 1 million subjects in existing data
  - DPLA begins seeking community developers for citation gadget
  - Begin work on depicts tool
- Q2 (September - November)
  - Bring additional URIs through ingest pipeline
  - Launch initial version of depicts tool
  - SDC citation template/gadget shared at WikiCon NA
  - 1-2 specific DPLA partners trained in “depicts” tagging
- Q3 (December - February)
  - SDC-based image caption/citations added to DPLA images in English Wikipedia
  - Begin sharing SDC citation work in Wikimedia community meetings
  - Outreach to DPLA Network around “depicts” tool
- Q4 (March - May)
  - International outreach on digital asset pipeline and SDC issues
  - Ongoing coordination of DPLA Network “depicts” efforts
  - Additional outreach to DPLA network and Wikimedia community related to citation gadget, if development occurs within our timeline