Commons:Digital Public Library of America/Modeling

From Wikimedia Commons, the free media repository

This page documents DPLA's data modeling for Structured Data on Commons (SDC) statements. Statements currently under discussion are marked in yellow; approved statements will be changed to green, and anything we decide against or want to put on hold will be changed to red.

Wikidata property Sample values Sample file Derivation Notes
DPLA ID (P760) 62377331b08262c5f79a1be52f7fc757 The DPLA ID of the item from which the file is uploaded.
  • As above, the DPLA ID is an identifier for the whole item, not the specific image that has been uploaded. How can we add it in SDC at the asset-level without confusing the two?
described at URL (P973) https://ark.digitalcommonwealth.org/ark:/50959/z603tb264

https://dp.la/item/62377331b08262c5f79a1be52f7fc757

These are the DPLA URL (which can be generated from the ID itself) and the source catalog record URL, which is the exact string found in DPLA's "isShownAt" field.
  • The catalog records, of course, describe the whole item, not necessarily the specific digital asset to which the statement is applied, which may be only one page of a larger work. Is this acceptable, or can we clarify the level of description somehow (with a qualifier)?
  • All DPLA items will necessarily have a DPLA record and a source record. Should these different URLs be qualified somehow to distinguish?
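As a rough illustration of the derivation described above, the two URLs could be produced as follows. This is only a sketch: the `https://dp.la/item/` pattern is taken from the sample value, while the flat record shape (keys `id` and `isShownAt`), the function names, and the 32-character hex check are assumptions made for the example.

```python
import re

# URL pattern inferred from the sample value on this page.
DPLA_ITEM_URL = "https://dp.la/item/{dpla_id}"

# Assumption: DPLA IDs are 32 lowercase hex characters, like the sample value.
DPLA_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def dpla_item_url(dpla_id):
    """Build the DPLA item page URL from a DPLA ID."""
    if not DPLA_ID_PATTERN.fullmatch(dpla_id):
        raise ValueError(f"unexpected DPLA ID format: {dpla_id!r}")
    return DPLA_ITEM_URL.format(dpla_id=dpla_id)


def described_at_urls(record):
    """Return the DPLA URL and, if present, the source catalog URL.

    `record` is a hypothetical flat dict with "id" and "isShownAt" keys.
    """
    urls = [dpla_item_url(record["id"])]
    source_url = record.get("isShownAt")
    if source_url:
        # The source URL is passed through unchanged, as an exact string.
        urls.append(source_url)
    return urls
```

Keeping the two URLs in a fixed order (DPLA first, source second) is arbitrary here; whether and how to qualify them is the open question above.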
source of file (P7482) file available on the internet (Q74228490) Same as above, but with the URL in described at URL (P973) as a qualifier. After discussion, instead of the initial proposal above, this has been implemented as source of file (P7482) with file available on the internet (Q74228490) as its value, per convention, and with described at URL (P973) as a qualifier holding the URL. Only the contributing institution's URL is used here, not DPLA's, which is in the reference.
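The implemented shape might look like this in Wikibase statement JSON. This is a sketch only: the helper names are hypothetical, and the reference carrying the DPLA URL is omitted because its properties are not specified on this page.

```python
def item_datavalue(qid):
    """Wikibase datavalue for an item such as "Q74228490"."""
    return {
        "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        "type": "wikibase-entityid",
    }


def source_of_file_claim(source_url):
    """source of file (P7482) = file available on the internet (Q74228490),
    qualified with described at URL (P973) = the contributing institution's URL.
    """
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P7482",
            "datavalue": item_datavalue("Q74228490"),
        },
        "qualifiers": {
            "P973": [
                {
                    "snaktype": "value",
                    "property": "P973",
                    # URL-typed properties carry a plain string datavalue.
                    "datavalue": {"value": source_url, "type": "string"},
                }
            ]
        },
        # Reference (where the DPLA URL goes) intentionally left out of this sketch.
    }
```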
URL (P2699) https://ark.digitalcommonwealth.org/ark:/50959/z603tb28p/large_image The exact URL of the file that was uploaded. It may or may not come from the data (for example, it may have been found via the IIIF manifest instead).
  • Is the basic "URL" property correct here for specifying the direct link to the image itself, or is there a more specific property?
IIIF manifest URL (P6108) https://ark.digitalcommonwealth.org/ark:/50959/z603tb264/manifest This is the exact value of the "iiifManifest" field in DPLA's data.
  • As with "described at", the IIIF manifest is for a whole item. We have this data, but is there a preferred way to add this at the asset-level?
copyright status (P6216) copyrighted (Q50423863)

public domain (Q19652)

If the file has a rights URI of Public Domain Mark or No Copyright-United States, apply public domain, otherwise copyrighted.
  • Technically, these fields are at the item-level in the data. This could cause some issues if an institution wants to treat the copyright of the digital asset and the item differently—but currently there is not really a way for institutions to do this in DPLA anyway. This could cause issues in edge cases, such as a museum licensing their photography of (public domain) 3D works with a CC license. How would you differentiate the depicted artwork and the digital asset in SDC?
  • Question: How is CC0 handled? Copyrighted or not, and is CC0 a "license"?
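A minimal sketch of the rule as currently stated. The two URIs are the standard Public Domain Mark 1.0 and No Copyright - United States 1.0 URIs; the normalization step (scheme and trailing slash) and the function name are assumptions. Note that under the rule as written, CC0 falls through to "copyrighted", which is exactly the open question above.

```python
# Rights URIs treated as public domain under the rule described above.
PUBLIC_DOMAIN_URIS = {
    "http://creativecommons.org/publicdomain/mark/1.0/",
    "http://rightsstatements.org/vocab/NoC-US/1.0/",
}

PUBLIC_DOMAIN = "Q19652"
COPYRIGHTED = "Q50423863"


def copyright_status(rights_uri):
    """Return the Wikidata item ID for copyright status (P6216)."""
    # Assumption: providers may vary the scheme and the trailing slash.
    normalized = rights_uri.strip().replace("https://", "http://", 1)
    if not normalized.endswith("/"):
        normalized += "/"
    if normalized in PUBLIC_DOMAIN_URIS:
        return PUBLIC_DOMAIN
    # Everything else, including CC0 for now, is treated as copyrighted.
    return COPYRIGHTED
```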
copyright license (P275) Creative Commons Attribution-ShareAlike 4.0 International (Q18199165) Each of the valid URIs for DPLA's rights field (aside from PD ones) will be mapped to the correct Wikidata item/property for each RightsStatements.org statement item or Creative Commons license item.
RightsStatements.org statement according to source website (P6426) No Copyright - United States (Q47530911)
title (P1476) "A Rill from the Town Pump" essay by Sarah (Sallie) M. Field, Abbot Academy, class of 1904 Exact string of DPLA's "title" field.
  • The vast majority of items in DPLA are in English, but language is not specifically spelled out as a field in the data. Is it better to use language code "en" and accept a small error rate, or to apply "und" to all?
  • Should this specify in some way that it is the title of the depicted work, and not the digital asset?
Commons media contributed by (P9126) Digital Public Library of America (Q2944483)

Ohio Digital Network (Q83878495)

Toledo-Lucas County Public Library (Q7814140)

This statement would presumably be hardcoded with "Digital Public Library of America" for every upload DPLA performs. I think it should also have the DPLA hub (intermediary aggregator) and original contributing institution.
  • If adding all organizations in the chain, do we use a qualifier to distinguish the source institution from the aggregators (and do we describe DPLA and its hubs differently)?
  • One other issue: Not all "institutions" in DPLA's data are really the same type, since the hubs decide how to use this field. In this field you could find a whole library system, a single library branch, or even a single department within a larger organization (example: National Archives at College Park - Still Pictures (Q59661040)).
collection (P195) Toledo-Lucas County Public Library (Q7814140) This would probably be where we put only the contributing institution.
  • As above, how to handle institutions that list departments or organizational sub-units? I am guessing we may have to flag certain institutions to handle them differently, especially the US National Archives, which lists the National Archives as the hub and then all of its departments as if they were independent institutions.
page(s) (P304) 1 This is not from DPLA's data, but the page increment, the same that is computed and included in the file name, based on the number of media files in an item.
  • I am not sure this is the right property, as it is mostly used in Wikidata references, but it seems like an important concept because many of our uploads are only single pages of larger works.
  • Often, the page number in the sequence of files uploaded and the original page number of the scanned page are not the same. For example, the first upload in a sequence for a book could be the cover, while the actual page "1" might be the 5th file after the title page, acknowledgements, copyright page, etc.
  • Can we also represent in SDC the number of pages in the work, in addition to the page number of the current file?
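To make the "page increment" concrete, here is a small sketch of how it could be computed from the ordered media files of one item. The function name and the output shape are hypothetical; the total count is included only to show that it is available at the same point, not to suggest a property for it.

```python
def page_increments(media_urls):
    """Pair each media file of an item with its 1-based position.

    The position is the upload sequence number, not the printed page number.
    """
    total = len(media_urls)
    return [
        # page(s) (P304) is a string-valued property, so the number is stringified.
        {"url": url, "page": str(position), "total_files": total}
        for position, url in enumerate(media_urls, start=1)
    ]
```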
author name string (P2093) Department of State. Agency for International Development. 1961-10/1/1979 This is primarily coming from DPLA's creator field, but could also incorporate values from other less common fields (e.g. publisher).
  • DPLA does not have a controlled vocabulary around creator entities, so we can only use text strings here. It will be a difficult task to decide how to match to Wikidata items, but we can use the author name string property to start with.
  • "creator" is used very broadly and variably across DPLA's institutions, in ways that may not match expectations of this property's scope. One particular issue with the National Archives is that "creator" is typically the agency that preserved the record, not the person who created it (who is sometimes an employee of the agency, but sometimes a citizen who submitted documents to the government or gave testimony in court/Congress, or the author of correspondence/clippings saved by an agency, etc.).
creator (P170) somevalue Same as above. After discussion, instead of the initial proposal, this has been implemented as creator (P170) with a somevalue value, with author name string (P2093) as a qualifier on it.
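In Wikibase statement JSON, a somevalue main snak has no datavalue, and the name string rides along as a qualifier. A sketch of that shape, with a hypothetical helper name:

```python
def creator_claim(name_string):
    """creator (P170) = somevalue, qualified by author name string (P2093)."""
    return {
        "type": "statement",
        "rank": "normal",
        # "somevalue" snaks carry no datavalue at all.
        "mainsnak": {"snaktype": "somevalue", "property": "P170"},
        "qualifiers": {
            "P2093": [
                {
                    "snaktype": "value",
                    "property": "P2093",
                    "datavalue": {"value": name_string, "type": "string"},
                }
            ]
        },
    }
```

If a creator is later matched to a Wikidata item, the main snak could be replaced with a real value while keeping the original string in the qualifier.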
original catalog description (P10358) A postcard from the Ken Levin Toledo Postcard Collection, donated by Toledo resident, Ken Levin. The collection contains picture postcards about the Toledo area. Mr. Levin’s collection was published by the Toledo Blade in a book entitled “You Will Do Better in Toledo: From Frogtown to Glass City”, edited by Sandy and John R. Husman. From DPLA description field.
DPLA subject term (P4272) Postcards From DPLA subjects field. This property is used to display DPLA subjects. Because they are uncontrolled and can't be reliably matched to other entities, this formats the subject as a search query URL in the DPLA catalog.
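For illustration, a subject string could be turned into a catalog search link roughly like this. The URL pattern (`https://dp.la/search` with a quoted `subject` parameter) is an assumption for the example and should be checked against the property's actual formatter URL.

```python
from urllib.parse import urlencode

# Assumed search endpoint; not confirmed by this page.
DPLA_SEARCH_URL = "https://dp.la/search"


def subject_search_url(subject):
    """Format an uncontrolled subject string as a DPLA search query URL."""
    # Quote the term so multi-word subjects are searched as a phrase.
    query = urlencode({"subject": f'"{subject}"'})
    return f"{DPLA_SEARCH_URL}?{query}"
```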
access restriction status (P7228) unrestricted access (Q66739888) This comes from the stringValue blob in the original record, which is also accessible via DPLA API. This is only being added to files uploaded for NARA currently. The values here are a set of terms, such as "unrestricted", which are matched with their Wikidata items.
level of description (P6224) item (Q11723795) This comes from the stringValue blob in the original record, which is also accessible via DPLA API. This is only being added to files uploaded for NARA currently. The values here are a set of terms, such as "item", which are matched with their Wikidata items.
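The term matching for these two NARA-only properties amounts to a lookup table. A sketch, using only the two term/item pairs shown on this page; the case and whitespace normalization, the table names, and the function name are assumptions:

```python
# Term-to-item tables; only the pairs documented on this page are filled in.
ACCESS_RESTRICTION_ITEMS = {"unrestricted": "Q66739888"}  # for P7228
LEVEL_OF_DESCRIPTION_ITEMS = {"item": "Q11723795"}  # for P6224


def match_term(term, table):
    """Return the Wikidata item ID for a term, or None if it is unmapped."""
    # Assumption: terms are compared case-insensitively after trimming.
    return table.get(term.strip().lower())
```

Returning None for unmapped terms lets the caller skip the statement rather than guess.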
U.S. National Archives Identifier (P1225) 146926628 This comes from the stringValue blob in the original record, which is also accessible via DPLA API. This is only being added to files uploaded for NARA. Since there is a dedicated property for NARA IDs, this will be added as well as the DPLA IDs for NARA uploads. When this is added, the NAID is not added with P217, even though it is coming from the DPLA identifier field.
inception (P571) 21 September 1916 DPLA has a date object field with uncontrolled text, but we will be able to add the values from this field if they match certain common formats like YYYY-MM-DD.
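One possible way to filter the uncontrolled date text, sketched below. Only YYYY-MM-DD is named on this page; accepting YYYY and YYYY-MM as well, and always using the Gregorian calendar model, are assumptions of the example. Anything that does not match is skipped rather than guessed at.

```python
import re
from datetime import date

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"
DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def inception_value(text):
    """Convert a date string to a Wikibase time value, or None if unparseable."""
    match = DATE_PATTERN.fullmatch(text.strip())
    if not match:
        return None
    year, month, day = match.groups()
    if day:
        try:
            date(int(year), int(month), int(day))  # rejects e.g. 1916-02-30
        except ValueError:
            return None
        precision = 11  # day
    elif month:
        if not 1 <= int(month) <= 12:
            return None
        precision = 10  # month
    else:
        precision = 9  # year
    return {
        "time": f"+{year}-{month or '00'}-{day or '00'}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": precision,
        "calendarmodel": GREGORIAN,
    }
```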
inventory number (P217) sn86090249 This comes from the DPLA identifiers array, which has uncontrolled identifier strings from the provider. In addition to the DPLA ID, we add inventory number (P217), with collection (P195) as qualifier, using all the values that represent the local identifier used by the source institution. This is ignored in the special case of NARA, as noted above, which has its own property.
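A sketch of the inventory number logic described above, including the NARA exception. The function name, the `is_nara` flag, and the assumption that the caller has already selected the identifiers that represent local IDs are all hypothetical.

```python
def inventory_number_claims(identifiers, collection_qid, is_nara=False):
    """Build inventory number (P217) claims qualified with collection (P195).

    `identifiers` is assumed to be pre-filtered to local identifier strings.
    """
    if is_nara:
        # NARA uploads use U.S. National Archives Identifier (P1225) instead.
        return []
    claims = []
    # dict.fromkeys drops duplicates while keeping the original order.
    for identifier in dict.fromkeys(i.strip() for i in identifiers if i.strip()):
        claims.append(
            {
                "type": "statement",
                "rank": "normal",
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P217",
                    "datavalue": {"value": identifier, "type": "string"},
                },
                "qualifiers": {
                    "P195": [
                        {
                            "snaktype": "value",
                            "property": "P195",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": int(collection_qid[1:]),
                                    "id": collection_qid,
                                },
                                "type": "wikibase-entityid",
                            },
                        }
                    ]
                },
            }
        )
    return claims
```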