Commons:Batch uploading/Yale

From Wikimedia Commons, the free media repository


As discussed at the Village Pump and announced here, Yale released 250k images in its database under the {{Cc-by-3.0}} license; see here for details.

We should start looking into moving them here while retaining all available metadata. --Jarekt (talk) 14:43, 3 June 2011 (UTC)


My preliminary evaluation:

  • 47343 images of paintings are available in high resolution at the present time. (Go here, fill in no fields, and click "Find.")
  • Images are made available as TIFF files, max resolution appears to be 2400 x 3000 px, 8-bit color, often smaller (they're crops of a single photo, but not bad). We should upload original TIFFs as well as JPEG versions, and cross-link them.
  • Image downloads via the website are protected by a reCAPTCHA system. We either need to defeat or circumvent it, or obtain special permission to bypass it.
  • Download speed appears to be throttled to about 80 KB/s. At this rate it will take roughly 93 days just to download them all. This is expected and should not be circumvented, since bandwidth hogging costs money and draws ire.
  • We will require a special license tag for these, because the situation is not simple. Yale has released their digitizations under CC-BY, which will be important in nations where digitizations may be protected by copyright or by a publisher's right, or in case of a hypothetical reversal of Bridgeman v. Corel. On the other hand, PD-Art indicates that attributing the source is not a legal requirement in the United States or other nations where reproductions carry no copyright, and we should not make reusers think that it is required. We need a special tag that combines these, while referring to the original entry in Yale's collection.
  • I don't know if the URL suffix is a stable reference number. We should instead link to a search for the Accession Number, like this.
  • Extracting metadata from HTML should be straightforward. Their metadata fields match our {{Artwork}} template rather well.
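
To illustrate the last point, here is a minimal sketch of turning one scraped record into {{Artwork}} wikitext. The Yale-side field labels in FIELD_MAP and the function name are assumptions made for illustration; the real labels would have to be read off the record pages. Only the right-hand names are actual {{Artwork}} parameters.

```python
# Hypothetical mapping from scraped Yale field labels to {{Artwork}} parameters.
# The Yale-side labels (dict keys) are assumptions, not taken from the real site.
FIELD_MAP = {
    "Creator": "artist",
    "Title": "title",
    "Date": "date",
    "Medium": "medium",
    "Dimensions": "dimensions",
    "Accession Number": "accession number",
}


def to_artwork_wikitext(record: dict) -> str:
    """Render a scraped record as {{Artwork}} wikitext, skipping empty fields."""
    lines = ["{{Artwork"]
    for yale_label, param in FIELD_MAP.items():
        value = (record.get(yale_label) or "").strip()
        if not value:
            continue
        # A literal pipe would end the template parameter early, so escape it.
        escaped = value.replace("|", "{{!}}")
        lines.append(f" |{param} = {escaped}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"Creator": "Example Artist", "Title": "Example Title", "Date": ""}
    print(to_artwork_wikitext(sample))
```

Fields that are missing or empty are simply left out, so incomplete records still produce valid wikitext.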

I can write a tool to get started on this, but have other obligations this week. Other opinions are welcome. Dcoetzee (talk) 07:29, 5 June 2011 (UTC)
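
For reference, the 93-day figure above can be checked with a short calculation. This is only a sketch: the average file size is not stated in the discussion, so the code works backwards from the quoted numbers (47343 images, 80 KB/s, 93 days) and assumes decimal units (1 MB = 1000 KB). Under those assumptions the estimate implies an average of roughly 13.6 MB per TIFF. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def download_days(image_count: int, avg_size_mb: float, rate_kb_per_s: float) -> float:
    """Days needed to fetch all images at a fixed throttled rate (1 MB = 1000 KB)."""
    total_kb = image_count * avg_size_mb * 1000
    return total_kb / rate_kb_per_s / SECONDS_PER_DAY


def implied_avg_size_mb(image_count: int, days: float, rate_kb_per_s: float) -> float:
    """Average file size (MB) implied by a total download time at a fixed rate."""
    total_kb = days * SECONDS_PER_DAY * rate_kb_per_s
    return total_kb / 1000 / image_count


if __name__ == "__main__":
    # Figures quoted in the evaluation above.
    size = implied_avg_size_mb(47343, 93, 80)
    print(f"implied average size: {size:.1f} MB")
    print(f"days at that size: {download_days(47343, size, 80):.1f}")
```

Plugging in a measured average file size instead of the implied one would give a better estimate once a sample of files has been downloaded.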

We already have good contacts at Yale. Meg Bellinger from Yale gave the keynote speech at GLAMcamp_NYC (see notes and slides). We can ask en:User:Witty lama, who I think interacted with them, to check what would be the way to get the data with the least interruption. We can also check whether and how they would prefer that we link to their system. I can start on the license templates, institution templates, etc. --Jarekt (talk) 20:35, 5 June 2011 (UTC)


I created {{PD-Art-Yale}} for 2D artworks. Please verify and correct or improve it. I think we should add an attribution text parameter and possibly put parts of it in an info box with the Yale logo so the credit is not lost in the text.

It is unclear to me whether the CC license extends to the "digitization" of 3D objects which are otherwise in the PD. --Jarekt (talk) 14:05, 9 June 2011 (UTC)

Looks good so far. I don't know if this collection includes three-dimensional works, or paintings with three-dimensional frames, but if it does it's worth noting that they must be used under the terms of the CC license in all nations (as the photograph would not be a mere copy). Dcoetzee (talk) 23:27, 9 June 2011 (UTC)
Yes, if the CC license extends to photography of 3D objects, then we would need a separate license: Artwork - PD-old, Photography - CC

--Jarekt (talk) 02:09, 10 June 2011 (UTC)

All listed links are currently dead, and [1] indicates that any material that is available is subject to copyright. Should this be delisted? BMacZero (talk) 02:19, 9 November 2015 (UTC)

Assigned to | Progress | Bot name | Category