Commons:Batch uploading/Wellcome Images CC-BY


Wellcome Images CC-BY

Example archive quality scan of a chromolithograph from the Wellcome Images collections (7,087×5,141 pixels)
  • Source to upload from:
I shall email the Images Team to see if an API is available. The standard web search does not seem to filter by licence.
  • Describe the works to be uploaded in detail (audio files, images by …):
Historic medical related photographs and illustrations.
  • Which license tag(s) should be applied?
CC-BY, possibly PD on a case by case or age basis.
Fae, I've noticed that your ~1300 image test run uses the CC-BY-SA-3.0 license, inconsistent with the CC-BY-2.0 claim that appears on the source pages for these lithographs (or the CC-BY-2.0-UK license mentioned in the announcement). What prompted you to use CC-BY-SA-3.0? —RP88 00:22, 19 February 2014 (UTC)
Oversight rather than design. I'm swapping these to CC-BY-2.0 and if a different interpretation comes out of our discussions later with the Wellcome, I'll apply that decision.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
We probably should create a credit template in negotiation with Wellcome.
Category:Wellcome Images holds current related uploads.
(talk) 11:46, 21 January 2014 (UTC)
{{Wellcome Images}} is obviously a related template. The Haz talk 05:24, 17 February 2014 (UTC)
I'm thinking that we should use {{Artwork}} instead of {{Information}} as this seems most appropriate. I've created Institution:Wellcome Collection. Considering the template contains probably every field we could desire, it might be the best template to use. The Haz talk 16:36, 18 February 2014 (UTC)
The test run of 1,300 lithographs uses the artwork template. These will probably be over-written with better information when they go from low res to high res images by using information from the full library catalogue. Some of the 100,000 images may not be artworks; that is a bridge to be crossed when we come to it. -- (talk) 20:38, 18 February 2014 (UTC)
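As an illustration of the {{Artwork}} approach discussed above, here is a minimal sketch of how a bot might render the description block for one file. The function name, the choice of fields, and the use of the photo number as the accession number are assumptions for illustration only; they are not taken from Faebot. Values containing a pipe character would need escaping before use.

```python
def artwork_wikitext(title, description, photo_number, source_url):
    """Render a minimal {{Artwork}} block; empty fields are left out.

    Hypothetical helper: the field selection and the mapping of the
    Wellcome photo number to "accession number" are assumptions.
    """
    fields = [
        ("title", title),
        ("description", description),
        ("institution", "{{Institution:Wellcome Collection}}"),
        ("accession number", photo_number),
        ("source", source_url),
    ]
    lines = ["{{Artwork"]
    # One " |name = value" line per field that actually has a value.
    lines += [f" |{name} = {value}" for name, value in fields if value]
    lines.append("}}")
    return "\n".join(lines)
```

The licence tag is deliberately left out of this sketch, since the choice between CC-BY and a PD template is still under discussion.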


CC vs. PD

It's great to have these images available digitally, and I support the proposal to upload them by bot, but Wellcome are claiming copyright over, and to be the original source of, artworks and images from books which are already in the public domain. The assertion of copyright, and the right to attribution, should be rejected. They have added a strapline underneath each image; this will need to be removed. The process of downloading high resolution versions of these public-domain works is tortuous, with a CAPTCHA, irrelevant terms & conditions, and zipped files. Andy Mabbett (talk) 13:27, 21 January 2014 (UTC)

For the relevant images, the T&Cs are very simple:[1] they just say that CC-BY applies and that we must use "Wellcome Library, London" as an attribution, which is the normal sort of credit we would give anyway. If I had to (if it turns out we can get no API access), I can automatically trim the bottom strapline before upload; I already have a handy bit of Python that can do it, and the strapline is not a requirement in the T&Cs. Note that the full high resolution version does not have a strapline.
(After a bit more testing) The download links are confusing. The first download link ("Download low-res images") guides you to download the "web quality" version on display; the second ("Download hi-res images") leads you through a CAPTCHA process to give you a download link for a zip file. I would have difficulty automating the CAPTCHA process. The zipped full quality download is of brilliant archive quality, my test example being >7,000px across and showing beautiful detail of every figure in the painting; we definitely must have them.
I will see if my email gets suitable results before testing much more, or considering how the workflow for batch upload could work.
Note: If you examine my first example manual upload (thumbnail above), this is a good example of where {{PD-Old-70}} or an equivalent may not apply and the best licence we could justify may well be the CC-BY one. The painting is catalogued as being created in the 1920s even though it depicts an event in 1911; further, I can find no date of death for the particular painter, and this may be complicated by the copyright law in China that may apply. It is worth observing that the EXIF data includes their old conditions, so has the licence as "cc-by-nc"; this is not in agreement with the stated website terms. -- (talk) 15:04, 21 January 2014 (UTC)
Another bit of license confusion is that their announcement identifies the license for these images as CC BY 2.0 UK while the terms identify the license as the unported version of CC BY 2.0. With regard to the PD images, where appropriate, I think something like {{Licensed-PD-Art|PD-old-100|Cc-by-2.0-uk|attribution=Wellcome Library, London}} is a suitable compromise, and I see some uploads are already taking that approach. —RP88 18:01, 21 January 2014 (UTC)
The Ts&Cs don't just impose an attribution on us, but on all re-users. We shouldn't be echoing that. And surely, if the Chinese image is not PD, then WT have no right to apply CC-by, or assert copyright in any other manner? Andy Mabbett (talk) 23:05, 21 January 2014 (UTC)
Andy, you appear to be getting views on this in many channels right now. I would rather wait until I have an email back from the Wellcome to my first question, which may enable batch upload quite nicely, and this might then also give me a suitable single point of contact to discuss how best to interpret copyright licensing. As it happens, I raised the release of these images around 2 years ago with the Wellcome head of publishing. I am relieved that we have got as far as allowing public reuse of the images, even if the individual assessment of copyright on the 100,000 historic images, to establish which truly are PD and which may raise concerns, has yet to be completed. One of the benefits of a release on Commons is that our community is interested in copyright and will tend to winkle out these issues, even for complex and changeable areas of international IP law. -- (talk) 09:53, 22 January 2014 (UTC)
Went ahead and created {{PD-Art-Wellcome Trust}}. Feel free to amend. Jean-Fred (talk) 09:43, 22 January 2014 (UTC)
Thanks for setting this up. The assumption of PD may not be valid in some cases, until we start some test runs and have a better sense of how much of an issue this is, it is probably not worth engineering the solution much further at this moment. -- (talk) 09:56, 22 January 2014 (UTC)
Sure. What’s nice (and dangerous) with the template is that we can easily tweak the licensing information later based on a finer understanding of their terms (like cc-by Vs. cc-by-uk). :-) Jean-Fred (talk) 10:17, 22 January 2014 (UTC)
AIDS education poster (1990, Germany)

After handling a couple of these images, I believe that the batch upload project will need the support of a named contact within the Wellcome Library, or regular access for one or more Commons volunteers to be able to research the background of some of the collection. A good example of a file likely to be contested is included on the right as a thumbnail. It may well be that this was donated to the Wellcome Library as part of a set of archives, but there is potential for this to be questioned due to the copyright mark naming Wojnarowicz as a member of Act Up (unfortunately David Wojnarowicz died in 1992; it is a photograph of Wojnarowicz as a boy that is featured in the poster). If the Wellcome Library does have a relevant letter of release or similar from an Act Up representative or agent, then this would be the basis of an OTRS ticket, or a public clarification in the description on Commons. Act Up would have produced this poster as part of their public knowledge mission and I have no doubt that representatives of the organization would confirm this as public domain if approached. Should this need to be done, then volunteers in Wikimedia LGBT can assist.

I am not sure at the moment how many of the 100,000 might be questioned, this would be a nice bit of analysis to do early on in the project so that suitable workflows deal with questions and there is confidence in how the collection is assessed before uploading to Commons. Due to the volume of work, this may even be an area that we may want to propose funding for to ensure it is done consistently and in a timely way. -- (talk) 13:50, 23 January 2014 (UTC)

With regard to ACT UP posters, I have sent off this email request for confirmation. -- (talk) 09:11, 24 February 2014 (UTC)

Excellent, Fae. It would be great if we can get permission to host these ACT UP posters under a CC-BY license. However, to be honest, I confess that I'd like to know whether or not ACT UP had already given these posters to Wellcome under terms that permit Wellcome to distribute them with a CC-BY license. I really hope so, since if they haven't done so, even if they are willing to do so now, this would kind of cast a pall over Wellcome's generous release of their collection (as that might indicate we'll have to give closer scrutiny to the validity of the CC-BY claim on individual images, if Wellcome hasn't been careful when applying this license). —RP88 10:28, 24 February 2014 (UTC)
NC in the EXIF

More worrying is the EXIF data for the image to the right: "Copyrighted work available under Creative Commons by-nc 2.0 UK" --AdmrBoltz 13:15, 27 January 2014 (UTC)

As noted previously, this appears out of date compared to the terms on the site. Using NC was their *old* policy. During a batch upload we could change the EXIF, however it is better to keep the digital file identical to the original. -- (talk) 13:22, 27 January 2014 (UTC)
Must have missed that above. While I normally would agree that keeping original EXIF data is good, this could lead to confusion if someone were to reuse the content out of Wikimedia. --AdmrBoltz 14:07, 27 January 2014 (UTC)
I have the skills to get Faebot to tweak the EXIFs with any agreed corrections, though I would suggest this only happens after the originals are uploaded so they appear in the file version history. I would look at this as part of the main upload project, once that gets under-way. -- (talk) 14:16, 27 January 2014 (UTC)
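To gauge how many files carry the outdated non-commercial statement before deciding on any EXIF corrections, a check along the following lines could be run over the embedded rights strings. This is a minimal sketch: the function name and the matching pattern are assumptions, and reading the rights string out of the EXIF block is left to whichever image library the bot already uses.

```python
import re

# Matches spellings such as "by-nc", "BY NC", "cc-by-nc-sa" and "NonCommercial".
# The pattern is an assumption about how the old terms are worded.
NC_MARKER = re.compile(r"\bby[-\s]?nc\b|non[-\s]?commercial", re.IGNORECASE)


def embedded_rights_conflict(embedded_rights):
    """Return True if the embedded rights text mentions a non-commercial licence.

    The website terms state CC-BY, so any NC wording in the file metadata
    is a mismatch that a human should review.
    """
    return bool(NC_MARKER.search(embedded_rights or ""))
```

Flagged files could be listed in a maintenance category rather than altered automatically, which keeps the uploaded original identical to the source file.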
Technical stuff

Just noting I've enlarged and transferred their logo to Commons - File:Wellcome Trust logo.svg Nick (talk) 16:09, 21 January 2014 (UTC)

Metadata and conventions

Metadata structures from Wellcome Images (WI)
name | data structure | conventions and notes
photo number | [A-Z]\d{7} | This number may be found in Wellcome catalogues as "photo no" or "image number" and may be shorter by having dropped leading zeroes. This number appears unique to the Wellcome Images collection but other identification numbers may be usefully included as references from other catalogues, such as the Wellcome Library reference number.
source | "" + <photo number> + ".html" | An alternative of <photo number> will redirect to the same gallery page.
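Because catalogue entries may have dropped leading zeroes, a bot would need to restore the canonical seven-digit form before building file names or source links. The following is a minimal sketch of that step; the function name is hypothetical and the example numbers in the usage note are made up.

```python
import re

# Canonical form from the table above: one capital letter plus seven digits.
# The looser pattern also accepts shortened catalogue forms with fewer digits.
SHORT_PHOTO_NUMBER = re.compile(r"([A-Z])(\d{1,7})")


def normalise_photo_number(raw):
    """Return the canonical photo number, or None if raw does not look like one."""
    match = SHORT_PHOTO_NUMBER.fullmatch(raw.strip())
    if match is None:
        return None
    letter, digits = match.groups()
    # Restore any leading zeroes that a catalogue entry dropped.
    return letter + digits.zfill(7)
```

For example, `normalise_photo_number("L1234")` gives `"L0001234"`, while a string with no letter prefix or with more than seven digits gives `None`.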

(Draft!) Mapping these to Commons parameters:

  • filename = <safe version of WI short catalogue description TBD > + "Wellcome " <WI photo number> + ".jpg"
  • source = <WI source>
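The draft filename mapping above could be sketched as follows. The "safe version" of the description is still to be decided, so the cleaning rules here (which characters are replaced, the 100-character cap, and the space inserted before "Wellcome") are assumptions for illustration, and both function names are hypothetical.

```python
import re

# Characters that MediaWiki page titles cannot contain (# < > [ ] | { }),
# plus colon and slashes, which tend to cause trouble in file names (assumption).
UNSAFE_CHARS = re.compile(r"[#<>\[\]|{}:/\\]")


def safe_title(description, max_length=100):
    """Replace unsafe characters, collapse whitespace and cap the length."""
    cleaned = UNSAFE_CHARS.sub(" ", description)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:max_length].rstrip()


def commons_filename(description, photo_number):
    """Build '<safe description> Wellcome <photo number>.jpg'."""
    return f"{safe_title(description)} Wellcome {photo_number}.jpg"
```

Keeping the photo number in the name means the file name stays unique even when two catalogue descriptions are identical after cleaning.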


Assigned to, task Progress Bot name Category
Fæ, to email Wellcome for information on the API or licence filtering. Status:    sent, pending reply
Email sent
Fæ, run single exemplar manually Status:    Done
Exemplar 1 - battle at the Ta-ping gate, 1911
Exemplar 2 - revolutionary women's army attacks Nanking, 1911
See image
Fæ, run sample batch upload Status:    Done aiming for before 26th Feb to coincide with the Wikipedia:WikiProject Medicine/Wellcome Library editathon 2014
A set of 1,300 lithographs is being uploaded in low resolution as a temporary measure to support the editathon using a customized upload rather than the GWToolset. This test batch can be upgraded to high resolution at a later stage.
Faebot Files from Wellcome Images
Fæ, review potential as a candidate for using the GLAMwiki toolset Status:    Done based on Wellcome/WMUK meeting held on 3rd Feb 2014
Fæ, agree access for Faebot avoiding the manual CAPTCHA (Wellcome Images uses Google's reCAPTCHA service as an anti-bot device.[2]) Should this be rejected by Wellcome, a mass upload of hi-res images will remain impossible. Status:    pending Faebot