Commons talk:British Library/Mechanical Curator collection

Images by year

A chart showing the number of images by year for this set may be viewed at Andy Mabbett (talk) 12:50, 14 December 2013 (UTC)


With 65,000 books, this page may get quite large, and may take some time to build! But a good subject index to what's been scanned, if only at the whole-book level, would indeed be a very useful start, so props to the editors who've got this page going. Has anybody got anything in the works for automated keyword extraction from the titles list? (Keywords found could presumably then be run against Wikidata, to start to organise them).

Until that can be done, the search box at the top of the Flickr page appears to give quite a good start for finding images from books with a particular word in the title; though it seems the initial results may overwhelmingly come from the first book it finds, making it hard to identify further on-topic source books. Jheald (talk) 11:47, 16 December 2013 (UTC)
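Nobody has posted a keyword extractor in this thread. The sketch below shows what simple automated keyword extraction from a titles list might look like. It is written in Python and assumes the list is a CSV file with a `title` column. That column name is hypothetical, so check it against the real header of a file such as the manifest's books_list.csv. The sketch uses plain word counting with a small, illustrative stopword list rather than any particular keyword-extraction library. The most frequent keywords could then be matched against Wikidata by hand.

```python
import csv
import io
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real run would need a fuller one).
STOPWORDS = {
    "a", "an", "and", "the", "of", "in", "on", "to", "for", "with",
    "by", "from", "its", "his", "her", "or", "at", "as", "being",
}


def title_keywords(title):
    """Lower-cased words of three or more letters in a title, minus stopwords."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if len(w) >= 3 and w not in STOPWORDS]


def keyword_counts(csv_file, title_column="title"):
    """Count, for each keyword, how many titles in the CSV contain it.

    title_column is hypothetical: check the real header of books_list.csv.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # set() so a word repeated within one title is only counted once
        counts.update(set(title_keywords(row[title_column])))
    return counts


if __name__ == "__main__":
    sample = io.StringIO(
        "title\nA Tour through Cornwall\nCornwall and its Mines\nThe History of Penang\n"
    )
    print(keyword_counts(sample).most_common(1))  # [('cornwall', 2)]
```

To run it against a real file, pass `open("books_list.csv", newline="")` instead of the in-memory sample. Non-ASCII letters are split out by the simple regex, which is acceptable for a first pass but would need attention for foreign-language titles.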

I'll ask my contact at the BL if they have a list of the books scanned. Andy Mabbett (talk) 13:48, 16 December 2013 (UTC)
There's a full manifest of the images at, which can be downloaded as a zip-file. As well as files listing all the images published in each year, there is also a file books_list.csv which gives the titles/authors/place of printing -- it includes a column for 'subject', but that appears to have very few entries populated.
Adding a tally to this table of how many images were extracted from each book (small/medium/plates/total) to identify the most image-dense sources would probably be useful, as would some idea of genre (eg poetry/novel/non-fiction). Jheald (talk) 15:18, 16 December 2013 (UTC); edited 15:53.
A short Perl script later:
  • 37,116 titles have one or more images (making 1,022,041 in all)
  • 17,438 have five or more images (making up 97.4% of the images)
  • 13,485 have ten or more images (94.8% of the images)
  • 9,685 have 20 or more images (89.7%)
  • 7,490 have 30 or more (84.5%)
  • 6,080 have 40 or more (79.7%)
  • 4,964 have 50 or more (74.9%)
  • 2,286 have 100 or more (56.8%)
  • 811 have 200 or more (37.1%)
  • 204 have 500 or more (19.2%)
  • 52 have 1000 or more (9.3%)
  • 12 have 2000 or more (4.3%)
followed by Thrilling Life Stories for the Masses (1892) with 5206 images (though most of them appear to be advertisements).
Jheald (talk) 18:51, 16 December 2013 (UTC)
  • I looked into this a while ago when I was putting some of the collection directly on Wikisource. There is no subject index for the books - well, not a useful one. Old-fashioned cataloguing didn't assign subject headings very often, and these are essentially just the old catalogue entries digitised. I make it about 870 books with identified subjects, out of almost 50,000 titles. (The 65k figure strictly is volumes, not books). I'm afraid there's no way to get information about the subjects or genres of the other 49,000 books other than by deducing it from the title. Andrew Gray (talk) 23:08, 16 December 2013 (UTC)
On the other hand it's striking that 50% of the images are in the first 2000 titles (and 20% in the first 200), so that may be a list that is worth working. Jheald (talk) 00:40, 17 December 2013 (UTC)
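The Perl script behind the tally above isn't reproduced in this thread. The following is a hypothetical Python equivalent of the same two calculations: how many titles have at least n images and what share of all images they hold, and what share sits in the k most image-dense titles. It assumes the manifest can be reduced to one title identifier per image, which is an assumption about the manifest's format.

```python
from collections import Counter

THRESHOLDS = [1, 5, 10, 20, 30, 40, 50, 100, 200, 500, 1000, 2000]


def tally(per_title, thresholds=THRESHOLDS):
    """For each threshold n: (n, titles with >= n images, % of all images in those titles)."""
    total = sum(per_title.values())
    rows = []
    for n in thresholds:
        big = [c for c in per_title.values() if c >= n]
        rows.append((n, len(big), 100.0 * sum(big) / total))
    return rows


def top_share(per_title, k):
    """Percentage of all images contained in the k most image-dense titles."""
    counts = sorted(per_title.values(), reverse=True)
    return 100.0 * sum(counts[:k]) / sum(counts)


if __name__ == "__main__":
    # Toy data: one title ID per image (title "a" has 6 images, "b" 3, "c" 1).
    per_title = Counter(["a"] * 6 + ["b"] * 3 + ["c"])
    for n, titles, pct in tally(per_title, [1, 5]):
        print(f"{titles} have {n} or more images ({pct:.1f}%)")
    print(top_share(per_title, 1))  # 60.0
```

With the real data, `top_share(per_title, 2000)` and `top_share(per_title, 200)` would correspond to the "first 2000 titles" and "first 200" figures mentioned above.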

Flickr links

To help track down uploads that haven't been added to the category:

— Preceding unsigned comment added by Andrew Gray (talk • contribs) 23:09, 16 December 2013 (UTC)

I have now made a first pass through these. As of the present, all images in the lists should be included in Category:Images from the British Library Mechanical Curator collection; however many do not have book categories. These have been tagged {{no bookcat}} and can be found in Category:Images with no book category from the British Library Mechanical Curator collection Jheald (talk) 21:45, 11 January 2014 (UTC)

Full wiki-list of books, now up

I've put up the full list of books on 7 pages, starting at Commons:British Library/Mechanical Curator collection/Full list of books. The format should allow easy copy-and-pasting of entries to the Synoptic Index.

The text at the top of the page is at Commons:British Library/Mechanical Curator collection/Full list of books/template, and the footer is at Commons:British Library/Mechanical Curator collection/Full list of books/footer template, so these can be tweaked without having to edit the (very large) actual pages. If the actual pages do ever need to be updated as a whole (rather than by sections), there are entries in the pages' edit histories from the stage when they contained only the templates -- this may be a better starting point for any wholesale rebuild, rather than an attempt to edit the mega-page as a whole.

One thing I should have done was to add a check-mark for titles that have one entry (at least) in the index. So I'm going to do that; and then it may be worth adding a "Finding content" section to this project main-page. Hope people find the list useful. Jheald (talk) 01:46, 21 December 2013 (UTC)

Getting the word out

I've just made this post on en:WP:VP, crossposted to en-wiki's Signpost tips queue, and also put up something similar on the en:WikiProject London talk page, to try to get the word out.

It would be good to get a heads-up out to more wikiprojects, to encourage them to think about what might be here for them, and with luck to inspire as many editors as we can get to come and help. Jheald (talk) 19:14, 22 December 2013 (UTC)

Also a Commons VP version posted. Jheald (talk) 20:03, 22 December 2013 (UTC)
I posted a short notice at DE-Wikipedia (it's in German, sorry). --Rudolph Buch (talk) 19:48, 8 January 2014 (UTC)

Book categories

We should add a note about creating Book Categories for the scans to be put in. For example, I saw User:Jacklee created Category:A Dissertation on the Soil & Agriculture of the British Settlement of Penang (1836) by James Low to put some of his downloads in.

We probably need a template to put on the category page, "This category includes some images from British Library book xxxxxxxxx", with a link using the Flickr tag. Jack Lee has also added the Commons cat to the book's entry in the Synoptic Index, which also seems very sensible.

I was looking for guidance we can refer people to, as to what is best practice for book cats; but wasn't sure where to look.

We should probably encourage people to look in Category:Books, Category:Books_by_topic and Category:Books_by_genre to think about which cats they should file their book-cats under, and in sub-cats like Category:Books about countries, Category:History books and Category:Geography books to see whether their book already has a category.

We also seem to have Category:Images from books; and then there's Category:Book illustrations.

{{cite book}}, as used in eg Category:History of the Cotton Manufacture in Great Britain ... (1835) by Edward Baines is a good way to give the OCLC (or ISBN) for a modern book -- but of course has no bl1million Flickr link possibility.

Alternatively, here's one approach for a Gutenberg/Internet archive book, Category:Highways_and_Byways_in_Cambridge_and_Ely_(book), which includes a {{Creator}} template.

Perhaps we could wrap {{cite book}} in something bigger -- who's good with templates? And how much auto-categorisation is it desirable for that template to do -- should it automatically place the book in the right Category:Books by date category? There may be unusual intermediate-level cats that would break such rules, eg Category:Illustrations from The Memoirs of Sherlock Holmes by Sidney Paget is a sub-cat of Category:The Memoirs of Sherlock Holmes, rather than of the date category directly.

On a different note, Category:Books by topic may give some good subjects for populating the subject part of the synoptic index.

I was just wondering what people's thoughts were, because I am a bit perplexed. Jheald (talk) 02:03, 24 December 2013 (UTC)

On a related note, I've just created Category:Books with images from the British Library Mechanical Curator collection. Andy Mabbett (talk) 16:39, 26 December 2013 (UTC)

Use with photographs

Commons:British Library/Mechanical Curator collection/script, when substituted (SUBST), installs an instance of {{Artwork}}. This doesn't sit well with photographs like File:John Rogers bust, St Johns Deritend.jpg or File:Birmingham General Hospital, circa 1894.jpg: the "current location" of the bust depicted in the former, or of the long-since demolished building in the latter, cannot be said to be the British Library. How should we deal with such images? Andy Mabbett (talk) 16:35, 26 December 2013 (UTC)

Also for photographs, and images of old artworks, we need to separate their date from date of book publication. Andy Mabbett (talk) 22:13, 30 December 2013 (UTC)
I think it's probably best to just remove "current location" from the default template entirely - I cribbed this from an earlier upload script used for photographs, where it was appropriate (it's the location of the physical photographic work), but forgot to remove this bit. I'll take it out just now. Andrew Gray (talk) 19:17, 10 January 2014 (UTC)

Other template fields needed

We could also do with template fields to capture the publisher and location; and caption where one exists. See this example. Andy Mabbett (talk) 22:12, 30 December 2013 (UTC)

"Derived" images

Should images where cropping only removed the surrounding text, and where the only other alteration was a contrast adjustment, be categorized in Category:Images derived from images from the British Library Mechanical Curator collection, or is Category:Images from the British Library Mechanical Curator collection (where they seem to go automatically) enough? (I might have done it wrong until now, as I thought any change from the original on Flickr meant a "derived" picture.) --Rudolph Buch (talk) 11:46, 23 January 2014 (UTC)

My view (but let's see whether anyone else comes in on this) is that if it is substantially the same picture (ie just a crop or a brightness fix), the kind of change we would simply have made by overwriting the original file under the same filename had the original been uploaded, then that is very standard practice here on Commons and not worth categorising separately.
On the other hand, if more radical changes have been made, eg isolating a detail, or vectorising the image to make an SVG version, I think those are worth making separately identifiable; and I think the category should be kept clear for those, so they readily stand out.
As for a double image that's been cut in half, I am not sure. I'd be tempted to put the new images in the main primary category, as each new image corresponds 1:1 to a complete image in the original book. But it might be worth also making another subsidiary category for images of that type.
That's my instinct, which led to my edits to the project page. But I am open to what anybody else thinks. Jheald (talk) 19:33, 23 January 2014 (UTC)
Sounds reasonable, so I'll move my uploads out of "derived". Thanks! --Rudolph Buch (talk) 12:19, 24 January 2014 (UTC)

Progress of indexing

Progress of the synoptic index:

20 Dec 2013: 1290 titles indexed, representing 13.2% of images in the collection (135,546 images)

30 Dec 2013: 2258 titles indexed, representing 20.7% of images in the collection (212,167 images)

... to be continued. Jheald (talk) 23:08, 30 December 2013 (UTC)

4 Jan 2014: 2880 titles indexed, representing 26.3% of images in the collection (269,179 images) Jheald (talk) 22:44, 4 January 2014 (UTC)

13 Jan 2014: 3135 titles indexed, representing 28.2% of images in the collection (288,408 images) Jheald (talk) 22:40, 13 January 2014 (UTC)

16 Jan 2014: 4113 titles indexed, representing 36.0% of images in the collection (368,695 images) Jheald (talk) 22:54, 16 January 2014 (UTC)

17 Jan 2014: 4404 titles indexed, representing 37.7% of images in the collection (385,619 images) Jheald (talk) 18:33, 17 January 2014 (UTC)

22 Jan 2014: 4564 titles indexed, representing 39.4% of images in the collection (402,928 images)

-- this includes titles for Africa, the section for which I have been finding rather hard going to organise, and which is still quite a mess. Assistance/suggestions/thoughts very welcome. Jheald (talk) 16:42, 22 January 2014 (UTC)

03 Feb 2014: 4925 titles indexed, representing 40.3% of images in the collection (412,327 images) Jheald (talk) 22:10, 3 February 2014 (UTC)

  • plus 10477 titles added to the new Fiction/Plays/Verse/Works sections of the index, representing a further 20.9% of images in the collection (214,377 images) Jheald (talk) 20:38, 4 February 2014 (UTC)
  • 8291 titles added to "to_do" pages sections, representing a further 35.3% of images in the collection (361,222 images). That leaves approx 3.5% of images in books from other classmarks or books with very few images. Jheald (talk) 23:20, 6 February 2014 (UTC)

10 Feb 2014: 5885 titles indexed, representing 42.1% of images (430,978 images)

7384 titles in "to do" pages sections, representing 33.6% of images (343,571 images). Jheald (talk) 22:20, 10 February 2014 (UTC)

13 Sep 2014: 5964 titles in main index, representing 42.3% of images (433,265 images) Jheald (talk) 22:18, 13 September 2014 (UTC)
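Each entry in this log reports an image share, not a title share. The sketch below shows that arithmetic in Python, assuming a mapping from title identifier to image count and a list of the title identifiers that appear in the index. How the logged figures were actually computed, including which collection total was used, isn't stated here, so this is illustrative only.

```python
def coverage(per_title, indexed_titles):
    """Return (titles indexed, images in them, % of all images), given a
    mapping of title ID -> image count and an iterable of indexed title IDs."""
    indexed = set(indexed_titles) & set(per_title)  # ignore IDs with no known images
    covered = sum(per_title[t] for t in indexed)
    total = sum(per_title.values())
    return len(indexed), covered, round(100.0 * covered / total, 1)


if __name__ == "__main__":
    per_title = {"a": 6, "b": 3, "c": 1}  # toy data
    print(coverage(per_title, ["a", "c", "z"]))  # (2, 7, 70.0)
```

Because the share is weighted by image count, indexing a few image-dense titles moves the percentage much more than indexing many small ones.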

Tag searches on Flickr

A tip for searching by tag on Flickr.

Some of the results for the standard tags the BL is asking for on Flickr (map, portrait etc.) are going to be immense. These rapidly become unbrowseable, due to Flickr's insistence on "infinite scrolling" without paging.

But paging is possible with the current design of Flickr's mobile site, which also gives a total count of how many times the tag has been used. Eg:

One issue is that having clicked through to get to the page for one of the images, the mobile site seems to give no way to show the tags on it. So it seems one has to have a separate page open in the browser, to which one can cut and paste the Flickr picture number. Unless somebody knows where the information is hidden on the mobile version? Jheald (talk) 10:28, 26 January 2014 (UTC)
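A different route from the mobile-site workaround is Flickr's public REST API. Its `flickr.photos.search` method returns results a page at a time, together with a total count. Separately, `flickr.photos.getInfo` returns a single photo's tags, which might cover the other gap mentioned above. The minimal Python sketch below assumes you have your own API key and have looked up the Flickr user ID of the British Library account. Neither is given here, and "KEY"/"USER" in the test are placeholders. No request is made until `fetch_page` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.flickr.com/services/rest/"


def search_url(api_key, user_id, tag, page=1, per_page=100):
    """Build a flickr.photos.search request for one page of a tag search."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,   # your own key (assumption: not supplied here)
        "user_id": user_id,   # Flickr ID of the uploading account (look it up)
        "tags": tag,
        "page": page,
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,  # plain JSON rather than a JSONP wrapper
    }
    return API + "?" + urlencode(params)


def parse_page(body):
    """Return (total matches, number of pages, photo IDs) from a JSON response."""
    data = json.loads(body)
    if data.get("stat") != "ok":
        raise RuntimeError(data.get("message", "Flickr API error"))
    photos = data["photos"]
    return int(photos["total"]), int(photos["pages"]), [p["id"] for p in photos["photo"]]


def fetch_page(api_key, user_id, tag, page=1):
    """Fetch and parse one page of results (makes a network request)."""
    with urlopen(search_url(api_key, user_id, tag, page)) as resp:
        return parse_page(resp.read().decode("utf-8"))
```

The photo IDs returned are the same numbers that appear in Flickr page URLs, so they can be pasted into a browser tab as described above.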


Now that the project is moving along, before things get too much further, I wonder if we could review some of the templates, to see if we think they are right? Jheald (talk) 13:21, 26 January 2014 (UTC)

Template:BL1million bookcat

{{BL1million bookcat}} was my first effort at template wrangling. I meant to put a note here just after I'd made it, asking what people thought, and whether there were any comments/suggestions/things that needed fixing, but I guess I'm only getting round to that now.

One thing I am aware of is that it ought to be internationalised. But are there any thoughts about it at this stage first, to make sure it's right, before we make it a lot more complicated? -- Jheald (talk) 13:21, 26 January 2014 (UTC)

{{LangSwitch}} now built into the template (which was a lot easier than I expected). There are three messages that need to be translated, if anybody would like to start putting it into other languages. Jheald (talk) 17:02, 19 February 2014 (UTC)

Template:Commons:British Library/Mechanical Curator collection/script

Commons:British Library/Mechanical Curator collection/script

I recently changed this to output {{PD-old-70-1923}} rather than {{PD-old-70}} and {{PD-1923}} (diff), to try to tidy up the description pages a bit, and make them a bit less cluttered. Is this okay, or was there value in using the two separate licensing templates as it was before? Jheald (talk) 13:21, 26 January 2014 (UTC)

Template:British Library image

The other thing I was wondering is: should we replace the present invocation of {{British Library image}} with a new template, more specific to the BL Mechanical Curator collection.

I see the following potential advantages:

  • a new template could auto-incorporate the categorisation to Category:Images from the British Library Mechanical Curator collection
  • a new template could more clearly advertise the Collection, and that the image was from it (cf the bookcat template above), rather than just that the image was from the BL
  • a new template could link directly to this page, with the information it has about the collection, and what we're advising is best practice for re-using images from it
  • a new template could link to the Flickr page for images from the relevant book, which is reasonably nice, rather than dumping people at the BL catalogue with its million-and-one login requirements, which is (IMHO) a horrible place to send anyone.
  • a new template could carry the relevant information for the image and the book on Flickr in an organised way in its parameters, including eg the "sysnum" for the book as at present, and also the image identifier on Flickr.
    (Slight note about this: do the images in fact have stable image identifiers on Flickr? Or do they get zapped and changed when images are eg replaced by rotated versions, with no way to track them).

I'm therefore suggesting creating something like a {{BL1million image}}, to be a drop-in replacement for {{British Library image}} in the above script. Any thoughts or views?

(Also, would we be allowed to add a flag to shut off the "This tag does not indicate the copyright status of the attached work. A normal copyright tag is still required. See Commons:Licensing for more information" message, to make the tag a bit more slimline, when we believe that such a licensing tag will also have been automatically added?)

What do people think? Jheald (talk) 13:21, 26 January 2014 (UTC)

I think a specialised template might be worthwhile - in the long run, we could always merge it as a special case of the {{British Library image}} code and use {{BL1million image}} (or {{Mechanical Curator image}}) as a pre-filled version, ensuring we have some kind of back-end stability. The alternative would be to use a second template like {{Girdwood}} as a supplement for this particular collection. Agree that "does not indicate" can be suppressed if we already have a tag!
My main requirements for a replacement template would be a) visual similarity (keep the BL logo); b) some kind of catalogue link as well as flickr, if at all possible (it doesn't have to be displayed as the primary link, but I think there's value to retaining it); c) it still emits the same overall tracking category Category:Images from the British Library as well as the Mechanical Curator images one (this is what we did for the Picturing Canada images).
Regarding identifiers - I think Ben was working on something, but I don't know where he's got up to. I lost touch when I went off on holiday before Christmas! Andrew Gray (talk) 14:04, 26 January 2014 (UTC)
{{Mechanical Curator image}} now up and running. Thoughts, tweaks, translations etc all very welcome. Jheald (talk) 21:13, 19 February 2014 (UTC)


I've now made {{HasBookCat}} for links to book categories from the synoptic index pages -- see description at Commons:British Library/Mechanical Curator collection#Categories for books for details.

The start of the Germany section of the synoptic index, Commons:British_Library/Mechanical_Curator_collection/Synoptic_index,_Europe#Germany gives a good example of the template in action.

Any and all thoughts/suggestions/criticisms/comments very welcome. Jheald (talk) 09:22, 19 February 2014 (UTC)

Internet Archive Book Images collection

The Internet Archive has just released 2.4 million images to Flickr from pre-1923 scanned books, with up to five times more than that due for release in the next weeks and months.

I've started Commons:Internet Archive/Book Images collection as a corresponding Commons page, along the same lines as this one. Up for discussion is whether we should do things the same way as we have been doing them for the Mechanical Curator collection, or whether different approaches could make sense -- eg the bulk uploads used by User:Fae for the NYPL map images, and most recently for the Wellcome Collection images (currently on their way by hard drive to the Foundation in the United States). Also whether there are existing IA resources and structures already on Commons we should be combining this with.

Please do sign up on the project page there, if interested. Cheers, Jheald (talk) 20:10, 30 August 2014 (UTC)

Blocked by spam filter

I've just tried to upload images from Malvern Chase: an episode of the wars of the Roses. ... Third edition, in the set using Flickr2Commons.

My uploads were blocked with the error report: "The text you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site.".

Any suggestions on how to work round this? Andy Mabbett (talk) 21:17, 6 February 2017 (UTC)

Comment: the use of urls is always going to be doomed to fail for uploads or edits. If a url had been used it would have been successful. That said, the notification of a problematic url should not come at the final stage of the process, when the wizard process is unrecoverable. I have addressed that part of the problem in a phabricator ticket (see right)  — billinghurst sDrewth 12:39, 7 February 2017 (UTC)

Still an issue; this has just stopped me from uploading an entire book's worth from [1]. Andy Mabbett (talk) 14:38, 17 April 2017 (UTC)

Any solution to this problem? Wolfmann (talk) 14:01, 4 August 2017 (UTC)
No. The phabricator task linked to by user:billinghurst was closed as "invalid". Andy Mabbett (talk) 21:05, 4 August 2017 (UTC)
Sure there is a solution, use the url not the url. If a tool allows you to use a url, without drilling down to the base url, then talk to the developer of the tool, don't blame or expect the WMF to have to do that resolution.  — billinghurst sDrewth 03:51, 5 August 2017 (UTC)
Also the phabricator ticket was closed in that they are recommending that the problem lies outside of fixing MW and that the tool in use is not in phabricator.  — billinghurst sDrewth 03:54, 5 August 2017 (UTC)
This is not about Bitly URLs that point to Flickr. It is quite possible to conceive of a fix within MediaWiki; the ticket in question is not about a single specific tool. Andy Mabbett (talk) 16:58, 5 August 2017 (UTC)
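On the point that a problematic link is only reported at the final save: a tool could warn earlier by scanning the description text it generates for links before submitting. The minimal Python sketch below uses a hypothetical host list. The real Wikimedia spam blacklists are on-wiki lists of regular expressions, which this sketch does not read. It is an illustration of the idea, not how Flickr2Commons or MediaWiki actually behave.

```python
import re
from urllib.parse import urlsplit

# Hypothetical example list; the real blacklists are regex lists maintained on-wiki.
BLOCKED_HOSTS = {"bit.ly"}

# A URL runs until whitespace or a character that ends a wiki/HTML link.
URL_RE = re.compile(r'https?://[^\s\]|<>"]+')


def blocked_urls(text, blocked_hosts=BLOCKED_HOSTS):
    """Return URLs in text whose host, or any parent domain of it, is blocked."""
    hits = []
    for url in URL_RE.findall(text):
        host = (urlsplit(url).hostname or "").lower()
        parts = host.split(".")
        # check "x.bit.ly", then "bit.ly", then "ly"
        if any(".".join(parts[i:]) in blocked_hosts for i in range(len(parts))):
            hits.append(url)
    return hits


if __name__ == "__main__":
    desc = "Source: http://bit.ly/abc and https://www.flickr.com/photos/x"
    print(blocked_urls(desc))  # ['http://bit.ly/abc']
```

Any URLs this flags could then be replaced with the full target address before upload, instead of the upload failing at the last step.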