Commons:Batch uploading/Descriptionis Ptolemaicæ avgmentvm

From Wikimedia Commons, the free media repository

Descriptionis Ptolemaicæ avgmentvm

Title: Descriptionis Ptolemaicæ avgmentvm siue Occidentis notitia breui commentario illustrata

Year: 1597

Description: Early atlas of the Americas. This copy is one of the rare books that were stolen from the National Library of Sweden and later returned, as described in http://www.nytimes.com/2012/06/27/books/swedish-royal-library-recovers-stolen-1597-atlas-in-new-york.html

Author: Cornelius van Wytfliet; printed by Iohannis Bogardi.

Machine-readable metadata is also available via the Libris identifier (e.g. http://libris.kb.se/xsearch/?query=onr:3191774&format=mods). The available output formats are described here: http://librishelp.libris.kb.se/help/xsearch_swe.jsp?open=tech
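A minimal Python sketch of using that metadata, assuming the xsearch endpoint accepts the `query` and `format` parameters exactly as in the example URL and that the MODS records use the standard MODS v3 namespace. `mods_query_url` and `mods_titles` are hypothetical helper names; fetching the URL is left out, so `mods_titles` works on a response that has already been downloaded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
XSEARCH_BASE = "http://libris.kb.se/xsearch/"
# Assumption: records are returned in the standard MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"


def mods_query_url(libris_id: str) -> str:
    """Build an xsearch URL requesting MODS metadata for one Libris record."""
    params = {"query": f"onr:{libris_id}", "format": "mods"}
    # Keep ':' unescaped so the URL has the same form as the example above.
    return f"{XSEARCH_BASE}?{urlencode(params, safe=':')}"


def mods_titles(xml_text: str) -> list[str]:
    """Return the text of every mods:title element in an xsearch response."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{{{MODS_NS}}}title") if el.text]
```

For example, `mods_query_url("3191774")` reproduces the example URL given above.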

  • Source to upload from: https://data.kb.se/datasets/2014/09/wytfliet/
  • Do the media URLs follow a pattern? Yes: each filename contains a sequence number. Files whose names end in "pl" contain a fold-out map; these should be tagged with Category:Maps by Cornelius van Wytfliet (though I expect that category will have to be added manually later).
  • Did you contact the site owner? Yes
  • Describe the works to be uploaded in detail (audio files, images by …):

Early atlas of the Americas. This copy is one of the rare books that were stolen from the National Library of Sweden and later returned, as described in http://www.nytimes.com/2012/06/27/books/swedish-royal-library-recovers-stolen-1597-atlas-in-new-york.html

  • Which license tag(s) should be applied?

CC0

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

{{Kungliga biblioteket image |libris-id=3191774 |url=https://data.kb.se/datasets/2014/09/wytfliet/ }}

PeterKz (talk) 08:03, 5 June 2016 (UTC)
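The filename pattern, template and license answers above can be combined into one description page per file. This is a minimal Python sketch, assuming filenames look like the hypothetical `wytfliet_012pl.tif` (a sequence number, optionally followed by "pl") and assuming `{{Cc-zero}}` is the Commons tag to use for CC0. The helper names are illustrative only, and the map category may still need to be added after upload.

```python
import re

# Values taken from the request above.
LIBRIS_ID = "3191774"
SOURCE_URL = "https://data.kb.se/datasets/2014/09/wytfliet/"
MAP_CATEGORY = "Maps by Cornelius van Wytfliet"


def _stem(filename: str) -> str:
    """File name without its extension."""
    return filename.rsplit(".", 1)[0]


def sequence_number(filename: str) -> int:
    """Sequence number in a (hypothetical) name such as 'wytfliet_012pl.tif'."""
    match = re.search(r"(\d+)(?:pl)?$", _stem(filename), re.IGNORECASE)
    if match is None:
        raise ValueError(f"no sequence number in {filename!r}")
    return int(match.group(1))


def has_foldout_map(filename: str) -> bool:
    """Names ending in 'pl' mark pages that contain a fold-out map."""
    return _stem(filename).lower().endswith("pl")


def description_wikitext(filename: str) -> str:
    """Wikitext for one file page: KB template, CC0 tag, optional map category."""
    lines = [
        f"{{{{Kungliga biblioteket image |libris-id={LIBRIS_ID} |url={SOURCE_URL} }}}}",
        "{{Cc-zero}}",  # assumption: the Commons license tag for CC0
    ]
    if has_foldout_map(filename):
        lines.append(f"[[Category:{MAP_CATEGORY}]]")
    return "\n".join(lines)
```

Sorting the file list with `key=sequence_number` keeps the pages in book order before the descriptions are generated.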

Opinions

Terminal-based interactive web scraper for data.kb.se -> XML -> GWT.

Instead of individual uploads, perhaps you could suggest a list of works that would be useful to host on Commons? Processing a list of, say, 10 works would make it worth creating a special workflow. For each work, the list would need the link to the catalog page, a suggested Commons category for the images, and any additional descriptive text worth adding to all of its pages. I'll put together a small script that parses the web page and gives me terminal prompts, so I can create the XML faster (depending on the weather, this might be later today).

The script is written: it pulls data from the catalog, and I answer a couple of prompts to set categories and to confirm dates, authors, etc., which generates the XML. Unfortunately, a bug in GWT still means that categorization has to be done after upload, so I suggest holding off on large batches of uploads until it is fixed. -- (talk) 12:17, 5 June 2016 (UTC)
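The prompt-then-XML step described above might look roughly like the following Python sketch. It is not the script referred to in the comment: the scraping of data.kb.se is left out, the element names (`records`, `record`, `title`, `author`, `date`, `category`, `url`) are hypothetical rather than a GWT requirement, and the `ask` parameter stands in for `input` so the prompts can be scripted.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Ask = Callable[[str], str]


def confirm(field: str, scraped: str, ask: Ask = input) -> str:
    """Show the scraped value; pressing Enter keeps it, any other answer replaces it."""
    answer = ask(f"{field} [{scraped}]: ").strip()
    return answer or scraped


def build_gwt_xml(records: List[Dict[str, str]], ask: Ask = input) -> str:
    """Turn scraped catalog records into one XML document for a batch upload."""
    root = ET.Element("records")  # hypothetical element names throughout
    for rec in records:
        item = ET.SubElement(root, "record")
        for field in ("title", "author", "date", "category"):
            ET.SubElement(item, field).text = confirm(field, rec.get(field, ""), ask)
        ET.SubElement(item, "url").text = rec["url"]
    return ET.tostring(root, encoding="unicode")
```

Each field is confirmed one at a time, so a wrong date or author picked up from the catalog page can be corrected before any XML is written.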

Thank you again! The problem is that the datasets are not homogeneous. These two are fairly similar (images belonging to a single work). Right now there are only one or two more that are suitable for batch uploading, and they consist of separate works whose metadata has to be pulled from the catalog using an identifier in the filename. --PeterKz (talk) 13:52, 5 June 2016 (UTC)
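For datasets where the catalog identifier is embedded in the filename, a first step could be to group files by that identifier so that each catalog record is fetched only once. This is a minimal Python sketch under a hypothetical naming convention (the Libris identifier at the start of the name, as in `3191774_001.jpg`); the real datasets may encode it differently, so `ID_PATTERN` is an assumption to adjust.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical convention: the Libris identifier leads the file name,
# e.g. "3191774_001.jpg". Adjust the pattern to the real dataset.
ID_PATTERN = re.compile(r"^(\d+)")


def group_by_libris_id(filenames: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group files by catalog identifier; return (groups, unmatched files)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    unmatched: List[str] = []
    for name in filenames:
        match = ID_PATTERN.match(name)
        if match:
            groups[match.group(1)].append(name)
        else:
            unmatched.append(name)  # no identifier: needs a manual look-up
    return dict(groups), unmatched
```

Files that do not match the pattern are returned separately rather than silently dropped, so they can be checked by hand.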
Using your example XML, I had a go at another dataset where each image is an individual work. Would a script like this help? --PeterKz (talk) 15:27, 5 June 2016 (UTC)
Assigned to | Progress | Bot name | Category
User:Fæ | ✓ Done | GWT | Descriptionis Ptolemaicæ avgmentvm