Commons:Batch uploading
From Wikimedia Commons, the free media repository
Commons Batch Uploading is a project to centralize requests for uploading collections of files, such as sets on Flickr or sites that have released their work into the public domain (PD) or under any Commons-compatible license. Each request is assigned to a bot operator, who works out how it can be fulfilled.
See w:Wikipedia:Public domain image resources for potential future batch uploads.
Create your Upload request:
Add your Upload request under one of the following sections:
Scripters
- Dcoetzee (talk · contributions)
- Multichill (talk · contributions)
- Duesentrieb (talk · contributions)
- TheDJ (talk · contributions)
Tools
- See Commons:Tools#Upload_media. The Python Wikipedia Bot framework supports image uploads and is particularly versatile.
- User:Dcoetzee is working on a tool for placing images in articles. See Commons:Placing images#Tools.
- Upload Script by Erik Möller
- Flickrripper allows batch uploading from a set, a group or a user ID on Flickr.
- We need tools to facilitate rapid, accurate categorization of many images at once.
- Commonist
Past batch uploads
| Date | Name (Subpage) | Description | Images | Scripter | Uploader | Script | Category |
|---|---|---|---|---|---|---|---|
| May 2005 | 10,000 paintings from Directmedia | 10,000 public domain images digitized by the Yorck project and contributed to Commons | 10,000 | Eloquence | File Upload Bot (Eloquence) | PD-Art (Yorck Project) | |
| September 2006 | Picswiss project | Roland Zumbühl agreed to release his images, depicting various areas and subjects in Switzerland, under the GFDL. | 5,000 of 13,000 | Dake | Dake | Images from Picswiss | |
| December 2008 | Bundesarchiv | From the German Federal Archive, the images depict Germany between the 19th and 20th century including valuable photographs of the Nazi era and World War II. | 100,000 | Duesentrieb | BArchBot | Information fetch | Images from the German Federal Archive |
| March 2009 | Starr images | Images of plants of Hawaii | 60,000 | Multichill | Multichill | Images from Forest & Kim Starr | |
| March 2009 | Wenceslas Hollar Digital Collection | A collection of 2700 high resolution images of engravings of Wenceslas Hollar, about 90% of his life works | 2,700 | Dcoetzee | Dcoetzee | University of Toronto Wenceslas Hollar Digital Collection | |
| March 2009 | National Portrait Gallery | Various portraits of famous people between the 16th and 19th century. | 3,000 | Dcoetzee | Dcoetzee | National Portrait Gallery, London | |
| March 2009 | Deutsche Fotothek | Images from Deutsche Fotothek, mainly of eastern Germany between the 19th and 20th century, including the Bombardment of Dresden and other events. So far only about 25% of the images have been uploaded. | 61,587 of 250,000 | Multichill | FotothekBot | Tools used | Images from the Deutsche Fotothek |
| April 2009 | Berger Collection | A collection of high resolution images of paintings and other works from the Berger Collection, depicting British art, culture and people. | 140 | Dcoetzee | Dcoetzee | Berger Collection | |
| April 2009 | Great Images in NASA | Images from Great Images in NASA | 1,400 | TheDJ | Multichill | Great Images in NASA | |
| April 2009 | Alaska-Yukon-Pacific Exposition of 1909 | High-resolution scans of documents from the Alaska-Yukon-Pacific Exposition found here. | 700 | Dcoetzee | Dcoetzee | Alaska-Yukon-Pacific Exposition | |
| July 2009 | Commanster | Pictures of plants, animals, birds and insects of Commanster, Belgium by James Lindsey | 6,000 | Sarefo | Sarefo | Pictures by James Lindsey | |
| August 2009 | WLANL | Images from Wiki Loves Art Netherlands, imported from the Flickr group pool, depicting the Netherlands and its various museums. | 4,000 | Multichill | BotMultichillT | Images from Wiki Loves Art Netherlands | |
| October 2009 | FEMA site | All the images in the US Federal Emergency Management Agency Disaster Photo Library were copied to Commons, depicting disasters in the US and emergency response efforts. | 20,000 | Multichill | BotMultichillT | script | PD US FEMA |
Failed
- Flickr Imre Solt collection (April 2009) was rejected because the UAE has no freedom of panorama (FOP) provision, so most of the images would be copyright violations.
- Commons:Batch uploading/Modern Egypt Digital Archive (April 2009) Egyptian copyright law has no separate term for photographs; they only become PD 50 years after the author's death. There were not enough eligible images for a batch.
- Commons:Batch uploading/Images from LIFE (June 2009) Most of the images didn't have a clear copyright label.
- Commons:Batch uploading/Gathering the Jewels (September 2009) Images don't appear to be free.
- Commons:Batch uploading/Staffordshire Gold Hoard (en.Wikipedia front page news) (September 2009) The license on the images was changed from Share Alike to Non-commercial on the same day.
- Commons:Batch uploading/World War II in Africa from Flickr user gbaku The user wasn't the author of the album; they had only purchased the images.--Diaa abdelmoneim (talk) 16:04, 5 October 2009 (UTC)
Batch uploads in progress
Geograph
At the Village pump, Perry Rimmer suggested copying all of these files to Commons. Geograph is a site containing about 1.5 million {{cc-by-sa-2.0}} images of the British Isles. The Isles are divided into 1 km by 1 km squares, and the goal of the project is to get at least one photo of every square. 250,397 grid squares, or 75.5% of all squares, currently have an image. Most of the images we use on the English Wikipedia to illustrate villages in the United Kingdom come from this site. The quality of the images is not that high, but this is nevertheless a very rich resource. Dumps of the database are available, as are torrents containing the files. I will contact the people behind this project to see whether we can set up some sort of cooperation. Before I start actually uploading images I want to do several things:
- Build category trees like Category:Towns and villages in England based on enwp and the list at Geograph
- I built the village/town tree for the UK and Ireland.
- Category trees for subjects (like "bridges") still have to be built
- Populate these trees with the current images
- Clean up the current uploads
- Wait for more disk space to arrive
I imported the database dump on the Toolserver. Extracting all the information from the database should be straightforward; categories, on the other hand, are probably going to be a nice challenge. I found several possible tools:
- http://svn.geograph.org.uk/svn/branches/british-isles/schema/ - could probably import this database
- http://gazetteer.openstreetmap.org/namefinder/ - the other way around from OSM
- OSM database at the toolserver?
- http://www.geonames.org/export/ws-overview.html - overview of tools
- http://ws.geonames.org/extendedFindNearby?lat=52.53413&lng=-2.41569 - the one I'm probably going to use. It returns the location it finds as a tree, so I can climb/descend the tree to find suitable categories
- Multichill (talk) 16:44, 19 November 2009 (UTC) (more to come)
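A minimal Python sketch of the "climb the tree" idea. It assumes the extendedFindNearby response has already been fetched and is XML with one `<geoname>` element (each holding a `<name>`) per level of the hierarchy, broadest first. That response shape, the sample data and the place "Exampleton" are assumptions made for illustration, not real API output. Matching place names directly against existing category names is a simplification, since real Commons category names may need disambiguation.

```python
import xml.etree.ElementTree as ET

# Made-up response for illustration; the real service may differ.
# Assumed shape: one <geoname> per hierarchy level, broadest first.
SAMPLE_RESPONSE = """<geonames>
  <geoname><name>Earth</name></geoname>
  <geoname><name>Europe</name></geoname>
  <geoname><name>United Kingdom</name></geoname>
  <geoname><name>England</name></geoname>
  <geoname><name>Shropshire</name></geoname>
  <geoname><name>Exampleton</name></geoname>
</geonames>"""


def place_hierarchy(xml_text):
    """Return the place names from the broadest area to the nearest place."""
    root = ET.fromstring(xml_text)
    names = []
    for geoname in root.findall("geoname"):
        name = geoname.findtext("name")
        if name:
            names.append(name.strip())
    return names


def most_specific_category(hierarchy, existing_categories):
    """Climb from the nearest place upwards and return the first name that
    already has a category, or None if nothing matches."""
    for name in reversed(hierarchy):
        if name in existing_categories:
            return name
    return None


hierarchy = place_hierarchy(SAMPLE_RESPONSE)
print(most_specific_category(hierarchy, {"England", "Shropshire"}))
```

Here the most specific match is "Shropshire", because the hypothetical "Exampleton" has no category yet.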
Opinions
| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
| Multichill | Preparing | GeographBot | Category:Images from the Geograph British Isles project |
US Air Force
And yet another branch of the military to pick clean. The US Air Force has a set of photos on its site. I'm not sure how many photos we're talking about. The same logic as for FEMA, the Navy and the Army can be used: crawl all the galleries and extract the ids (a simple regex). The ids can be used in URLs like http://www.af.mil/photos/media_view.asp?id=314289 to get the image and metadata (BeautifulSoup). The gallery structure can be used to build a temporary category structure. The files should be named like "US Air Force <id> <title>.jpg". Of course, duplicate checking should be enabled, as in all the other bots.
The source will be available here. Multichill (talk) 18:11, 23 October 2009 (UTC)
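The crawl-and-extract step could look roughly like the following Python sketch. It assumes gallery pages link to each photo as `media_view.asp?id=<digits>`, the same form as the example URL above; fetching the pages, scraping the metadata with BeautifulSoup and the duplicate check are left out. `extract_ids`, `media_view_url` and `make_filename` are hypothetical helper names, and the forbidden-character list covers only a subset of what MediaWiki rejects in titles.

```python
import re

# Assumption: gallery pages link to photos as "media_view.asp?id=<digits>".
MEDIA_ID_RE = re.compile(r"media_view\.asp\?id=(\d+)")
MEDIA_VIEW_URL = "http://www.af.mil/photos/media_view.asp?id={}"

# A subset of the characters MediaWiki does not allow in page titles.
FORBIDDEN_CHARS_RE = re.compile(r"[\[\]{}|#<>]")


def extract_ids(gallery_html):
    """Return the unique photo ids on a gallery page, in order of appearance."""
    return list(dict.fromkeys(MEDIA_ID_RE.findall(gallery_html)))


def media_view_url(photo_id):
    """Build the URL of the page holding the image and its metadata."""
    return MEDIA_VIEW_URL.format(photo_id)


def make_filename(photo_id, title):
    """Build a name of the form "US Air Force <id> <title>.jpg"."""
    clean = FORBIDDEN_CHARS_RE.sub("", title)
    clean = " ".join(clean.split())  # collapse runs of whitespace
    return "US Air Force {} {}.jpg".format(photo_id, clean)
```

Deduplicating the ids while keeping their order means a photo linked twice from the same gallery is only fetched once.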
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
| Multichill | Preparing | BotMultichillT |
US Army
The FEMA request got me started. The US Army has a nice set of images at http://search.ahp.us.army.mil/search/images/?per=10&page=1&search= . Judging from the latest id, it's around 50,000 images. The bot should probably consist of two parts:
- Loop over the search pages and find the location of all images like http://www.army.mil/-images/2009/10/14/53021/ . All pages seem to be in the form http://www.army.mil/-images/YYYY/MM/DD/photo_id/
- Work on all these images
It shouldn't be too hard with some regular expressions for the first part and screen scraping with BeautifulSoup for the second part. Multichill (talk) 22:07, 14 October 2009 (UTC)
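A sketch of the first part in Python, assuming the search result pages link to image pages using exactly the URL form given above. The function name is hypothetical, and the date returned is simply the one embedded in the URL; whether it is the date the photo was taken or published is not established here.

```python
import re
from datetime import date

# URL form described above: http://www.army.mil/-images/YYYY/MM/DD/photo_id/
IMAGE_PAGE_RE = re.compile(
    r"http://www\.army\.mil/-images/(\d{4})/(\d{2})/(\d{2})/(\d+)/"
)


def find_image_pages(search_html):
    """Return (url, url_date, photo_id) for each image page linked from one
    search results page, ignoring repeated links."""
    pages = {}
    for match in IMAGE_PAGE_RE.finditer(search_html):
        url = match.group(0)
        if url in pages:
            continue
        year, month, day, photo_id = match.groups()
        pages[url] = (url, date(int(year), int(month), int(day)), int(photo_id))
    return list(pages.values())
```

The second part would then fetch each returned URL and scrape it.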
I wrote a bot for this (source). It basically works the same as the other US government bots. The main difference is that I'm unable to extract category information. The title is based on the title field, with the description as a fallback. The first images can be found in Category:Images from the US Army needing categories as of 23 October 2009. Multichill (talk) 14:01, 23 October 2009 (UTC)
No response so I slowly fired up the upload. Multichill (talk) 11:31, 25 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
| Multichill | Uploading slowly (Commons is short on disk space). | BotMultichillT |
US Navy
The FEMA request got me started. The US Navy has about 75,000(!) images available at http://www.navy.mil/view_photos_top.asp just waiting to be copied to Commons. I wrote a bot based on the FEMA upload.
- The bot loops over all the images.
- From the meta fields I get the URL, the long description and the short description
- A regex extracts the date from the long description
- A regex extracts the author from the long description
- A regex extracts the location from the long description
- The title is constructed based on the url and the short description
- The image is uploaded and ends up in one of these categories
This is just a general overview. The source is available here. Multichill (talk) 16:48, 16 October 2009 (UTC)
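The regex steps above might look like this in Python. This is a sketch under assumptions, not the bot's actual code: the caption layout (location, then the date in brackets, then "U.S. Navy photo by <name>") is assumed, and the example captions are made up. The fallback pattern handles the case reported in the opinions below, where the date sits between ")" and "--" or an en dash.

```python
import re

# Assumed layout: "LOCATION (DATE) text ... U.S. Navy photo by NAME (RELEASED)".
# One regex captures both location and date.
LOCATION_DATE_RE = re.compile(
    r"^\s*(?P<location>[^()]+?)\s*\((?P<date>[^()]*\d{4})\)"
)
# Fallback: the date follows ")" and precedes "--" or an en dash (U+2013).
FALLBACK_RE = re.compile(
    r"^\s*(?P<location>.+?\))\s*(?P<date>[^()]*?\d{4})\s*(?:--|\u2013)"
)
# Simplification: stops at the first "." or "(", so ranks written with
# abbreviations such as "Lt." would be cut short.
AUTHOR_RE = re.compile(r"U\.S\. Navy photo by (?P<author>[^.(]+?)\s*(?:\.|\(|$)")


def parse_caption(caption):
    """Return location, date and author from a long description (None if absent)."""
    result = {"location": None, "date": None, "author": None}
    match = LOCATION_DATE_RE.search(caption) or FALLBACK_RE.search(caption)
    if match:
        result["location"] = match.group("location").strip()
        result["date"] = match.group("date").strip()
    author = AUTHOR_RE.search(caption)
    if author:
        result["author"] = author.group("author").strip()
    return result
```

Requiring four digits at the end of the bracketed text keeps a hull number such as "(CVN 99)" from being mistaken for a date.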
Opinions
- There is a template for US Navy images, {{ID-USMil}}; you could use it, or create one just for the US Navy, and add it to the source.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- Looks nice. I'll probably use it for the next files. Multichill (talk) 17:30, 19 October 2009 (UTC)
- I'm not sure the ID should come first, as in File:000629-N-5686B-001 Sailor Returns Home.jpg. I think "US Navy" should come before the number.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- Sure, so it would be File:US Navy 000629-N-5686B-001 Sailor Returns Home.jpg in this case. Multichill (talk) 17:30, 19 October 2009 (UTC)
- Some images, like File:020121-N-5563S-003 .50-Caliber Machine Gun.jpg, don't have a date and location. This is because the date isn't in brackets; instead it is between ")" and "--" or ")" and "–". I also don't know why the location isn't picked up...--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- Looks like I have to improve the regex to catch these cases. Both date and location use the same regex for matching. Multichill (talk) 17:30, 19 October 2009 (UTC)
- You don't need the ID in the description. Create or use a source template for the upload where the ID is stated and a link to the site is given.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- I do to prevent naming collisions. Multichill (talk) 17:30, 19 October 2009 (UTC)
OK. The bot has been changed to include the suggestions and is running again. Multichill (talk) 19:50, 21 October 2009 (UTC)
- One small problem: it doesn't seem to like image descriptions with quotation marks in them, and so cuts the name off partway through, e.g. File:US Navy 071227-N-4014G-037 An MH-60S Seahawk assigned to the .jpg; File:US Navy 071227-N-6125G-184 ailors attached to the Nimitz-class aircraft carrier USS Harry S. Truman (CVN 75) enjoy a USO concert preformed by the band .jpg. Shimgray (talk) 21:18, 23 October 2009 (UTC)
- Ah, an escaping problem. This probably only happens to a couple of images. We can always move them to a better name if the current name is not clear. Multichill (talk) 21:58, 23 October 2009 (UTC)
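One way to avoid truncated names like these, and the "^ldquo," fragments that come up later on this page, is to decode HTML entities and normalize quotation marks before building the title. A minimal Python sketch, assuming the short description arrives as HTML-escaped text; `make_title` and the 240-byte cap (kept under MediaWiki's 255-byte title limit) are illustrative choices, not what the bot does.

```python
import html
import re

# Curly quotes become plain ASCII quotes (both are allowed in titles).
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
# A subset of the characters MediaWiki does not allow in page titles.
FORBIDDEN_CHARS_RE = re.compile(r"[\[\]{}|#<>]")


def make_title(photo_id, short_description, max_bytes=240):
    """Build "US Navy <id> <description>.jpg", at most max_bytes long in UTF-8."""
    text = html.unescape(short_description)  # "&ldquo;" becomes a curly quote
    text = text.translate(QUOTE_MAP)
    text = FORBIDDEN_CHARS_RE.sub("", text)
    text = " ".join(text.split())
    prefix = "US Navy {} ".format(photo_id)
    suffix = ".jpg"
    budget = max_bytes - len((prefix + suffix).encode("utf-8"))
    # Cut on bytes, then drop any partial multi-byte character at the end.
    text = text.encode("utf-8")[:budget].decode("utf-8", "ignore").rstrip()
    return prefix + text + suffix
```

For example, a description containing `&ldquo;Example&rdquo;` ends up in the title as `"Example"` instead of being cut off or mangled.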
I worked my way through some of the aircraft carrier categories. Interesting! Still a lot of additional categorization to do though.
- For one carrier, generally several temporary "aboard USS .." categories could be combined into one.
- The ship based temporary categories seem more helpful than the ones for stable locations, e.g. "Arabian Gulf".
- For the captions, maybe {{original caption}} could have been used.
- Minor point: given the small size of the license tag, it could have been included directly into {{information}}.
- It might be worth going through the descriptions by bot to wikify names of units, ships, etc., linking them to the corresponding articles at en.wp
-- User:Docu at 06:43, 25 October 2009 (UTC), edited 06:59, 25 October 2009 (UTC), 08:10, 25 October 2009 (UTC)
- You should probably move it to topic categories right away. Maybe you could use a bot.
- Stable locations only seem to be useful for photos on land.
- That could have been used, but I didn't.
- That could have been done.
- Nice to see people working on this! Multichill (talk) 11:18, 25 October 2009 (UTC)
- I mentioned 3 and 4 mainly for future uploads. BTW I made a bot request at Commons:Bots/Requests/vertrepbot. -- User:Docu at 16:19, 26 October 2009 (UTC)
Hey Multichill, thanks for uploading all those Navy pics, I'm sorting through them now, looking for possible FP candidates.
A few things I found while I was looking through them:
- 1. Some images seem to have had something go wrong with their title during the upload; For example this one and this one. I'm assuming you meant to have 's around some words ('Sea Sparrow') but something's gone awry. You might want to fix it before you move on to the Army upload.
- 2. There also appears to be at least one categorisation error. If you take a look at Category:General views of USS Kearsarge (CV-33) and Category:Aboard USS Kearsarge (CV-33), you'll see a number of pictures of another ship of the same name, Category:USS Kearsarge (LHD-3). They are two different ships (the original Kearsarge was an Essex-class carrier, scrapped in 1974), and while I'm happy to re-categorise them (I'll be helping with the whole batch), is there something you can do to prevent this sort of thing happening in future?
- 3. Though you stated at the Village Pump that you'd built in a duplicate checker, I've found quite a few duplicates as I've browsed the batch. For example, your file duplicates File:USS Port Royal (CG 73) aground.jpg and File:USS Port Royal grounded.jpg. Also, this one and File:AAV Embarking.jpg. There are a few others I've spotted as well, though I didn't note them down.
Hope this helps.
Sarcastic ShockwaveLover (talk) 22:09, 26 October 2009 (UTC)
- Hi Sarcastic ShockwaveLover,
- 1.: "^ldquo," in the title seems to come from the entity "&ldquo;" (“) in the description.
- 2.: Thanks for noticing. It should be fixed now. It was correct when Multichill uploaded it. ;)
- 3.: If you look at the file size of this file, you will notice that File:USS Port Royal (CG 73) aground.jpg, isn't a duplicate, but a scaled-down version. File:USS Port Royal (CG 73) aground.jpg should be tagged with {{duplicate}} for deletion. The new file is an improvement over the old one. I found a few ones too and tagged the old ones for deletion.
- -- User:Docu at 14:41, 27 October 2009 (UTC)
- I'd rather you didn't delete this one, I rotated it and cropped it to correct the tilt, I'm planning on nominating it for FP status. Sarcastic ShockwaveLover (talk) 08:57, 28 October 2009 (UTC)
- 3. yes, I listed it under "other versions" instead. Looking closer at it, it doesn't appear to be an exact duplicate or scaled down version. The few images that silp through the bot's check are some where the file was edited (and not even scaled down), e.g. this one and File:AAV Embarking.jpg. -- User:Docu at 12:33, 28 October 2009 (UTC), edited 18:22, 28 October 2009 (UTC)
- The maximum length of the file names being used seems to be 231 characters. While at some point in the distant future all filenames may have to be that long to be unique, I wonder if we couldn't [have] kept them shorter in the meantime. -- User:Docu at 17:34, 28 October 2009 (UTC) (inserted "have" on 04:42, 30 October 2009 (UTC))
- That would mean a mammoth renaming effort. It's already going to be huge just categorising them. That said, I think files like this one should be renamed. Also, perhaps we could put the categorisation/cleanup effort on the front page, much like that large German upload a few months back? It might help get some more people involved. I've categorised about 150 images so far using HotCat (thank God for that tool), but that's just a drop in the proverbial bucket. Sarcastic ShockwaveLover (talk) 11:58, 29 October 2009 (UTC)
- No, I don't think we should move them. The advantages of the current file names are that they are generally descriptive titles and it's the title the Navy published it with.
- The ^ldquo,/^rdquo,/^rsquo, could be fairly easy to fix (by an adminbot), there are approx. 3500 (Special:Search/^ldquo, OR ^rdquo, OR ^rsquo, prefix:File:US Navy 0). As we generally don't do cosmetics on file names, we could leave them that way though.
- The categorization part should be easier once my bot has created additional categories (see here). I probably should get to work on that.
- Besides these hardware based categories, there is still much to be done to create categories for specific events/operations etc. (e.g. Category:Vertical replenishment). It's fairly easy to build temporary categories from search results. One just needs to go through the category afterwards and remove a few false positives, most categories of FEMA officials were done that way. What generally threw it off were images of "A. on the phone with B." or "A. B. and C. (not pictured) attending Z.)", but they were easy to sort. If you want me to prepare you some temporary categories to review, I'd be glad to do so. -- User:Docu at 04:42, 30 October 2009 (UTC)
- Please and thank you! Sarcastic ShockwaveLover (talk) 12:10, 30 October 2009 (UTC)
- I found two incomplete uploads (the only ones so far):
- -- User:Docu at 18:43, 31 October 2009 (UTC)
- It might be a bug in preview/thumb; it looks OK at full resolution. -- User:Docu at 18:46, 31 October 2009 (UTC)
| Assigned to | Progress | Bot name |
|---|---|---|
| Multichill | finished uploading | BotMultichillT |
AntWeb Images
AntWeb.org has information about every known species of ant, and high-quality images of many of them. We plan to upload to Wikimedia Commons all the images for which we have valid taxonomic names.
We have an extensive database of all the images and their metadata. We're planning on using the Eloquence bot to do the upload. The only change we've made is to add a line to the bot's input file that lets it save a file under a name different from the one used on the AntWeb site. The biggest challenge has been getting the Wikimedia Commons templates right.
So far only about 200 images have been uploaded, as the AntWeb bot has not yet been approved. Questions about the bot or about AntWeb may be addressed to Dave Thau. Davethau (talk) 02:51, 13 October 2009 (UTC)
Opinions
- Having the order, family... in the description is redundant since it's already in the {{taxonavigation}} template that you're adding.--Diaa abdelmoneim (talk) 21:02, 13 October 2009 (UTC)
- Remove the text "CC-BY-SA-3.0" next to permission, since it is also redundant with the licensing template. That way, "See below" directs the user to the licensing section...--Diaa abdelmoneim (talk) 21:02, 13 October 2009 (UTC)
- Please add "== {{int:license}} ==" before {{AntWeb permission}} for a link to what licensing is in different languages...--Diaa abdelmoneim (talk) 21:02, 13 October 2009 (UTC)
- Please also add "== {{int:filedesc}} ==" before everything to indicate a summary.--Diaa abdelmoneim (talk) 21:02, 13 October 2009 (UTC)
- All the issues have been resolved on Commons:Bots/Requests/File Upload Bot (AntWeb)...--Diaa abdelmoneim (talk) 08:10, 17 October 2009 (UTC)
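With those suggestions applied, a generated description page could be assembled as in the Python sketch below. The template names come from the comments above; the parameter values are placeholders, and `antweb_description` is a hypothetical helper. The {{taxonavigation}} call is left out because its parameters are not given here.

```python
def antweb_description(description, source_url, date, author):
    """Build a file description page with the section headings suggested above."""
    return "\n".join([
        "== {{int:filedesc}} ==",
        "{{Information",
        "|description=" + description,  # no order/family here: taxonavigation covers it
        "|source=" + source_url,
        "|date=" + date,
        "|author=" + author,
        "|permission=See below",  # points the reader to the license section
        "|other_versions=",
        "}}",
        "",
        "== {{int:license}} ==",
        "{{AntWeb permission}}",
    ])
```

Building the page from a fixed list of lines keeps every upload's wikitext identical apart from the per-image values.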
| Assigned to | Progress | Bot name | Maintenance category |
|---|---|---|---|
| Dave Thau | Uploading | AntWeb bot | Category:Images from AntWeb |
Metropolitan Museum of Art
This is one I've been working on for a while. The Metropolitan Museum of Art has a large collection of about 60,000 images of works in their online collection database, at a variety of resolutions. These have to be filtered carefully by hand because they have many photographs of 3D works and many non-PD works.
For most images that have a high-res version, it is easy to extract it by simply taking the URL of the thumbnail or regular image and changing "thumb" or "regular" to "zoom". This trick works for all images except those in the "The Libraries" collection (which only contains 50 images). Many images contain a color guide and false copyright statement that will need to be cropped at some point.
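The URL trick can be expressed as a small Python function. It assumes "thumb" and "regular" appear as whole path segments of the image URL, which may not match the museum's real URLs; the example URLs are made up. Images from "The Libraries" collection would need to be skipped by the caller.

```python
from urllib.parse import urlsplit, urlunsplit


def zoom_url(image_url):
    """Swap a "thumb" or "regular" path segment for "zoom".

    Returns None when neither segment is present.
    """
    parts = urlsplit(image_url)
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        if segment in ("thumb", "regular"):
            segments[i] = "zoom"
            return urlunsplit(parts._replace(path="/".join(segments)))
    return None
```

Working on path segments rather than a plain string replace avoids touching a file name that happens to contain "thumb".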
- Did you try to contact them? Maybe they'd like to help.--Diaa abdelmoneim (talk) 08:51, 3 July 2009 (UTC)
- User:unforth of Flickr also has an extensive collection (several hundreds) of MET pictures like this one. Teofilo (talk) 08:17, 12 September 2009 (UTC)
| Assigned to | Progress | Bot name |
|---|---|---|
| User:Dcoetzee | License sorting | User:Dcoetzee |
Wikipedia Loves Art
Upload all the photos from the Wikipedia Loves Art photo pool on Flickr that have the machine tag "caption=yes".
This process actually involves several phases (each with its own script):
- The first script uses the Flickr API to examine all the photos in the pool and copy all the relevant tags, comments, file info and metadata into a local MySQL database. I used the phpFlickr class for most of this.
- The second script downloads all the appropriate Flickr photos from the pool at the highest resolution available.
- The third script attempts to create a unique human-readable filename for each photo by examining the tags and comments (too complicated to describe here).
- A fourth script lets humans review the filename suggestions and change them as appropriate. See Wikipedia:Wikipedia Loves Art/Filename review contest
- Once the filenames are approved, a fifth script renames all the files locally.
- The files are then batch uploaded to Commons with templates filled in according to the data stored in the MySQL database. This script is partially based on Eloquence's file upload script.
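The selection criterion of the first phase could be sketched in Python as follows. Flickr machine tags have the form `namespace:predicate=value`; the sketch accepts `caption=yes` under any namespace. The namespace "wla" and the photo records (a dict with `id` and a list of raw `tags`) are hypothetical stand-ins for whatever the first script stores in MySQL, not the Flickr API's response format.

```python
import re

# Flickr machine tags have the form "namespace:predicate=value".
MACHINE_TAG_RE = re.compile(r"^(?P<ns>[A-Za-z_]\w*):(?P<pred>[A-Za-z_]\w*)=(?P<value>.+)$")


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG_RE.match(tag.strip())
    if not match:
        return None
    return match.group("ns"), match.group("pred"), match.group("value").strip('"')


def photos_to_upload(photos, predicate="caption", value="yes"):
    """Return the ids of photos carrying a caption=yes machine tag (any namespace)."""
    selected = []
    for photo in photos:
        for tag in photo["tags"]:
            parsed = parse_machine_tag(tag)
            if parsed and parsed[1] == predicate and parsed[2].lower() == value:
                selected.append(photo["id"])
                break
    return selected
```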
| Assigned to | Progress | Bot name |
|---|---|---|
| Kaldari (talk) | Phase 6 (partially complete) | File Upload Bot (Kaldari) |
Comments
Just a thought for future events: as the participants made two images of each object (one with an index card, one of the image for articles), it might be worth uploading both under the same filename. (First the one with the index card obviously). This could help identification later. As once in a while there are questions about the colors of paintings, it might be worth taking the first picture with an index card and a color chart. -- User:Docu at 07:22, 23 September 2009 (UTC)
Tropenmuseum
The Tropenmuseum donated about 2,100 images related to Suriname and will donate many more images in the future (see Commons:Tropenmuseum). GerardM handled the communication and Multichill the uploading and technical work.
Suriname
The first batch I got was 2,100 images related to Suriname and the Maroons. I received a DVD containing the images and a Microsoft Access database containing the metadata. I created a user ODBC connection in Windows and used pyodbc to connect to it from Python. The code is a combination of custom code, pywikipedia and functions I copied from previous projects (Deutsche Fotothek & WLANL). The filenames were already in the right form and contained a unique identifier, so I had my bot loop over the files and, for each file:
- Extract the unique id
- Using the identifier pull all relevant info from the database
- Generate a description
- Generate temp categories
- Generate a SHA-1 hash and check for duplicates
- If the file doesn't exist yet, upload the file using KITbot
Of course you can find the source in my svn.
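The per-file loop above could be sketched in Python like this. It is a simplified stand-in, not the code in the svn: the file-name pattern is hypothetical, a plain dict replaces the Access database read through pyodbc, and the set of known hashes stands in for a lookup against Commons (the MediaWiki API can search files by SHA-1, for example with `list=allimages` and `aisha1`). Description and category generation and the actual upload are omitted.

```python
import hashlib
import re

# Hypothetical file-name pattern: the unique id is the number before ".jpg".
ID_RE = re.compile(r"(\d+)\.jpe?g$", re.IGNORECASE)


def sha1_hex(data):
    """SHA-1 of the file contents as a hex string."""
    return hashlib.sha1(data).hexdigest()


def plan_uploads(files, metadata, known_hashes):
    """Return (filename, action, detail) for each (filename, bytes) pair.

    metadata: dict mapping id -> description (stand-in for the database).
    known_hashes: set of SHA-1 hex digests already on Commons; it is
    updated as we go, so repeats within the batch are caught too.
    """
    plan = []
    for filename, data in files:
        match = ID_RE.search(filename)
        if not match:
            plan.append((filename, "skip", "no id"))
            continue
        description = metadata.get(match.group(1))
        if description is None:
            plan.append((filename, "skip", "no metadata"))
            continue
        digest = sha1_hex(data)
        if digest in known_hashes:
            plan.append((filename, "duplicate", digest))
        else:
            known_hashes.add(digest)
            plan.append((filename, "upload", description))
    return plan
```

Exact hash matching only catches byte-identical copies; a cropped, rotated or rescaled version of an existing file, like the ones discussed in the US Navy section, will pass this check.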
The provided metadata was excellent. It contains descriptions in one language (Dutch) or more (also English) and was very useful for generating temp categories. All the images are placed in Category:Images from the Tropenmuseum and a number of temp categories. Images have to be copied from these temp categories to real categories. It turned out we don't have many Suriname-related images, so I pretty much had to build a category tree from the ground up. This is a lot of work, but images end up in very good topic categories. It also improves the chance of images ending up in multiple relevant topic categories (images from previous batch uploads got stuck in only one category). This involves a lot of moving around, but that's just a job for a bot. The mapping causes a lot of over-categorized images, but this can easily be fixed with the recategorization bot (imagerecat.py -cat:Images_from_the_Tropenmuseum -onlyfilter). For the next part we have to figure out how to get people to categorize the images, because I don't feel like doing this all alone. Users only have to map temp cats onto topic categories; the actual moving is done by a bot. I'm not sure how to make this easy for other users. Multichill (talk) 11:41, 16 September 2009 (UTC)
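The mapping step, plus a simple over-categorization filter, might look like the Python sketch below. This is an illustrative substitute, not how imagerecat.py -onlyfilter is implemented: `mapping` (temp category to topic categories) and `parents` (category to direct parent categories) are hypothetical inputs, as are the category names in the example.

```python
def apply_mapping(categories, mapping):
    """Replace each mapped temp category by its topic categories.
    Unmapped categories are kept so they can still be reviewed."""
    result = []
    for category in categories:
        for new_category in mapping.get(category, [category]):
            if new_category not in result:
                result.append(new_category)
    return result


def ancestors(category, parents, seen=None):
    """All categories reachable upwards from `category` via `parents`."""
    seen = set() if seen is None else seen
    for parent in parents.get(category, []):
        if parent not in seen:
            seen.add(parent)
            ancestors(parent, parents, seen)
    return seen


def drop_parents(categories, parents):
    """Remove categories that are an ancestor of another listed category."""
    redundant = set()
    for category in categories:
        redundant |= ancestors(category, parents)
    return [c for c in categories if c not in redundant]
```

For example, mapping a hypothetical "Temp: markt Paramaribo" to both "Markets in Paramaribo" and "Paramaribo" over-categorizes the image; `drop_parents` then removes "Paramaribo" because it is a parent of "Markets in Paramaribo".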
Indonesia
Yesterday Gerard and I visited the Tropenmuseum. We got 35,000 images and a database with all the metadata. I slightly modified the program I used for Suriname and fired up the bot. Modifications:
- Other database name and other table names
- Changed the regular expression to find the id of the file
- Removed some encoding bugs
- Filtering the temporary categories to get rid of the completely useless categories right away
- Added <!--{{id|1=To be translated}}--> so Indonesian translations can be added later.
The upload will probably be finished tomorrow. Then comes the hard part: categorization. I added temporary categories again, but this time I got some data from the Tropenmuseum describing the structure of these categories, so I can build a tree. I will do this for the geography tree first. Multichill (talk) 22:19, 26 November 2009 (UTC)
Opinions
- Making categorization easier: How about doing something like with the Fotothek upload? Like creating the temporary categories with a commons delinker link and a suggested category, waiting for a user to review it. And where are all these categories stored? I mean where can I find a list of all the temporary categories with how many files they contain so I could check for a better category name also for the delinker? Automatic Dutch to English translation would also make it a lot easier, instead of going to Google and translating...BTW, the upload is already finished right?--Diaa abdelmoneim (talk) 00:05, 18 September 2009 (UTC)
- You can find lists at User:Multichill/KIT/categories and User:Multichill/KIT/categories2. See the history for the progress. I already worked on most categories. Multichill (talk) 08:48, 18 September 2009 (UTC)
WLANL
Description
Batch upload of all suitable images in http://www.flickr.com/groups/wikilovesart/. These images were created for http://www.wikilovesart.nl/
I'm using BotMultichillT (talk · contributions · deleted user contributions · logs · block log) for the uploads.
How the bot works
The source. The bot works like this:
- The bot loops over all the images in the Flickr group
- The bot checks if a suitable license is on the image
- The bot checks if an allowed tag is on the image and not a disallowed tag
- If it's a suitable image the bot will pull the description from Flinfo
- The description is improved based on the added tags by using a template trick and User:Multichill/WLANL/descriptions
- Categories are added using the same trick and User:Multichill/WLANL/museums
- The image is marked as reviewed by Multichill (talk)
- The categories are filtered using the functions in imagerecat.py
- The filename is derived from the username and the title assigned by the user
For more details see the source code.
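The license and tag checks and the naming rule could be sketched in Python as follows. The accepted Flickr license ids (4 for CC BY 2.0 and 5 for CC BY-SA 2.0) and the tag names are assumptions for illustration; the real lists live in the bot's source. The naming pattern follows the file names listed further down, such as "WLANL - wendier - Danseres Nias.jpg".

```python
# Assumption: accepted Flickr license ids (4 = CC BY 2.0, 5 = CC BY-SA 2.0).
ALLOWED_LICENSES = {"4", "5"}
# Hypothetical tag names; the real lists are in the bot's source.
ALLOWED_TAGS = {"wlanl-museum-a", "wlanl-museum-b"}
DISALLOWED_TAGS = {"wlanl-no-permission"}


def is_suitable(photo):
    """True if the photo has an accepted license, at least one allowed tag
    and no disallowed tag."""
    tags = set(photo["tags"])
    return (
        photo["license"] in ALLOWED_LICENSES
        and bool(tags & ALLOWED_TAGS)
        and not tags & DISALLOWED_TAGS
    )


def wlanl_filename(username, title, extension="jpg"):
    """Build "WLANL - <username> - <title>.<extension>"."""
    title = " ".join(title.split())
    return "WLANL - {} - {}.{}".format(username, title, extension)
```

Unlike some of the uploaded names (for example "Zielenprauw .jpg"), this version collapses stray whitespace in the title.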
Update
I received a lot of permissions and used a modified (hacked) version of flickrripper to upload these images. I also received permission from the remaining museums and am uploading those images now. At the end I will do a flickrripper run over the whole pool to catch images that were not tagged correctly. Multichill (talk) 12:48, 29 October 2009 (UTC)
Opinions
- Since the description is in Dutch I suggest using {{nl|}} for the descriptions.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Also {{WLANL}} should have a link to Wiki Loves Art project page or Flickr group.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- I suggest moving {{WLANL}} to the source parameter in the information template and adding |url= as a parameter in {{WLANL}} with the Flickr source link.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Flinfo doesn't add a description using the image's name so for http://www.flickr.com/photos/tainab/3675827235/ it doesn't add "Amandelbloesem, Vincent van Gogh (1890)" which would be a good description.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Generally, I think uploading to Flickr isn't the best choice, since it reduces image quality for non-Pro members, as here: http://www.flickr.com/photos/petertf/3638745721/ . Maybe offering Flickr Pro accounts to participants would resolve this issue :-)--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Thanks for your input.
- Good point. I will change this
- {{WLANL}} sure needs improvement. This is just a quick hacked up version I created because otherwise I would have a red link. It's on my list
- I like it at the bottom because it doesn't clutter up {{Information}}
- This should be changed in Flinfo. I'll do a request at User talk:Flominator#Flinfo request
- Yes. Not having the originals sucks. I'll leave a note asking users to upload the original version if possible. It would be nice if these kind of projects would upload to Commons directly, but with the current tools that's kind of hard.
- Multichill (talk) 07:10, 20 August 2009 (UTC)
- All images use {{nl}} now. Flominator changed Flinfo to also pull the description from the title.
- I fired up the bot to upload. Issues I'm currently aware of:
- Some images get tagged as uncategorized, but these images are categorized.
- Some names are not that good
- I have some name collisions.
- Issues are not that serious and can be fixed later on. Multichill (talk) 14:01, 22 August 2009 (UTC)
There seems to be a problem with getting descriptions from titles. It adds the title in the description, but does so twice. This happened to multiple images:
- File:WLANL - wendier - de Jonge.jpg
- File:WLANL - wendier - Zielenprauw .jpg
- File:WLANL - wendier - Verenvitrine Suriname.jpg
- File:WLANL - wendier - Danseres Nias.jpg
- First part is from User:Multichill/WLANL/descriptions, second part is from the title. Multichill (talk) 16:53, 22 August 2009 (UTC)
Images from NYPL Digital Gallery
| Assigned to | Progress | Bot name |
|---|---|---|
| Dcoetzee | Stale | Dcoetzee |
It would be great if we could batch upload PD images from NYPL Digital Gallery - http://digitalgallery.nypl.org/nypldigital/index.cfm . NYPL Digital Gallery provides free and open access to over 685,000 images digitized from The New York Public Library's vast collections, including illuminated manuscripts, historical maps, vintage posters, rare prints, photographs and more. --Butko (talk) 14:45, 14 April 2009 (UTC)
- This collection turned out to be more promising than I supposed. They use LizardTech ContentServer to serve up their images, whose API is described here. Here's how you extract original TIFFs at full size: first use a "browse" query to obtain some XML including the image dimensions, like this one [1]. The folder name and image name can be obtained from the URL of the zoom view. Then, use a getimage query like this one [2] to get the full-size TIFF, specifying the dimensions from the previous query. Tada. Close examination shows no artifacts in the TIFF - these are original scans (internally, they are SID images). The first one I extracted was 3845 × 4947, about 60 MB as a TIFF, and 27 MB as a PNG (which you can preview here). They throttle you at 80 KB/s per transfer, but they do allow simultaneous transfers; any way you look at it, though, it would take a long time to fetch all the images we need. In light of the long download time per image, we're going to want to filter by license before downloading. Dcoetzee (talk) 07:00, 15 April 2009 (UTC)
- Update: their complete collection of high-resolution images is browsable here. This can be used to easily obtain a list of (folder, image name) pairs. I'll presently begin downloading. Dcoetzee (talk) 06:23, 16 April 2009 (UTC)
- Update: a better way to download these is to use the "getfile" function to get the raw .sid files, which are highly compressed (as in [3]) and then use LizardTech's command-line decoder to convert to TIFF ([4]). This is a quicker download and doesn't even require the dimensions. Dcoetzee (talk) 22:12, 16 April 2009 (UTC)
- I'm still in the middle of grabbing these. Enumerating IDs turned out to be trickier than I thought, because the folders are so large the browse interface times out on them. I ended up enumerating them instead using wildcard searches on single letters. Even just looking at the high res images, it's a lot of data. All told we're talking at least 100 GB in PNGs, and I'm pretty sure all of the high-resolution images are public domain works, although that will require further confirmation. It's an excellent source. Dcoetzee (talk) 06:59, 22 April 2009 (UTC)
- Update: I've enumerated about 65000 high-res images, and am in the process of downloading and converting them to PNGs, slow enough to not overwhelm their bandwidth. So far I've retrieved about 17250, occupying 323 GB. I'm also in the process of generating image descriptions of them based on NYPL metadata. I've created Category:New York Public Library Digital Gallery and plan to start uploading some of them soon. Dcoetzee (talk) 13:18, 16 May 2009 (UTC)
- Update: I've had contact from a representative of the NYPL, who has been very helpful in furnishing IDs and sanctioning the sharing of their public domain images. He gave me a list of about 40,000 stereographs which I can begin uploading immediately as soon as I put together a suitable fully-automated upload tool for the task. Dcoetzee (talk) 21:43, 25 June 2009 (UTC)
- Great work. I think this is good news and I'm very happy that someone over there is nice enough to help out.--Diaa abdelmoneim (talk) 20:46, 27 June 2009 (UTC)
- I have just begun automated uploading of this collection of 40,000 images, which are being placed along with existing images in Category:Images from the New York Public Library. Each image and its metadata is being downloaded from NYPL on-the-fly. Dcoetzee (talk) 03:11, 28 June 2009 (UTC)
- Update: I've estimated that at my present rate of upload, the current collection being uploaded (which actually contains 84000 images) will require about 7 weeks to upload, and will occupy about 500 GB. Dcoetzee (talk) 10:38, 28 June 2009 (UTC)
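The wildcard workaround described above can be sketched in Python: run one search per single character, then merge the hits so that each image is counted once. This is a minimal sketch under stated assumptions. `search` is a hypothetical callable standing in for the NYPL search query, returning (folder, image name) pairs, and the real query syntax is not documented on this page.

```python
import string
from typing import Callable, Iterable, Set, Tuple

# An image is identified by its (folder, image name) pair, as in the zoom-view URLs.
ImageId = Tuple[str, str]


def enumerate_by_wildcards(
    search: Callable[[str], Iterable[ImageId]],
    alphabet: str = string.ascii_lowercase + string.digits,
) -> Set[ImageId]:
    """Run one wildcard query per character and merge the hits.

    `search` is a hypothetical stand-in for the site's search function.
    An image can match several queries, so results go into a set.
    """
    found: Set[ImageId] = set()
    for ch in alphabet:
        found.update(search(ch + "*"))
    return found


# Demo with an in-memory catalogue instead of the real site (made-up data).
CATALOGUE = [
    ("stereo", "apple orchard"),
    ("stereo", "bridge at dusk"),
    ("maps", "apple bay"),
]


def fake_search(pattern: str) -> list:
    """Match entries where any word starts with the pattern's prefix."""
    prefix = pattern.rstrip("*")
    return [(f, n) for f, n in CATALOGUE if any(w.startswith(prefix) for w in n.split())]


ids = enumerate_by_wildcards(fake_search)
print(len(ids))  # prints 3: "apple orchard" matches both "a*" and "o*" but is kept once
```

Small single-character queries keep each result set below the size that made the browse interface time out. The cost is that most IDs come back several times and are deduplicated locally. Whether the real wildcard matches titles, descriptions or both is an assumption.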
Nice upload, but I have a couple of points you should address:
- I don't like the two versions (PNG & JPEG). Who cares about thumbnail size? Are you sure you want to upload two versions of every image? And why not upload the original TIFFs for our restoration people?
- The files are uncategorized, please tag them with {{subst:unc}} right away.
- How are you going to get these files categorized? The images should probably all be in a subcategory of Category:Stereo cards and in one or more topic categories.
Other versions field seems to be broken. Update: that was an easy fix. Multichill (talk) 11:30, 28 June 2009 (UTC)
Multichill (talk) 11:20, 28 June 2009 (UTC)
More questions:
- By 84000 images, do you mean 42000 PNG and 42000 JPEG?
- Why don't you merge the source template into the source field of the {{NYPL-image-full}} template?
- Does the bot auto categorize?
- What's the license of these images? Why are they PD? I mean, why is the original file (before the scan) PD?--Diaa abdelmoneim (talk) 12:08, 28 June 2009 (UTC)
- They're all PD due to age ({{PD-1923}}), according to the NYPL, although some of them don't list a specific date on their page (for many of them, you have to click through to the original source description to verify the age). There was one date field that I was not grabbing, which I am currently modifying the bot to grab. The bot does not do autocategories (I don't have that functionality, and I don't trust autocategories anyway), but I am now automatically marking them as uncategorized. Uploading the TIFFs doesn't make any sense, because they are derived from MrSID files and contain exactly the same data as the PNG files (there is no metadata).
- I also prefer not to have two versions, but thumbnail size is a very real concern, and unfortunately the software does not support JPEG thumbnails for PNG files. For example, a typical image of width 300 would be about 30 KB in size, which is prohibitive for modem users when many such images are used on a page. When the software adds a proper feature for this, they can all be deleted. Oh, and no, I mean 84000 PNG and 84000 JPEG.
- Should I be putting these all in the root category Category:Stereo cards? Dcoetzee (talk) 17:25, 28 June 2009 (UTC)
Categorizing
- I'm currently categorizing to the "Category:Robert N. Dennis collection of stereoscopic views"--Diaa abdelmoneim (talk) 17:22, 28 June 2009 (UTC)
- I can take care of categorizing by source collection automatically if you wish - please don't go to unnecessary manual effort. :-) Dcoetzee (talk) 17:26, 28 June 2009 (UTC)
- I started a bot that does this for the first 1600 images. It would be good if you did this with all your upcoming uploads. You said 84000 images as a first batch; how many more batches are there? If I can assist in the upload I would be glad to do so. Multichill also has a university connection or a very high-speed connection; I'm sure if we ask him kindly he would help with the upload. If we work together we can upload this in a week. And please don't add the images to the stereo card root category, just to Category:Robert N. Dennis collection of stereoscopic views.--Diaa abdelmoneim (talk) 17:49, 28 June 2009 (UTC)
- Unfortunately that may not be an option, depending on how fast the NYPL wants their servers hit. I can inquire about it. I can deal at least with the Robert N. Dennis collection right now, but other subcollections will have to wait until I see how many collections there are and how meaningful they are. Dcoetzee (talk) 17:53, 28 June 2009 (UTC)
- So should I keep categorizing the first 1600 images of the batch? I don't want there to be a double category or something. How many images do you upload daily? And how big a PD collection do they have?--Diaa abdelmoneim (talk) 18:00, 28 June 2009 (UTC)
- No, I'll go back for them a bit later this week, don't worry. :-) And I'll check for any existing category so double categories will not occur. I upload roughly one image every 50 seconds or 1728 per day (this includes both the PNG and JPEG). I have no idea how large their complete PD collection is, and I don't think they do yet either. Dcoetzee (talk) 18:08, 28 June 2009 (UTC)
- Could the bot also categorize by location? Like in File:Camping_out,_from_Robert_N._Dennis_collection_of_stereoscopic_views.jpg, the location being Michigan? --Diaa abdelmoneim (talk) 18:31, 28 June 2009 (UTC)
- The past couple of files have been very low res. Is this a mistake by the bot or are these really low res?--Diaa abdelmoneim (talk) 18:34, 28 June 2009 (UTC)
- Some files do not have SID files available from the NYPL - for these I upload the highest available resolution, which is about 700px wide. And yes, I may be able to extract the rough location from the Original Source field. For now I must go away but back later. :-) Dcoetzee (talk) 18:44, 28 June 2009 (UTC)
- So far there are about 230 files concerning the Union Pacific Railroad. Could you automatically add the category to them?--Diaa abdelmoneim (talk) 20:52, 28 June 2009 (UTC)
Looks like all images are now tagged with Category:Robert N. Dennis collection of stereoscopic views and {{Uncategorized}}. This seems like a good starting point to me, but I'd rather have a dedicated uncategorized template, just like with BArch and Fotothek. Could you please tag the images with {{Uncategorized-NYPL}}? I'll create the remaining structure later this week. This will prevent your uploads from flooding the regular tree and messages like this one. Multichill (talk) 20:07, 29 June 2009 (UTC)
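Here is a back-of-the-envelope check of two rates quoted in this section. The first is the 80 KB/s per-transfer download throttle. The second is the upload rate of one image every 50 seconds, which covers the PNG and JPEG together. This sketch assumes 1 KB = 1000 bytes and that each simultaneous transfer is throttled independently. Neither assumption has been confirmed by the NYPL.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def download_seconds(size_bytes: float, rate_bytes_per_s: float = 80_000, streams: int = 1) -> float:
    """Time to fetch `size_bytes` when split evenly over `streams` throttled transfers."""
    return size_bytes / (rate_bytes_per_s * streams)


def uploads_per_day(seconds_per_image: float = 50) -> int:
    """Whole upload cycles (PNG + JPEG pair) that fit in one day."""
    return int(SECONDS_PER_DAY // seconds_per_image)


def upload_weeks(n_images: int, seconds_per_image: float = 50) -> float:
    """Weeks needed to upload `n_images` at a fixed time per image."""
    return n_images * seconds_per_image / SECONDS_PER_DAY / 7


print(download_seconds(60_000_000))   # prints 750.0 (a 60 MB TIFF, one transfer)
print(uploads_per_day())              # prints 1728
print(round(upload_weeks(84_000), 1)) # prints 6.9
```

The 6.9-week figure agrees with the "about 7 weeks" estimate given earlier for 84000 images. That agreement holds only if each 50-second cycle uploads both versions of an image.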
- OK, the basics are there. If everyone agrees, we only need to run a bot to change the old uploads (replace.py -lang:commons -family:commons -transcludes:NYPL-image-full -regex -nocase "\{\{Uncategorized\|" "{{Uncategorized-NYPL|"). Multichill (talk) 20:21, 29 June 2009 (UTC)
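For readers unfamiliar with replace.py's -regex and -nocase options, the substitution in the command above behaves roughly like this Python snippet. As far as I know, -nocase corresponds to re.IGNORECASE. Fetching and saving the pages that transclude NYPL-image-full is left out.

```python
import re

# Pattern and replacement copied from the replace.py command above;
# -nocase is modelled as re.IGNORECASE.
PATTERN = re.compile(r"\{\{Uncategorized\|", re.IGNORECASE)
REPLACEMENT = "{{Uncategorized-NYPL|"


def retag(wikitext: str) -> str:
    """Switch a parameterised {{Uncategorized|...}} tag to {{Uncategorized-NYPL|...}}."""
    return PATTERN.sub(REPLACEMENT, wikitext)


print(retag("{{uncategorized|year=2009|month=June|day=28}}"))
# prints {{Uncategorized-NYPL|year=2009|month=June|day=28}}
```

The pattern requires the pipe, so a bare {{Uncategorized}} without parameters is left untouched.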
Subject categories
Could you or Multichill create a bot that automatically adds a temporary subject category to each file, which would be checked and, if correct, moved into a permanent category, as has been done with Fotothek or BArchive? I'm not sure we should wait till the first 80,000 images are up before we start categorizing. BTW, the NYPL has started receiving funds again from the city of New York, so they might stop throttling downloads. It would be beneficial if you would inquire about that.--Diaa abdelmoneim (talk) 20:22, 30 June 2009 (UTC)
- I'd be happy to do this but haven't seen this type of thing before - is there an example or description of this process somewhere? Many of these can (if nothing else) be automatically categorized into the category for the city where they were taken. Dcoetzee (talk) 22:15, 30 June 2009 (UTC)
- Commons:Fotothek has categories assigned to their files based on the description. In "Original source: ", the subject or the place where the photo was taken is usually written at the end. Dividing the images into such categories would make further categorization easier. For example, File:Camping_out,_from_Robert_N._Dennis_collection_of_stereoscopic_views.jpg has "Original source: Robert N. Dennis collection of stereoscopic views. / United States. / States / Michigan / Stereoscopic views of Lake Superior Scenery." You could grab "Stereoscopic views of Lake Superior Scenery" from there, because it's after a slash and before a bracket. The category would later be reviewed and approved by a user. The temp category would be "NYPL_Stereoscopic views of Lake Superior Scenery". These would serve as preliminary categories.--Diaa abdelmoneim (talk) 22:23, 30 June 2009 (UTC)
- That makes sense - incidentally, is there an easy way to merge a category into a different existing category? Will CommonsDelinker do this? For many of these the corresponding existing category is obvious, and automated merging would be desirable. Dcoetzee (talk) 22:40, 30 June 2009 (UTC)
- I'm currently automatically subcategorizing the images and placing the categories in Category:Temporary categories for images from the New York Public Library. I'm also updating the uncategorized tags and Robert N. Dennis category on my initial uploads. Dcoetzee (talk) 01:55, 1 July 2009 (UTC)
- See User:CommonsDelinker/commands/documentation#Categorize uncategorized images. Multichill (talk) 19:37, 1 July 2009 (UTC)
- Is it possible to have a template like the one found on http://commons.wikimedia.org/wiki/Category:Images_from_the_Deutsche_Fotothek,_location_Dresden, to make categorizing easier?--Diaa abdelmoneim (talk) 09:37, 2 July 2009 (UTC)
- That sounds like a good idea. However, I'd want to be sure first that CommonsDelinker recognizes the new Uncategorized-NYPL... Dcoetzee (talk) 10:55, 2 July 2009 (UTC)
- Dcoetzee, you should probably only add Uncategorized-NYPL if you don't have a proper temp category. This way we can just use the normal category move bots to move images from a temp cat to a proper topic category. Multichill (talk) 11:01, 2 July 2009 (UTC)
- Dcoetzee, can we delete a temp category once it's cleaned out or do you expect more images to go into these categories? Multichill (talk) 17:03, 2 July 2009 (UTC)
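The extraction rule proposed above takes the text after the last slash and before any bracket, then prefixes it with "NYPL_". It could look like this in Python. This is a sketch, not the code Dcoetzee's bot actually uses. Treating both "(" and "[" as brackets is an assumption.

```python
import re
from typing import Optional

PREFIX = "NYPL_"  # naming scheme proposed above for the temporary categories


def temp_category(original_source: str) -> Optional[str]:
    """Derive a temporary category name from an NYPL "Original source" string.

    Takes the text after the last slash, drops anything from the first
    "(" or "[" onward (an assumption about what counts as a bracket),
    and strips the trailing full stop. Returns None when there is no
    slash-separated hierarchy or nothing is left.
    """
    if "/" not in original_source:
        return None
    last = original_source.rsplit("/", 1)[1]
    last = re.split(r"[(\[]", last, maxsplit=1)[0]
    last = last.strip().rstrip(".").strip()
    return PREFIX + last if last else None


source = ("Original source: Robert N. Dennis collection of stereoscopic views. / "
          "United States. / States / Michigan / Stereoscopic views of Lake Superior Scenery.")
print(temp_category(source))  # prints NYPL_Stereoscopic views of Lake Superior Scenery
```

Returning None leaves the file for the Uncategorized-NYPL route suggested above, instead of inventing a category.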
NYPL and PD-Scan
Dcoetzee, I'm a little unhappy that our images are tagged as PD-Scan only. Many of the images don't have their original publication date, and someone who looks at the picture can't be sure it's PD, as there is no clear sign of it. For example, File:Arch_on_St._George_Avenue,_from_Robert_N._Dennis_collection_of_stereoscopic_views.png has only "Digital item published 5-5-2005; updated 2-12-2009.", which doesn't establish PD-old. There is an NYPL page about the collection which may hold clues about why the collection is PD. I think once we have established why the collection is PD, we should create a template stating the reason, to go alongside PD-Scan. --Diaa abdelmoneim (talk) 10:58, 4 July 2009 (UTC)
- I agree, the NYPL image metadata does not generally contain sufficient information to clearly establish their copyright status. I have only the word of the NYPL that these are public domain, and they may not be as conservative in evaluating copyright status as we are. I don't really want to filter them before upload though, because I'm fairly confident most of these actually are PD and are just missing the metadata to prove it. There are two things I can do here: I can fetch the "Imprint" date from the collection, and I can tag any images that do not have a clear indicator of copyright status for human review with Category:PD files for review. This could prove to be rather difficult though, because dates are specified in a variety of strange formats that are difficult to parse. Dcoetzee (talk) 22:15, 4 July 2009 (UTC)
- Or just an OTRS confirmation, or a rights information page on their site saying "no known restrictions". Don't tag anything please. I'm sure all images are PD but only need a legal confirmation.--Diaa abdelmoneim (talk) 22:19, 4 July 2009 (UTC)
- As far as I know, OTRS is inappropriate for public domain images - that's for the copyright holder confirming that they've released a work, and NYPL is not the copyright holder. Their copyright status will need to be confirmed based on the available information, and PD review has already agreed to help me with this kind of thing in the past. As for "no known restrictions", every one of these image description pages says that in its HTML metadata - their evaluation can't be trusted. Dcoetzee (talk) 23:30, 4 July 2009 (UTC)
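One way to handle the "strange formats" problem mentioned above is to avoid full date parsing. Instead, pull every four-digit year out of the Imprint text. A file is tagged {{PD-1923}} only if all of those years are before 1923, and everything else goes to human review. This is a deliberately simplified sketch. It assumes the Imprint year is the publication year, and the year range 1500 to 2099 is an arbitrary choice.

```python
import re
from typing import Optional

# Four-digit years from 1500 to 2099; undated forms such as "[18--?]" do not match.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")
PD_CUTOFF = 1923  # {{PD-1923}}: published before 1923 in the US


def latest_year(date_text: str) -> Optional[int]:
    years = [int(y) for y in YEAR.findall(date_text)]
    return max(years) if years else None


def license_decision(imprint_date: str) -> str:
    """Return "PD-1923" only when every year found is before the cutoff.

    Anything undated, unparseable or too recent returns "review"
    (e.g. for Category:PD files for review). Using the latest year of a
    range such as "1865-1875" keeps the check conservative.
    """
    year = latest_year(imprint_date)
    if year is not None and year < PD_CUTOFF:
        return "PD-1923"
    return "review"


for text in ["ca. 1870", "1865-1875", "[18--?]", "1925"]:
    print(text, "->", license_decision(text))
```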
Status
What's the status of this upload? Multichill (talk) 12:29, 17 September 2009 (UTC)
- Sorry for the delay. I'm working on getting a Toolserver account so I can continue the upload with my existing tools and Mono, or with a rewrite of the tools. It should be able to pick up right where I left off. I don't have enough bandwidth at home to do the upload. Dcoetzee (talk) 08:48, 25 September 2009 (UTC)
New requests
University of Washington Digital Collections
The same algorithm applied to Commons:Batch uploading/Freshwater and Marine Image Bank can be used on several of the UW collections. I'll list some here, along with the reason the images would be PD.
- Albert Henry Barnes collection: 302 files. It's {{PD-old-70}} since the author died in 1920, according to this.
- Alaska Yukon Pacific: 1311 files; might contain some of Category:Alaska-Yukon-Pacific Exposition. {{PD-US}}
- William F. Boyd: all images are from before 1923 according to this, so {{PD-US}}.
- Boyd and Braas photographs: all from before the 20th century.
- Children's books: most are PD since they were published before the 20th century.
- John N. Cobb: died in 1930, so {{PD-old-70}}.
There are many more that could be checked.--Diaa abdelmoneim (talk) 17:54, 17 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
Defenselink
And yet another request, prompted by the FEMA request. Defenselink is the site of the US Department of Defense. It contains about 11,000 images in the form http://www.defenselink.mil/photos/newsphoto.aspx?newsphotoid=11822 . So just loop over the ID from 1 to about 12000 and parse each image page. Multichill (talk) 22:18, 14 October 2009 (UTC)
Opinions
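The ID loop suggested above could be structured like this. About 11,000 images spread over roughly 12,000 IDs suggests the numbering has gaps, so missing IDs are skipped. This is a minimal sketch under stated assumptions. `fetch` is a hypothetical callable that returns a page's HTML, or None when the ID has no photo. The one-second delay is an assumed politeness pause, not a documented limit. Parsing each page for the image and metadata is not shown.

```python
import time
from typing import Callable, Dict, Optional

URL_TEMPLATE = "http://www.defenselink.mil/photos/newsphoto.aspx?newsphotoid={}"


def photo_url(photo_id: int) -> str:
    return URL_TEMPLATE.format(photo_id)


def crawl(
    fetch: Callable[[str], Optional[str]],
    first: int = 1,
    last: int = 12000,
    delay: float = 1.0,
) -> Dict[int, str]:
    """Visit every ID from `first` to `last` inclusive and keep the pages that exist.

    `fetch` is hypothetical: it returns the page HTML, or None for a gap.
    `delay` is an assumed pause between requests.
    """
    pages: Dict[int, str] = {}
    for photo_id in range(first, last + 1):
        html = fetch(photo_url(photo_id))
        if html is not None:
            pages[photo_id] = html
        time.sleep(delay)
    return pages


# Demo: pretend only even IDs exist, and don't sleep.
demo = crawl(lambda url: "<html/>" if int(url.rsplit("=", 1)[1]) % 2 == 0 else None,
             first=1, last=10, delay=0)
print(sorted(demo))  # prints [2, 4, 6, 8, 10]
```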
| Assigned to | Progress | Bot name |
|---|---|---|
NOAA Photo Library
The FEMA request got me started. NOAA has a nice set of images at http://www.photolib.noaa.gov/ . Not sure how many images we're talking about, but at least a couple of thousand. Multichill (talk) 20:09, 14 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
Images from Beinecke's collections
One more wonderful collection with lots of PD images - http://beinecke.library.yale.edu/digitallibrary/ : 200,000 digitized images of photographs, illuminated manuscripts, maps, works of art, and books from the Beinecke's collections --Butko (talk) 08:50, 16 April 2009 (UTC)
- Did you contact them? Did you get a release? Or is this merely a suggestion? That shouldn't go here imho. Nice collection though; we should contact them to get some nice images. Multichill (talk) 14:05, 7 June 2009 (UTC)
- I would like to help out on the acquisition of images of this library. I wanted to send an e-mail but thought it would be best if we work together on a draft. --Diaa abdelmoneim (talk) 14:59, 7 June 2009 (UTC)
- OK. As discussed on IRC, you'll contact the library. Please keep me posted. Multichill (talk) 15:07, 7 June 2009 (UTC)
- Any update on this one? Multichill (talk) 23:14, 4 September 2009 (UTC)
- I emailed them multiple times but they didn't reply....--Diaa abdelmoneim (talk) 23:18, 4 September 2009 (UTC)
- User:JovanCormac seems to have started uploading the Detroit Company images. Maybe the batch should be split into several parts, each uploaded separately.--Diaa abdelmoneim (talk) 17:06, 16 October 2009 (UTC)
Images from World Digital Library
New site with PD images - http://www.wdl.org. It contains 1170 items. --Butko (talk) 06:52, 22 April 2009 (UTC)
- User:Sj has shown interest in working on this upload. Looks like a very nice collection. Some points:
- The items have an ID (http://www.wdl.org/en/item/100/), so they are easy to loop over
- The description of the items is available in a lot of languages; we should use that
- Lots of metadata is available; this should make categorization easier
- One item can contain multiple files. We should be aware of that
- Files are available in the TIFF file format. We should either have TIFF thumbnails or upload the TIFF and a JPEG version (transcoding!)
- Experience and code gained with the usgov uploads should be (re)used
- Some items have curator videos; might be fun to upload too
- Multichill (talk) 14:13, 8 November 2009 (UTC)
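To use the multilingual WDL descriptions mentioned above, each language version can be wrapped in the matching Commons language template, as already done with {{nl}} for the WLANL uploads. This is a sketch under stated assumptions. The dict of language code to text is a hypothetical shape for the WDL metadata. Putting English first and sorting the other languages by code is an arbitrary choice.

```python
from typing import Dict, Sequence


def language_template(lang: str, text: str) -> str:
    # "1=" names the parameter explicitly, so an "=" inside the text is safe.
    return "{{" + lang + "|1=" + text + "}}"


def multilingual_description(descriptions: Dict[str, str], preferred: Sequence[str] = ("en",)) -> str:
    """Join per-language descriptions into Commons language templates.

    Preferred languages come first, the rest follow sorted by code;
    empty descriptions are skipped.
    """
    order = [lang for lang in preferred if lang in descriptions]
    order += sorted(lang for lang in descriptions if lang not in order)
    parts = [language_template(lang, descriptions[lang].strip())
             for lang in order if descriptions[lang].strip()]
    return "\n".join(parts)


print(multilingual_description({"fr": "Carte du Nil", "en": "Map of the Nile", "es": "Mapa del Nilo"}))
```

A literal "|" or "}}" inside a description would still need escaping before upload; this sketch does not handle that.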
Maps from Ryhiner Collection
Available from www.stub.unibe.ch/stub/ryhiner/. I've been working with this collection for some time (see this file for an example). This collection consists of "over 16000 high resolution images: maps, town plans and topographical views from the 16th to the early 19th century". So, if this statement can be taken at face value, there is no copyright problem, because these maps are already in the public domain and, being 2D works, their digital copies are also PD. If these statements are correct, the entire collection could be uploaded to Commons by a bot. Their maps are available in high resolution using Zoomify (see the example map on their site). Tm (talk) 13:20, 22 April 2009 (UTC)
Opinions
- Looks like a great collection. Is it possible to access the source files? Did you try contacting them? Multichill (talk) 14:03, 7 June 2009 (UTC)
Sorry for the delayed answer. To answer your first question, I don't know if it's possible to have online access to their source files, and I am not very tech-savvy. Also, I didn't try to contact them. What is your opinion on the next steps to take? Tm (talk) 01:25, 15 June 2009 (UTC)
- Today I sent an email asking for their permission to do this batch upload. I thought that asking at this stage whether their source files are available online would be too soon. Tm (talk) 15:10, 2 July 2009 (UTC)
- Sorry about not responding sooner; looks like I forgot to watchlist this page. We're in the non-tech phase. Try to contact them and see if they like it. If that turns out all right, we can start the actual data retrieval and uploading part. Writing a general story about this is still on my list. I'll see if I can make a first version. Multichill (talk) 16:59, 2 July 2009 (UTC)
Just a quick update: I received an automatic reply saying that the person I emailed is absent, and I forwarded my message to the address given in that reply. When and if I receive an answer, I'll update this page. Tm (talk) 00:48, 3 July 2009 (UTC)
I received an answer and have already replied to it, but I am waiting for permission to republish the email or the contents of the answer I received. Tm (talk) 04:10, 12 July 2009 (UTC)
- You can always use OTRS if you want to keep it private. Multichill (talk) 10:56, 12 July 2009 (UTC)
The question isn't exactly about privacy, but more about building trust between the parties after the NPG case (I fully support Dcoetzee), which these people might have heard of and which might have given them a bad impression of Wikimedia Commons and its users. I can say, without breaking the confidentiality of the correspondence, that the answer I received was mildly positive about the possibility of cooperation. However, the person who answered raised some questions, doubts and remarks about this possible cooperation that need to be addressed (I gave my opinion), and I asked that their answer be made public so that more people can give their input. Since then, I received an automatic reply to my second email saying that I might not get a reply until 10 August. Tm (talk) 07:39, 19 July 2009 (UTC)
- Any update on this one? Multichill (talk) 23:13, 4 September 2009 (UTC)
Not much. I received an email on 11 August saying that, due to the holidays of the person I had emailed, the answer would be delayed, but I have received nothing since. Tm (talk) 23:43, 4 September 2009 (UTC)
- I sent an email today, as I had only received an email on 15 September telling me that the person I contacted had contacted the library but was still waiting for an answer. In this email I asked whether there is an answer yet. When I receive an answer, I'll update this page. Tm (talk) 04:05, 21 November 2009 (UTC)
- These images are easily scrapable with a bit of regex and looping. The various galleries are listed here; each gallery has about 40 images of the same subject, probably from different periods. Next to each gallery the name of the place is listed, so the category could simply be something like Category:Scotland maps. We've done uploads through Zoomify before, so the experience is there.--Diaa abdelmoneim (talk) 10:07, 17 October 2009 (UTC)
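Zoomify serves each map as a pyramid of 256-pixel JPEG tiles. These are stored in folders named TileGroupN with 256 tiles per folder, under paths of the form TileGroupN/level-x-y.jpg. To fetch the full-resolution tiles, a scraper has to work out each tile's group number from the image width and height. Those dimensions are normally read from the image's ImageProperties.xml. The sketch below shows one way to do it. It assumes the Ryhiner viewer uses the standard ("default") Zoomify layout, which is not confirmed for that site. Some Zoomify servers use a "truncated" tier calculation instead. The base URL of each image and the stitching of the tiles are not shown.

```python
import math
from typing import List, Tuple

TILE_SIZE = 256        # standard Zoomify tile edge, in pixels
TILES_PER_GROUP = 256  # tiles stored per TileGroupN folder


def tier_sizes(width: int, height: int, tile: int = TILE_SIZE) -> List[Tuple[int, int]]:
    """(columns, rows) of tiles per zoom tier, smallest tier first ("default" layout)."""
    tiers = []
    span = tile
    while width > span or height > span:
        tiers.append((math.ceil(width / span), math.ceil(height / span)))
        span *= 2
    tiers.append((1, 1))
    tiers.reverse()
    return tiers


def tile_path(width: int, height: int, z: int, x: int, y: int) -> str:
    """Relative path of tile (x, y) in tier z, for example "TileGroup0/2-3-2.jpg"."""
    tiers = tier_sizes(width, height)
    cols, rows = tiers[z]
    if not (0 <= x < cols and 0 <= y < rows):
        raise ValueError("tile outside tier")
    # Tiles are numbered through all smaller tiers first, then row by row.
    index = sum(c * r for c, r in tiers[:z]) + y * cols + x
    return f"TileGroup{index // TILES_PER_GROUP}/{z}-{x}-{y}.jpg"


def full_resolution_paths(width: int, height: int) -> List[str]:
    """All tile paths of the largest tier, in row-major order."""
    tiers = tier_sizes(width, height)
    z = len(tiers) - 1
    cols, rows = tiers[z]
    return [tile_path(width, height, z, x, y) for y in range(rows) for x in range(cols)]


print(full_resolution_paths(1000, 600)[-1])  # prints TileGroup0/2-3-2.jpg
```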
| Assigned to | Progress | Bot name |
|---|---|---|
Freshwater and Marine Image Bank
The Freshwater and Marine Image Bank from the Digital Collections at the University of Washington states:
- "Materials in the Freshwater and Marine Image Bank are in the public domain. No copyright permissions are needed. Acknowledgement of the Freshwater and Marine Image Bank as a source for borrowed images is requested."
The entire library can be browsed here: [5]
These photos would be useful in the many marine and freshwater life articles of the Wikipedias. The images are encyclopedic and are very high quality.
The digital collection has been "closed" since June, but the site is still accessible. My guess is the site will shut down within a few days (whenever their webspace subscription ends).
Any way someone could set up a batch for this? Thanks, Bob the Wikipedian (talk) 19:46, 13 July 2009 (UTC)
Opinions
Um...no responses yet? Perhaps I should reiterate that this database isn't supposed to be up much longer. Either we take the images now, or they might not be there a few months from now. Bob the Wikipedian (talk) 01:23, 28 July 2009 (UTC)
- I would like to echo the great potential utility of the UW image database! In many of the subjects of particular interest to me (e.g. North Pacific marine ecology, marine mammals, Pacific salmon, sturgeon species, indigenous people) the collection is a real goldmine. Somebody, whoever is out here making such magical batch uploads possible, please respond! Best, Eliezg (talk) 21:13, 9 August 2009 (UTC)
- Simply looping from http://content.lib.washington.edu/cdm4/item_viewer.php?CISOROOT=/fishimages&CISOPTR=32550 to http://content.lib.washington.edu/cdm4/item_viewer.php?CISOROOT=/fishimages&CISOPTR=53764 gives you all the required data for the collections. I don't know however how to extract the images themselves.
- OK, I found a way to do so, using getimage.exe. Just use this URL: http://content.lib.washington.edu/cgi-bin/getimage.exe?CISOROOT=/fishimages&CISOPTR=52164&DMSCALE=100&DMWIDTH=MAX&DMHEIGHT=MAX&DMX=0&DMY=0&DMTEXT=&REC=1&DMTHUMB=0&DMROTATE=0 while looping over the "CISOPTR" number. So the images run from http://content.lib.washington.edu/cgi-bin/getimage.exe?CISOROOT=/fishimages&CISOPTR=32550&DMSCALE=100&DMWIDTH=MAX&DMHEIGHT=MAX&DMX=0&DMY=0&DMTEXT=&REC=1&DMTHUMB=0&DMROTATE=0 to http://content.lib.washington.edu/cgi-bin/getimage.exe?CISOROOT=/fishimages&CISOPTR=53764&DMSCALE=100&DMWIDTH=MAX&DMHEIGHT=MAX&DMX=0&DMY=0&DMTEXT=&REC=1&DMTHUMB=0&DMROTATE=0, with the metadata retrievable as described above...--Diaa abdelmoneim (talk) 16:45, 17 October 2009 (UTC)
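The two URL patterns given above can be generated from the CISOPTR number alone. The item_viewer.php page carries the metadata and getimage.exe returns the full-size image. The Python sketch below rebuilds both URLs with the standard library and reproduces the query strings quoted above character for character. Whether every pointer in the 32550 to 53764 range is a single image is not established here.

```python
from urllib.parse import urlencode

HOST = "http://content.lib.washington.edu"
COLLECTION = "/fishimages"
FIRST_PTR, LAST_PTR = 32550, 53764  # range given above


def viewer_url(ptr: int) -> str:
    """Item page holding the metadata for one record."""
    query = {"CISOROOT": COLLECTION, "CISOPTR": ptr}
    return HOST + "/cdm4/item_viewer.php?" + urlencode(query, safe="/")


def getimage_url(ptr: int) -> str:
    """Full-size image, using the getimage.exe parameters quoted above."""
    query = {
        "CISOROOT": COLLECTION, "CISOPTR": ptr,
        "DMSCALE": 100, "DMWIDTH": "MAX", "DMHEIGHT": "MAX",
        "DMX": 0, "DMY": 0, "DMTEXT": "", "REC": 1,
        "DMTHUMB": 0, "DMROTATE": 0,
    }
    # safe="/" keeps the collection path unescaped, as in the original URLs.
    return HOST + "/cgi-bin/getimage.exe?" + urlencode(query, safe="/")


pointers = range(FIRST_PTR, LAST_PTR + 1)  # inclusive of the last pointer
print(len(pointers))  # prints 21215
print(getimage_url(52164))
```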
| Assigned to | Progress | Bot name |
|---|---|---|
Travelers in the Middle East Archive
The Travelers in the Middle East Archive (TIMEA) describes itself as "a digital archive that focuses on Western interactions with the Middle East, particularly travels to Egypt during the nineteenth and early twentieth centuries." It is supported by the Institute of Museum and Library Services and Rice University. The website provides very detailed information about the authorship and publication history of each image. The images and photos are of good quality and great historical value. Most of them are scanned from pre-1923 U.S. books, so uploading them to Commons should be OK. In any case, all of the website's content is licensed under a Creative Commons Attribution 2.5 License, so this would be perfect for a batch upload. I already created a corresponding category and source template. --BomBom (talk) 00:04, 8 August 2009 (UTC)
Opinions
Support Yann (talk) 10:01, 8 August 2009 (UTC)
- Looks like a very nice resource! Did you try contacting them? It's always very nice to do these kinds of uploads in cooperation with the donating party. Oh, by the way, I think the name of the category should be Category:Images from the Travelers in the Middle East Archive to clearly indicate this is a source category. Multichill (talk) 10:28, 8 August 2009 (UTC)
- I'm willing to write them an e-mail. But what exactly should I ask for? The CC license basically means that we don't need their permission. Maybe you have some "technical" questions (related to the uploading process) in mind I should ask them? --BomBom (talk) 21:40, 9 August 2009 (UTC)
- Update: I renamed the category as requested. --BomBom (talk) 01:39, 11 August 2009 (UTC)
- You could tell them something about Commons and how we do partnerships here (that page needs updating, by the way). The main thing is metadata in an easy format (database, dump, XML, etc.) for easy processing. Multichill (talk) 12:34, 17 September 2009 (UTC)
| Assigned to | Progress | Bot name |
|---|---|---|
Images of erosion
There are several hundred images of land erosion available at http://picasaweb.google.com/VolkerPrasuhn. The license is CC-BY-SA (see also the media release in German or French). IMHO they are in scope, and Category:Erosion could benefit from these images. --Leyo 12:18, 16 September 2009 (UTC)
I got the images on DVD and started to upload them manually. See Category:Images by Volker Prasuhn. --Leyo 20:32, 28 October 2009 (UTC)
- Upload completed. --Leyo 21:30, 6 November 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
livepict.com
The website http://livepict.com/ has sets of concert photos licensed under CC-BY-SA 3.0 (link to the license on their site, in Spanish). There are possibly a few hundred high-resolution photos in total, divided by band and by set. I have already uploaded several sets, like this image (Alanis Morissette; Cat Power; Katie Melua; Paulina Rubio and Portishead). They also have a Flickr account where some of their photos are available in lower resolution, with the concert venues noted, under a CC-BY-SA 2.0 license. So I think this might be a straightforward batch upload.
Opinions
Nice opendir at http://livepict.com/large/. I will have a look at it. Multichill (talk) 12:52, 17 September 2009 (UTC)
Done Uploaded 906 images, assigned them to the respective groups' categories, and created some new ones. The output can be found here. --Justass (talk) 13:36, 11 November 2009 (UTC)
- Looks good. Could you please describe how you've done it and maybe publish some code you used so other users can learn from this? Thank you, Multichill (talk) 13:55, 11 November 2009 (UTC)
| Assigned to | Progress | Bot name |
|---|---|---|
| Justass | Done | |
Zorger
Message below was posted on the Commons:Village pump --Jarekt (talk) 19:23, 18 September 2009 (UTC)
- Looks like a batch upload could be useful here: public-domain.zorger.com. Tekstman (talk) 18:03, 18 September 2009 (UTC)
I browsed the site, and they seem to have a few hundred images scanned from old books, with clear sources and their own PD justification. Some of those images might be useful, like those. Someone should match them to our PD licenses. --Jarekt (talk) 19:23, 18 September 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
Mollusca by Jan Delsing
Photos of shells of Mollusca (143 bivalves, 1469 gastropods) by Jan Delsing from http://www.biolib.cz/en/galleryuser/?uid=3973
The only uploaded example is: http://commons.wikimedia.org/wiki/File:Pythia_scarabaeus_shell.jpg
The best file names would be: BINOMIAL NAME shell.jpg
Example file names:
- File:Pythia scarabaeus shell.jpg
- File:Pythia scarabaeus shell 2.jpg
- File:Pythia scarabaeus shell 3.jpg
- File:Pythia scarabaeus shell 4.jpg
- and so on.
Thanks. --Snek01 (talk) 18:09, 6 October 2009 (UTC)
- In case this information helps: EOL cooperates with biolib.cz and automatically takes public domain and Creative Commons images from this source. --Snek01 (talk) 10:18, 12 November 2009 (UTC)
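The requested naming scheme is easy to apply mechanically. The following is a minimal Python sketch, assuming the species names have already been extracted from the biolib.cz gallery in upload order; the helper `shell_filenames` is hypothetical. The first photo of a species gets `BINOMIAL NAME shell.jpg`, and each further photo of the same species gets a running number. It does not check which names are already taken on Commons (File:Pythia scarabaeus shell.jpg already exists), so a real upload would still need that check.

```python
from collections import defaultdict


def shell_filenames(binomials):
    """Return Commons file names following the "BINOMIAL NAME shell.jpg" scheme.

    The first photo of a species gets "<Genus species> shell.jpg"; each further
    photo of the same species gets " 2", " 3", ... before the extension.
    """
    seen = defaultdict(int)  # photos named so far, per species
    names = []
    for binomial in binomials:
        # Collapse extra whitespace; capitalise the genus, lower-case the epithet(s).
        genus, *epithets = binomial.split()
        species = " ".join([genus.capitalize()] + [e.lower() for e in epithets])
        seen[species] += 1
        number = "" if seen[species] == 1 else f" {seen[species]}"
        names.append(f"File:{species} shell{number}.jpg")
    return names
```

For example, `shell_filenames(["Pythia scarabaeus", "Pythia scarabaeus", "Cepaea nemoralis"])` gives `File:Pythia scarabaeus shell.jpg`, `File:Pythia scarabaeus shell 2.jpg` and `File:Cepaea nemoralis shell.jpg`.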
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
NASA Technical Reports Server (NTRS)
NASA's NTRS contains over 1 million records, including over 40,000 PDFs, tens of thousands of images, and tens of thousands of videos. Most of these items are in the public domain as works of NASA.
For examples of PDFs containing useful schematics, images, and diagrams, see...
In addition to PDFs, they also have tens of thousands of images and videos.
Perhaps NASA could be contacted and asked for a dump of their images. With a million records, many of which are copyright-free, there's a trove of free media.
- So does anyone have any opinions? Smallman12q (talk) 23:31, 13 October 2009 (UTC)
- Anyone... Smallman12q (talk) 22:44, 19 October 2009 (UTC)
- So I take it that tens of thousands of high-quality diagrams and photographs of spacecraft simply aren't of interest? Smallman12q (talk) 14:30, 31 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
Virtual Manuscript Library of Switzerland
Scans of manuscripts from the Virtual Manuscript Library of Switzerland. At this date, there are 482 manuscripts from 20 different libraries: http://www.e-codices.unifr.ch/en
Usual copyfraud restrictions included... :(
Opinions
| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
Flickr
The process of uploading from Flickr is significantly easier now thanks to Flickrripper, developed by Multichill. Any photoset, group, tag or user stream that complies with Commons guidelines can be posted here and will be handled by a dedicated uploader. A global category should be specified, to which all uploaded files from a given set will be added.
Beyonce I Am… Tour
Images of Beyoncé's I Am… Tour from Flickr. Some sets are already freely licensed; I'll try to get more released and list them here. The images should be placed in Category:I Am… Tour and subcategorized by venue.
Opinions
| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
| Category:I Am… Tour |
2009 Venice Film Festival
All the images in the 2009 Venice Film Festival collection by nicogenin, who actually has many more celebrity photos. There are about 300, all of celebrities and all licensed as cc-by-sa-2.0. --Diaa abdelmoneim (talk) 21:28, 17 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
World Economic Forum
All the various sets from the photostream: about 3,000 images of various leaders and celebrities. The images would fill the various subcategories of Category:World_Economic_Forum. We have {{WEF}}, which auto-categorizes into Category:Images from the World Economic Forum, with a confirmed OTRS ticket. So instead of flickrreview, use {{WEF}}.--Diaa abdelmoneim (talk) 21:10, 17 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
Moses Namkung
Moses Namkung is a photographer with many images under the CC-BY 2.0 license.
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
Rubenstein All-Star photos
Photos taken by Rubenstein house photographer Martyna Borkowski of the 2008 All-Star Game Red Carpet Parade, which can be found at http://www.flickr.com/photos/rubenstein_/sets/72157606239259761/. The set contains a number of past and present All-Star baseball players, e.g., Rollie Fingers, Tommy Lasorda, Ichiro Suzuki, A-Rod.
Note: I've already uploaded a number of these photos. Also, not every photo identifies who is pictured, but a good source for that is the Getty Images set of the same event (here and here). Tabercil (talk) 03:29, 1 October 2009 (UTC)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
Cyclonebill Flickr food images
Upload as many as possible of Cyclonebill's excellent CC-BY-SA photos of food. User page: [6] One interesting set: [7] (not all of his photos are in sets; is an easy bot upload still possible?)
Opinions
Great set of images. Upload them using -user_id:.... Multichill (talk) 09:34, 30 September 2009 (UTC)
- Does someone want to actually do this? After manually uploading a single one of his images to use in a Wikipedia article, I contacted "cyclonebill". He wrote back that he would be happy to see his photos here. Nillerdk (talk) 15:08, 15 October 2009 (UTC)
| Assigned to | Progress | Bot name |
|---|---|---|
2009 US Open
Very interesting tennis pictures of the 2009 US Open. The images should be uploaded to Category:2009 US Open (tennis), along with the respective player's category. The following sets have a free license and can be imported to Commons. The numbers indicate the number of photos in each set:
- charliecowins (260)
- terryballard (12)
- emmett.hume (19)
- Christian Mesiano (56)
Opinions
| Assigned to | Progress | Bot name |
|---|---|---|
TIFF09
Images on Flickr of the 2009 Toronto International Film Festival. The images should be uploaded to Category:2009 Toronto International Film Festival, along with the respective artist's category. The following sets have a free license and can be imported to Commons. The numbers indicate the number of photos in each set:
- mrvmedia (100) Done
- Sasoriza (69) Done
- czstrova (60)
- bucajack (17)
- Josh Jensen (60)
- Tsar Kasim (32)
Opinions
The sets aren't tagged in a standard way, and many of the short tags don't qualify as categories, so I created Category:2009 Toronto International Film Festival (uncategorized) and will categorize from there.--Diaa abdelmoneim (talk) 19:42, 3 October 2009 (UTC)
| Assigned to | Progress |
|---|---|
| User:Diaa abdelmoneim | uploading |
