Commons talk:Batch uploading

From Wikimedia Commons, the free media repository

Things to add to this page

  • General flow of batch uploads:
    1. Find a source of free images
    2. Get these images and process the metadata to produce wikicode for Commons
    3. Upload the images
      1. By someone with an upload bot
      2. By one of the shells using import
  • Sample letters to send to potential image donors

(... please feel free to add things, it's just a list so I won't forget things) Multichill (talk) 15:10, 7 June 2009 (UTC)
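Step 2 of the flow above (turning source metadata into wikicode) could be sketched roughly as follows. This is only an illustration, not the script any uploader here actually uses: the record keys (`description`, `date`, `source`, `author`, `categories`), the semicolon-separated category field, and the function name `build_wikitext` are all assumptions made for the example. The output follows the usual {{Information}} layout for a Commons file page.

```python
from typing import Dict


def build_wikitext(record: Dict[str, str], license_template: str) -> str:
    """Render one metadata record as wikitext for a Commons file page.

    The record keys used here are hypothetical; a real source database
    will need its own mapping onto the {{Information}} fields.
    """

    def field(name: str) -> str:
        # Missing fields are left empty so they are easy to spot in review.
        return record.get(name, "").strip()

    lines = [
        "== {{int:filedesc}} ==",
        "{{Information",
        "|description=" + field("description"),
        "|date=" + field("date"),
        "|source=" + field("source"),
        "|author=" + field("author"),
        "}}",
        "",
        "== {{int:license-header}} ==",
        "{{" + license_template + "}}",
        "",
    ]
    # Assumed convention: categories arrive as one semicolon-separated string.
    for category in record.get("categories", "").split(";"):
        if category.strip():
            lines.append("[[Category:" + category.strip() + "]]")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "description": "A test image",
        "date": "2009-06-07",
        "source": "Example image bank",
        "author": "Unknown",
        "categories": "Test; Batch uploads",
    }
    print(build_wikitext(example, "PD-USGov"))
```

Keeping the mapping in one small function makes it easy to review a handful of generated pages by hand before anything is uploaded.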

Hello?

Hello? Does anyone participating in this project watch this page? I would like to point out that a particular batch involving a University of Washington image bank requested a particular urgency...and I am surprised to see the image bank still exists, since there is a note on the site stating it was supposed to close permanently several months ago. Bob the Wikipedian (talk) 05:40, 19 September 2009 (UTC)

Well, there were a lot of things going on in the past few days. A batch upload of Wiki Loves Art Netherlands was started, images from the Tropenmuseum were imported, and a tool for batch uploading from Flickr was created. User:Multichill is currently the only one who is working on batch uploads; User:Dcoetzee was working on many uploads, but since the NPG vs Dcoetzee case he has reduced his batch uploads. If you could contact the university and get the information database of these files and the list of file links, it would ease the upload and it would be dealt with right away.--Diaa abdelmoneim (talk) 09:08, 19 September 2009 (UTC)

US federal government sites

I already added a couple of requests. Air Force and Coast Guard should be added too. There are probably more nicely structured sites out there which can be copied to Commons. Multichill (talk) 22:47, 14 October 2009 (UTC)

Single Air Force images can be pulled from http://www.af.mil/photos/media_view.asp?id=314289; the id suggests the site contains a lot of images. Multichill (talk) 16:24, 23 October 2009 (UTC)
If possible, USFWS and NOAA would be nice to have. -- User:Docu at 18:14, 23 October 2009 (UTC)
How does the upload bot check if the image already exists in Commons? Does it just check the VIRIN number or also other file characteristics like URL, file description or even more? --Zaccarias (talk) 21:01, 27 January 2010 (UTC)
It checks if a file with the same SHA1 hash already exists. Multichill (talk) 21:04, 27 January 2010 (UTC)
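For anyone wondering what such a check amounts to, here is a minimal offline sketch, assuming the file contents are already in memory and that a set of SHA-1 hashes of existing files has been obtained some other way. The function names `sha1_hex` and `select_new_files` are made up for this example, and this is not the actual bot code.

```python
import hashlib
from typing import Dict, Iterable, List


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the given bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def select_new_files(candidates: Dict[str, bytes],
                     known_hashes: Iterable[str]) -> List[str]:
    """Return the names of candidates whose content hash is not yet known.

    `known_hashes` stands in for the hashes of files that already exist;
    how that list is obtained is outside the scope of this sketch.
    """
    seen = set(known_hashes)
    new_names = []
    for name, data in candidates.items():
        digest = sha1_hex(data)
        if digest in seen:
            # Byte-identical to an existing file or to an earlier candidate.
            continue
        seen.add(digest)
        new_names.append(name)
    return new_names


if __name__ == "__main__":
    files = {"a.jpg": b"abc", "b.jpg": b"abc", "c.jpg": b"xyz"}
    print(select_new_files(files, known_hashes=[]))
```

Note that a hash comparison only catches byte-identical files; a resized or re-encoded copy of the same picture produces a different hash and would not be detected this way.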

So an identical file would be found and the bot would not upload the image? The reason I am concerned is that a large number of US government sites exist, and some have the same pictures, though sometimes at a different resolution.

I have worked on the images from the US Navy about Afghanistan and Pakistan and I have found a number of duplicates, not many but some. I think that we would end up with thousands of duplicates. In some cases I think it's just not possible to find already existing duplicates, but in many cases they could be avoided.

Examples:

The problem is that there are so many different websites. Because of this, duplicates are quite likely to occur. I just think that everything which could be easily avoided should be taken care of. Maybe a bot could grab the picture IDs somehow and insert the {{ID-USMil}} templates. --Zaccarias (talk) 21:53, 27 January 2010 (UTC)
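As a rough idea of how a bot might grab the picture IDs, the sketch below pulls VIRIN-like strings out of a caption or description. The pattern is an assumption: it expects the layout YYMMDD-S-XXXXX-NNN (six-digit date, one-letter service code, five-character identifier, three-digit sequence number) and would need checking against real captions. The name `extract_virins` is made up for the example, and filling in {{ID-USMil}} is deliberately left out because the template's parameters are not described here.

```python
import re
from typing import List

# Assumed VIRIN layout: YYMMDD-S-XXXXX-NNN, e.g. a six-digit date, a
# one-letter service code, a five-character ID and a three-digit sequence.
VIRIN_PATTERN = re.compile(r"\b\d{6}-[A-Z]-[A-Z0-9]{5}-\d{3}\b")


def extract_virins(text: str) -> List[str]:
    """Return the distinct VIRIN-like IDs found in text, in order of appearance."""
    found: List[str] = []
    for match in VIRIN_PATTERN.findall(text):
        if match not in found:
            found.append(match)
    return found


if __name__ == "__main__":
    caption = "US Navy 100127-N-1234X-001 (hypothetical ID used for illustration)"
    print(extract_virins(caption))
```

Two files from different websites that carry the same ID would then be candidates for a manual duplicate check, even when their resolutions differ.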

Swedish National Heritage Board asking about upload methods

I've been in touch with the Swedish National Heritage Board about the possibilities of donations. They have previously made a minor upload on Flickr Commons[1] and have announced it on their Swedish site here. Sophie Jonasson, the person in charge of the project, is interested, but needs to convince her superiors and wants to know how much work is needed for uploads. I said that we had various scripting possibilities, and that I'd ask about the details. Can someone expand on the methods for making group uploads, large or small? An especially important aspect is how to include meta-info.

Peter Isotalo 13:12, 29 January 2010 (UTC)

The Nordiska Museet upload would probably be the best example of something similar. Although it might be best to wait until those images are actually uploaded before using that as an example. Assuming they've already sorted the possible legalities (i.e. make sure they can actually release the rights to the images) I'm guessing the amount of work needed would probably depend largely on how well (or like Commons) organised their metadata is. At least that seems to be where the most work is spent on the Nordiska Museet images. /Lokal_Profil 15:05, 7 February 2011 (UTC)

Batch from Picasa or Panoramio?

Hello,

A fellow French user is currently working on a project which would involve many pictures (hopefully :-), a staging area, then mass-upload to Commons.

At this moment, he is assessing which platform would be the best for the purpose of the staging area, and he is considering Commons itself, Flickr, Picasa and Panoramio. He asked on the French Village Pump about bot-upload from those websites.

Here is his question: is it possible/easy to perform a batch-upload from Picasa or Panoramio (given correct licensing of course)? The technical feasibility and easiness would be a major criterion in the choice of the platform.

Cheers, Jean-Fred (talk) 15:45, 23 March 2010 (UTC)

For what purpose is he considering Flickr, Picasa and Panoramio? Those projects seem to be easy to use when individual images are uploaded with a project like Commons:Wiki Loves Art Netherlands. But with a large upload, I would upload directly to Commons. You or other users may want to ask, for example, User:Multichill for help/comment, as he does such things very often. Greetings - Romaine (talk) 19:29, 23 March 2010 (UTC)

Maps

Hey, do you know about this maps page http://english.freemap.jp/ ? emijrp (talk) 20:29, 13 June 2010 (UTC)

Commons:IUCN red list

Hi. We will be producing hundreds of maps (hopefully thousands) pretty soon to upload to Commons, based on that partnership. I would like to request some assistance and advice from this project. How would the uploading work? Cheers, GoEThe (talk) 11:05, 18 June 2010 (UTC)

Depends a bit on where the files are located (somewhere on the web, on your local disk, ... ?), and on the size of the files. If you e.g. have them on your disk, and they are 1 to 4 GB in total, you could burn a DVD from them and send it to me, and I can then do the upload. If they are on the web, they would have to be downloaded first... --Reinhard Kraasch (talk) 14:24, 23 June 2010 (UTC)
Probably best to create a new subpage to discuss this. Multichill (talk) 14:55, 25 June 2010 (UTC)
I've done that at Commons:Batch uploading/IUCN red list. GoEThe (talk) 14:18, 6 October 2010 (UTC)

88 GB of Public Health imagery

Hi all. I crawled the public health image library from the CDC a few months back: [2]. It's about 88 GB of high-resolution, print-quality TIFFs. I'm setting up a mirror of the content and I would also like to upload the content to Wikimedia Commons. I have descriptions and some interesting MeSH metadata, but few titles, in a local db. Would someone like to drop me a line and help me write an upload script? I think that the MeSH categorization data is particularly useful and I would hate to see it be separated from the images.

Sethwoodworth (talk) 03:17, 12 July 2010 (UTC)

Navy manuals

There is a series of manuals of the US Navy available on the internet, e.g. at

www.hnsa.org/doc/

These include drawings and schematics of various parts of ships. After an initial batch, images would need to be extracted from the publications. If the PD status of these is ok, these could easily fit into the current somewhat too similar collection of US Navy media. Even if they are somewhat dated, I think they would be a good addition. Docu at 05:44, 6 October 2010 (UTC)

Advertising

The past batch uploads section would be useful in convincing others to donate images to us, but it is missing a key component: links to media discussing them. Telling others that if they donate to Commons they will get free advertising in media through news articles is helpful; this would be a good place to show them some proof. --Piotr Konieczny aka Prokonsul Piotrus Talk 13:01, 22 December 2010 (UTC)

This is more about the technical side. You probably want to take a look at Commons:Partnerships. Multichill (talk) 16:35, 22 December 2010 (UTC)

Book-pages

I intend to upload pages from a book. It's plain text, no illustrations, no pictures.

The first pages are to be found in Category:Gamla Testamentet (Myrberg). - There are 314 pages in total. These files will be used on Wikisource.

I have designed some software in C# based on the DotNetWikiBot Framework. The code can be found here.

Do you have any objection to the use of such software for uploading files? (I use similar code to upload text from FineReader to Wikisource.) -- Lavallen (talk) 17:10, 10 February 2011 (UTC)

Updating this page

The Commons:Batch uploading page has numerous old requests and in-progress requests, many of which are GLAM-related. The status of many is unknown. I would like to work on getting this updated and improve the process for requests. Right now, as a new/aspiring batch uploader, it's difficult to know where to begin. If anyone wants to help, that would be awesome, or otherwise I'll try poking people. -Aude (talk | contribs) 20:18, 9 March 2011 (UTC)

The Prado in Google Earth

FYI, I haven't listed it on this page but I'm currently uploading a small set of very high-resolution works from the Prado in Google Earth project, to be placed in Category:Prado in Google Earth. There were a couple there already, including one featured picture, but they aren't nearly as high res as they could be. Dcoetzee (talk) 07:06, 13 May 2011 (UTC)

This is done. Dcoetzee (talk) 19:28, 23 May 2011 (UTC)

C2RMF

I'm currently in the process of uploading a set of 22 high-resolution (about 100-300 megapixel) works from C2RMF, from French museums. These include the famous Mona Lisa. These will go in Category:High-resolution images from C2RMF. Dcoetzee (talk) 21:16, 5 June 2011 (UTC)

This is also done. Dcoetzee (talk) 21:28, 26 September 2012 (UTC)

New batch upload in progress

I'm doing a new batch upload at the moment, in the downloading stage right now. I'm avoiding discussing it in public but please e-mail me if you want more information. Thanks! Dcoetzee (talk) 21:30, 26 September 2012 (UTC)

Now announced at Commons:Village pump#New_Google_Art_Project_uploads_have_begun. Dcoetzee (talk) 06:11, 30 September 2012 (UTC)

Listing criteria (e.g. minimum number of files)

I've just created a new request, but it's for around only 20 files. I couldn't find any reference here to a minimum number of files - perhaps I should've looked somewhere else. Guidance appreciated. Thanks. -- Trevj (talk) 12:17, 9 January 2013 (UTC)

Purpose of this page

Hi all. Just wanted a clarification. Is this page/project intended primarily for people looking for bot operators to help them, or is it also intended for discussions about formatting, problems, and recommendations related to batch uploads which already have a bot operator? In other words, if I'm intending to do a batch upload myself, should I start a post here or should the whole thing be confined to Commons:Bots/Requests? Cheers /André Costa (WMSE) (talk) 15:52, 5 February 2013 (UTC)

Both. If you plan to do a batch upload, please open a page here so we can discuss it :). Jean-Fred (talk) 16:43, 5 February 2013 (UTC)
That's what we did last time but reading the intro now I suddenly became unsure. Will write an entry later today or tomorrow. /André Costa (WMSE) (talk) 22:12, 5 February 2013 (UTC)
Done. By the way I've put a question about Chunked uploads on the subpage as well, any help is welcome. /André Costa (WMSE) (talk) 09:33, 7 February 2013 (UTC)

RFC: Minimum requirements for categorization

I have created Commons:Requests for comment/Batch categorization requirements to gain a community consensus on guidance for batch uploaders as to use of backlog categories. This is a frequent complaint about batch uploads, probably as folks don't appreciate how working with cooperative teams means that resolving a batch upload backlog may take many months. If the community is against this approach, then Commons may fail to preserve large batch uploads where categorization is not obvious from the source metadata.

Opinions from experienced batch uploaders to the RFC would be highly welcome. Thanks -- (talk) 10:01, 29 September 2013 (UTC)

12,000 Dutch colonial maps now online from Indonesia, Dutch Antilles and Surinam

This may be of interest: https://twitter.com/bl_eap/status/454263083012087808/photo/1. See also http://www.library.leiden.edu/special-collections/maps/introduction-maps.html. — SMUconlaw (talk) 16:02, 10 April 2014 (UTC)

Cleaning up

This page has become pretty unwieldy. Maybe time to clean up, and move stuff to subpages? Husky (talk to me) 07:56, 13 May 2014 (UTC)

Opinions about a mass content donation offering

The Open Culture Data network in the Netherlands has offered to do an extremely large content donation. I'd like some reactions to this proposal.

Ter-burg (talk) 08:28, 20 May 2014 (UTC)