User:Basvb/Ideas/Single Image Batch Upload

From Wikimedia Commons, the free media repository

This is a rough idea for a new workflow to upload images from external sources. Any input on the idea, or interest in working on it with me, is very welcome!

In short

In short, the idea is to provide a workflow in which one can upload an image from an external source with a single click. When an external source (mainly GLAMs) has released its images, we currently see that the images are either uploaded all at once in a batch upload or uploaded rather sporadically, one at a time. The idea here is to make this latter workflow much easier by reducing the tens of steps to just a few. The final form could look a bit like flickrRipper (or at least have some elements in common with it), but for dozens of sources.

Problem statement

Many external organisations have realised the benefits of releasing their (image) material under a free license. As a result, large sets of images can now be used in Wikipedia and other Wikimedia projects. Before these images can be used, however, they have to be uploaded to Commons (or to the local project).

Uploading large sets by hand takes too much time, so methods have been developed to automate the uploading of large sets of images. Think of tools such as the GLAMwiki Toolset, the pywikibot framework and other methods. One downside of these methods is that they require some technical ability and that it takes quite some time to prepare and check an upload of a large set of images. Another downside is that a huge set of images is uploaded at once, all of which have to be processed in some way and all of which have to be found by re-users before they can be used. Often re-users do not know that a relevant image for their article might already exist on Commons.

In some large sets, many of the images may be not very relevant or even out of scope. It could therefore be a better idea to upload only those images which are likely to be used or which somebody is specifically interested in uploading. In this latter case, the person who wants to use an image from an external source that has released its material currently has to download the image and fill in the description and metadata by hand. This takes quite some time, even when one is already familiar with the specific source and how to attribute it in the description. When the uploader is not that familiar with the type of images or the source, they first have to find out how to upload the image correctly and which categories and templates to use. This makes this method of uploading rather difficult and time-consuming, especially when one wants to upload more than a few images.


My idea would be to combine some of the benefits of both methods. This should provide a method which makes it possible to select and upload only those images from a large collection which are relevant, while making the uploading easier, or even a one-click solution. To achieve this, the idea is to provide a framework which uploads the images. For this, the metadata of the source has to be mapped to Commons beforehand. Templates containing this mapping give users a way of uploading which takes fewer steps. In the ideal form the external party would have a button which says "upload this image to Wikimedia Commons", and clicking this button gives you an uploaded image which you can immediately reuse.
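As a rough illustration of what such a per-source mapping could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the class name `SourceTemplate`, the metadata keys of the external source, the license template name and the category. Only the parameter names of the `{{Information}}` template (description, date, source, author) are taken from Commons itself.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTemplate:
    """Hypothetical per-source mapping, prepared once by a more technical user."""
    source_name: str
    license_template: str   # license template name chosen per source (assumption)
    field_map: dict         # Information parameter -> metadata key used by the source
    categories: list = field(default_factory=list)

    def render(self, metadata: dict, page_url: str) -> str:
        """Build the wikitext of a Commons file description from source metadata."""
        def value(param: str) -> str:
            # Missing fields become empty strings so a human can fill them in later.
            return metadata.get(self.field_map.get(param, ""), "")

        lines = [
            "{{Information",
            "|description=" + value("description"),
            "|date=" + value("date"),
            "|source=[" + page_url + " " + self.source_name + "]",
            "|author=" + value("author"),
            "}}",
            "{{" + self.license_template + "}}",
        ]
        lines += ["[[Category:" + name + "]]" for name in self.categories]
        return "\n".join(lines)
```

A template for a made-up source could then be created once, for example `SourceTemplate("Example Archive", "Cc-zero", {"description": "title", "date": "created", "author": "maker"}, ["Images from Example Archive"])`, and reused for every image somebody picks from that source.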

In its final form I see a page or entry point which shows which external parties can be uploaded from. For each of these external parties we have a template ready which can transform the party's metadata into a Commons image description. These templates are created by a more technical user, in place of preparing a batch upload. On the back end something will have to upload the images to Commons (this is the part which I don't know exactly how to do). On the front end users can select an external party and either provide URLs of the images they want to upload or, ideally, click a button on the external party's website. The templates ensure correct attribution to the external party, correct licenses (no incorrect own-work claims on PD images) and some basic categorisation. It will likely be necessary to keep track of which images from the external source have been uploaded.
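The flow for a single image could be sketched roughly as follows. This is only an outline under assumptions: `UploadTracker`, `upload_from_url` and the `do_upload` callable are hypothetical names, the tracker keeps its data in memory only, and the actual upload to Commons is left as a placeholder because that part of the back end is still open.

```python
from urllib.parse import urlparse


class UploadTracker:
    """Remembers which source URLs have already been uploaded (in memory only)."""

    def __init__(self):
        self._uploaded = {}  # source URL -> Commons file title

    def already_uploaded(self, url):
        """Return the Commons file title for this URL, or None if not uploaded yet."""
        return self._uploaded.get(url)

    def record(self, url, file_title):
        self._uploaded[url] = file_title


def upload_from_url(url, supported_hosts, tracker, do_upload):
    """Run the one-click flow for a single image URL.

    supported_hosts: set of host names for which a template has been prepared.
    do_upload: placeholder callable(url) -> Commons file title; how the real
               upload is done is not decided here.
    """
    host = urlparse(url).netloc.lower()
    if host not in supported_hosts:
        raise ValueError("No template prepared for source: " + host)

    existing = tracker.already_uploaded(url)
    if existing is not None:
        return existing  # do not upload the same image twice

    file_title = do_upload(url)
    tracker.record(url, file_title)
    return file_title
```

Passing the upload step in as a callable keeps the sketch independent of whichever upload mechanism is eventually chosen.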


Advantages:

  • Only those images which somebody really wants to use are uploaded.
  • Uploading the images is easy for the end user.
  • Easy access to thousands (millions?) of images without crowding Commons.


Disadvantages:

  • Templates per source have to be prepared and updated.
  • Uploaders should ideally do some post-processing and check the image's categories and whether something went wrong. I can see this going wrong when the method is made very easy (for example, somebody randomly clicking the "upload to Commons" button at the external source).