Commons:Batch uploading/Airliners

From Wikimedia Commons, the free media repository
Warning: Due to Office action by unelected Wikimedia Foundation employees which cannot be appealed, nor the evidence examined by those found guilty in secret of something unspecified using unverifiable information, this highly successful content generating project which exceeded its targets, has now been abandoned. If as an unpaid volunteer, you wish to start a similar project, please create a new batch upload project page. Thank you (talk) 14:24, 11 March 2015 (UTC)
Aim: To upload 100,000 identified aviation photographs by amateur and professional photographers.
Exemplar photo: Italian military MB-339's flying in formation at Brindisi Papola Casale, Italy, by Aldo Bidini

Description


These are amateur collections of photos posted to aviation websites. This fits the aim of Commons to preserve a comprehensive collection of photographs of every in-service aircraft type for the purposes of educational re-use. This is a large project planned to span many months, mainly throughout 2013, but it is likely to extend into 2014.

  • Which license tag(s) should be applied?

Not all of the photographs are available under a free release; individual OTRS tickets are needed to release them. Those already available were released under {{GFDL}}. Credit templates with the licence are created individually once the OTRS ticket is approved, for example {{AlanBrown}}, which is now used on over 1,000 photographs.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

The standard {{Infobox aircraft image}} is applied to these batch uploads and populated as much as possible from the forum description/metadata. This includes location, aircraft id, photo description, photo date, photographer, gallery page, image source page and construction number.
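As a rough illustration of how the scraped fields might be turned into a template call, here is a minimal sketch. The parameter names `aircraftid` and `imageloc` appear elsewhere on this page; `cn` and the helper `render_infobox` are hypothetical, as is the registration used in the example, and the real template may expect different names.

```python
def render_infobox(fields):
    """Build wikitext for {{Infobox aircraft image}} from a dict of fields.

    Empty values are skipped so that only the metadata actually found on the
    source page is written. Parameter names are assumptions, not a
    confirmed list of what the template accepts.
    """
    lines = ["{{Infobox aircraft image"]
    for name, value in fields.items():
        if value:  # leave out anything the forum page did not provide
            lines.append(f" |{name} = {value}")
    lines.append("}}")
    return "\n".join(lines)


example = render_infobox({
    "aircraftid": "G-ABCD",                  # made-up registration
    "imageloc": "London Heathrow Airport",
    "cn": "",                                # hypothetical parameter, empty here
})
print(example)
```

Values containing `|` or `}}` would need escaping before being written this way; the sketch does not attempt that.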

Categories exist for most aircraft ids and are added automatically. Where found to be missing, these are picked up and created by volunteers manually.

  • Other conventions

Multilanguage: To avoid complex issues with filenaming and uploading, the unidecode.py module is used on the filename to find the best meaningful transliteration of accented or other characters into the standard ASCII set. For example, "Москва" would be transliterated as "Moskva". This only applies to the filename; text in the description fields will be identical to the source. It also only applies when the source forum is predominantly in English. For sites such as http://russianplanes.net/ where a non-English language is consistently used, filenames will be based on the source language to avoid the high probability of unacceptable translation errors.
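The filename transliteration step could look something like the sketch below. It uses the third-party `unidecode` module when it is installed and otherwise falls back to a much cruder standard-library approach; the fallback and the function name `ascii_filename` are assumptions for illustration, not the project's actual code.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party module named in the text
except ImportError:                  # keep the sketch runnable without it
    unidecode = None


def ascii_filename(text):
    """Return an ASCII-only version of text suitable for a filename."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback (assumption): split accented letters into base letter plus
    # combining mark, then keep only the ASCII base letters. Unlike
    # unidecode, this simply drops scripts such as Cyrillic.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ord(ch) < 128
    )


print(ascii_filename("Árpád Gordos"))
print(ascii_filename("Javier Bravo Muñoz"))
```

Only the filename would pass through this function; description text stays identical to the source.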

ID: The filename ends with the unique database identity of the photo, being a string of 7 digits. For the different sources the naming convention is:

Source website | Filename format
Airliners.net | File:<aircraft>, <airline> AN<id>.jpg
JetPhotos.net | File:<aircraft>, <airline> JP<id>.jpg
Russianplanes.net | File:<aircraft> <registration>, <location> RP<id>.jpg
Planepictures.net | File:<aircraft> <airline> <registration>, <location> PP<id>.jpg

Note: if <aircraft> looks unrealistically short, the location is used instead. Sometimes this is because the photo shows an airport rather than an aircraft. If <airline> is unrealistically short, it is left out.
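The naming convention can be sketched as a small function. The two-letter prefixes and the field order come from the table above; the cut-off `MIN_LEN` for "unrealistically short", the function name and the sample values are assumptions made for illustration.

```python
SOURCE_PREFIX = {
    "airliners.net": "AN",
    "jetphotos.net": "JP",
    "russianplanes.net": "RP",
    "planepictures.net": "PP",
}
MIN_LEN = 3  # hypothetical threshold for "unrealistically short"


def build_filename(source, photo_id, aircraft="", airline="",
                   registration="", location=""):
    """Compose a Commons filename following the per-site convention."""
    prefix = SOURCE_PREFIX[source]
    use_location = len(aircraft) < MIN_LEN       # e.g. a photo of an airport
    subject = location if use_location else aircraft
    airline = airline if len(airline) >= MIN_LEN else ""   # optional part
    tail = "" if use_location else location      # do not repeat the location

    if source in ("airliners.net", "jetphotos.net"):
        head = ", ".join(p for p in (subject, airline) if p)
    elif source == "russianplanes.net":
        head = " ".join(p for p in (subject, registration) if p)
        head = ", ".join(p for p in (head, tail) if p)
    else:  # planepictures.net
        head = " ".join(p for p in (subject, airline, registration) if p)
        head = ", ".join(p for p in (head, tail) if p)
    return f"File:{head} {prefix}{photo_id}.jpg"


print(build_filename("airliners.net", "1234567",
                     aircraft="Boeing 747-400", airline="British Airways"))
```

The photo id is kept as a string so that any leading zeros in the database identity survive.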

EXIF data: After the second tranche of uploads had started, it was noticed that the Python Image Library routine that semi-automatically detects and crops the credit bar from images was dropping the EXIF data in the process. A separate routine uses exiv2 to copy EXIF data from the original file to the cropped version before upload, and the same routine is being used to improve previously cropped files. Where the EXIF data is less than 3,000 bytes in size, it is skipped as trivial (when under 4,000 bytes, metadata normally appears to be that introduced by image processing rather than by the original camera); the skipped files could be fixed retrospectively if a rationale for doing so is put forward later. Some of the variance in cropped JPEG file size may be down to embedded thumbnails in the original being lost in the cropped file. Embedded thumbnails are an optional feature of JPEG files, and their loss makes no practical difference to files hosted on Commons.
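The size check could be approximated without any imaging library by reading the JPEG segment headers directly. The sketch below measures the Exif APP1 payload and applies the 3,000-byte cut-off from the text; how the project actually measured EXIF size is not stated, so the function names and the measuring method are assumptions. The copy itself (done with exiv2 in the project) is not shown.

```python
import struct

EXIF_MIN_BYTES = 3000  # cut-off stated in the text


def exif_payload_size(jpeg_bytes):
    """Return the size of the first Exif APP1 payload in bytes, or 0."""
    if jpeg_bytes[:2] != b"\xff\xd8":            # SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:                       # start of scan: image data
            break
        # The two-byte length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return len(payload)
        pos += 2 + length
    return 0


def worth_copying(jpeg_bytes):
    """True when the original's EXIF is large enough to be worth restoring."""
    return exif_payload_size(jpeg_bytes) >= EXIF_MIN_BYTES


def fake_jpeg(exif_len):
    """Build a minimal synthetic JPEG header for demonstration only."""
    payload = b"Exif\x00\x00" + b"\x00" * (exif_len - 6)
    app1 = b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
    return b"\xff\xd8" + app1 + b"\xff\xda\x00\x02"


print(worth_copying(fake_jpeg(106)), worth_copying(fake_jpeg(3500)))
```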

Workflow:

  1. Negotiation with photographer on aircraft forum, resulting in an email to OTRS with a release statement and a credit template on Commons referencing the ticket number. Add to approved list of authors sorted by forum/website.
  2. Set up Python code for the specific forum to generate text for the image pages, as layouts will vary. This relies on BeautifulSoup.py to turn the HTML into Python-friendly arrays.
  3. Upload sets by photographers in the backlog (organized by an off-wiki Google spreadsheet). If a category for the aircraft model exists, it is added at this point, along with any category for the photographer.
  4. Crop any credit bars and remove watermark templates. This runs as a separate process and need not be run at the same time as the batch upload. One benefit of keeping it separate is that the original watermarked version is retained on Commons, should we ever need to redo the crop using better tools or test for duplicates. The cropping needed varies by forum, and the routine tests for oddities such as photos with a lot of black pixels or bleed-through. The credit bars are inconsistent: their style changes over time and differs between forums. Cropping relies on an installation of the Python Image Library and tests for black pixels to double-check that a credit bar is present and what height it appears to be. The cropped image is re-saved as MIME type 'jpeg' with a quality of "98", which appears to create a similar-sized image (a quality of "99" appears to create an image up to 50% larger than the original). There is no known way of removing overlay watermarks. Images with this issue should be avoided, as negotiation with a forum member might release the originals without the overlays.
  5. Final check of categories is done manually and should remove the backlog category or any category check template.
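The black-pixel test in step 4 might be sketched as follows. To stay self-contained, the sketch works on a plain grid of greyscale values rather than on a Python Image Library object; all thresholds and function names are assumptions, and the real routine varied by forum.

```python
def credit_bar_height(rows, dark_threshold=16, min_dark_fraction=0.9,
                      max_bar_fraction=0.2):
    """Estimate the height in pixels of a black credit bar at the bottom.

    rows is a list of pixel rows, top to bottom, each a list of greyscale
    values from 0 (black) to 255 (white).
    """
    bar = 0
    for row in reversed(rows):                   # scan upwards from the bottom
        dark = sum(1 for value in row if value <= dark_threshold)
        if dark < min_dark_fraction * len(row):  # mostly picture, not bar
            break
        bar += 1
    # A very tall "bar" more likely means a dark photo, such as a night
    # shot, so report no bar rather than crop away the picture.
    if bar > max_bar_fraction * len(rows):
        return 0
    return bar


def crop_credit_bar(rows):
    """Return the rows with any detected credit bar removed."""
    bar = credit_bar_height(rows)
    return rows[:len(rows) - bar] if bar else rows


width = 20
photo = [[200] * width for _ in range(10)]       # bright picture area
bar_row = [0] * 19 + [255]                       # black bar with a bit of text
print(len(crop_credit_bar(photo + [bar_row, bar_row])))
```

Because the crop is a separate step, the uncropped original stays available if the thresholds later prove wrong for a particular forum.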
Categorization conventions
Categorization principles
  1. Where the aircraftid/registration field matches an existing category, this is transcluded via the template.
  2. Where there is a match to the airports mapping, a default category of <year> at <airport> is to be added.
  3. Where there is no year given for the image, a default category of Aircraft at <airport> is to be used.
Categorization by airport at upload

A mapping of name variations and ICAO codes is used on upload to check the imageloc field for a search text match (or match to the airport name) and then chooses a category based on the airport name. A reasonable effort has been made to find existing Airport categories, but a lack of standards (including the majority failing to include any standard airport codes, and there being variations as to whether English or other languages are used) makes this uncertain. Ideally {{Airport codes}} should be applied to all airport categories.

For example, if a photo to upload is described as having a location that includes the precise word " LHR" or the string "London Heathrow Airport" (including the space and capitalization), then this returns "London Heathrow Airport". The routine then adds, and if necessary creates, a year-airport category (such as Category:2012 at London Heathrow Airport). If the year is not given, an aircraft-at-airport category is added instead.

The year and aircraft-at-airport category choices do not apply to "non-airports" such as museums listed at the end of the table.
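The lookup described above can be sketched as a simple substring search. The table entries here are illustrative (the museum entry is invented), the real mapping lives at /Airportlist, and returning the plain category name for non-airports is an assumption, since the page only says what does not apply to them.

```python
AIRPORT_TABLE = [
    # (search strings, category name, is_airport) -- illustrative entries only
    ((" LHR", "London Heathrow Airport"), "London Heathrow Airport", True),
    (("Example Aviation Museum",), "Example Aviation Museum", False),
]


def match_location(imageloc):
    """Return (category name, is_airport) for the first matching entry."""
    padded = " " + imageloc  # lets a code at the very start match " LHR"
    for needles, name, is_airport in AIRPORT_TABLE:
        if any(needle in padded for needle in needles):  # case-sensitive
            return name, is_airport
    return None


def location_category(imageloc, year=None):
    """Choose the category to add for a photo's location field."""
    hit = match_location(imageloc)
    if hit is None:
        return None
    name, is_airport = hit
    if not is_airport:
        return f"Category:{name}"          # assumption: plain category
    if year:
        return f"Category:{year} at {name}"
    return f"Category:Aircraft at {name}"


print(location_category("London LHR, United Kingdom", "2012"))
```

This is plain substring matching, so a search string such as " LHR" would also match a longer token beginning with those letters; a stricter word-boundary test may be preferable.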

The category mapping table can be found on its own subpage at /Airporttable and the raw text that generates it (and that Python scripts rely on) can be found at /Airportlist.

There is an issue with encoding Unicode strings in Airport categories: the category check functions cannot currently handle them. This might be theoretically fixable, but there is a law of diminishing returns on programmer time.

Opinions

Issue of embossed watermarks
Example of a photograph with an embossed watermark saying "AIRLINERS.NET" that will require a later correction.

A small percentage of images uploaded will unfortunately have "embossed" watermarks. These appear to have been placed pseudo-randomly across the image and are in addition to the "credit bar", which can be easily removed. The practice seems to have stopped in recent years at both airliners.net and jetphotos.net. Though the expectation is that fewer than 5% of images will be watermarked in this way, this still means that a couple of thousand may need correction by the end of the batch upload project. There may be ways of automatically un-watermarking images with standard overlays like this, though none has yet been researched properly. Unfortunately there is currently no known way of automatically detecting these images on upload (for example, there is no change in the EXIF information).

The general guidelines for Commons are that watermarked images assessed as in scope, and not overly promotional, are okay to be uploaded to Commons.

Action: If you see any photographs with these watermarks, please add them to Category:Images from airliners.net with watermarks or Category:Images from jetphotos.net with watermarks, as appropriate, so that the project team can apply a systematic 'best' way of correcting them.

Dashboard


Note - this table is not maintained.

Task | Assigned to | Progress | Where it happens
Ad-hoc verification of OTRS tickets | User:Russavia | Est. 80% | Referencing OTRS emails user credit templates are created.
Upload image sets against OTRS ticket for each photographer, coordinated via a Google spreadsheet. | User:Fæ | 82.5% completed (estimate) | Photos appear in Airliners.net photos (check needed)
Crop credit bar from photos. | User:Fæ | 0 outstanding (ideally zero) | Photos leave Airliners.net photos (credit bar)
Complete categorization and remove from check category (17,504 in backlog) | Multiple, see below. | 40.2% completed (estimate) | Photos leave Airliners.net photos (check needed)
Create Airport/museum categories where missing, and correctly template | Help needed | 98 red-links left (ideally zero) | /Airporttable
Create Russian parser for russianplanes.net and upload for photographers with OTRS permission | Fae | ✓ Done | Russianplanes.net photos (check needed)
     2014-03-07: Out of 138751 files in Aviation photographs by photographer, 82507 (59%) are from this project

   

     As of 2014-03-07 the following users were spotted helping with this project:

Priority lounge


See Commons:Batch uploading/Airliners/Priority for a list of 823 images in Airliners.net photos (check needed) that are used in at least one Wikipedia article in any language. Ideally this table should be empty as any photo in use has already been evaluated by someone, so please consider these a priority for manual checks and removal from the check needed category.

Project members


Requests


I'd love for these photos from airliners.net to be uploaded to Commons so that I can use them on the Dubai International Airport wiki page:

MoHasanie (talk) 06:30, 30 March 2014 (UTC)


Hi, sorry about the delay in getting back to you. The photographers have their streams as all rights reserved on airliners.net, so we need a release of their photostream on record in OTRS to be able to upload the photos. Currently, we have the following photographers from that forum with a release recorded:
  1. Mike Freer - Touchdown-aviation, {{MikeFreer}}
  2. Pedro Aragão, {{PedroAragão}}
  3. Michel Gilliand, {{MichelGilliand}}
  4. Felix Goetting, {{FelixGoetting}}
  5. John Davies, {{JohnDavies}}
  6. Javier Bravo Muñoz, {{JavierBravoMuñoz}}
  7. Alan D R Brown, {{AlanBrown}}
  8. AlainDurand, {{AlainDurand}}
  9. Alex Beltyukov - RuSpotters Team, {{AlexBeltyukov}}
  10. Renato Spilimbergo Carvalho, {{RSC}}
  11. parfaits, {{PavelAdzhigildaev}}
  12. Igor Dvurekov, {{IgorDvurekov}}
  13. Steve Fitzgerald, {{SteveFitzgerald}}
  14. Toshi Aoki, {{ToshiroAoki}}
  15. Oleg V. Belyakov, {{OlegBelyakov}}
  16. Igor Bubin, {{IgorBubin}}
  17. Dmitry Avdeev, {{DmitryAvdeev}}
  18. Chris Finney, {{ChrisFinney}}
  19. Shimin Gu, {{ShiminGu}}
  20. Árpád Gordos, {{ÁrpádGordos}}
  21. André Du-pont, {{AADPR}}
  22. Guido Allieri, {{GuidoAllieri}}
  23. Anton Bannikov, {{AntonBannikov}}
  24. Manfred Groihs, {{ManfredGroihs}}
  25. Peter Bakema, {{PeterBakema}}
  26. Sunil Gupta, {{SunilGupta}}
  27. Leonid Faerberg - Russian AviaPhoto Team, {{LeonidFaerberg}}
  28. Christian Hanuise, {{ChristianHanuise}}
  29. Ward Callens, {{WardCallens}}
  30. Robert Frola, {{RobertFrola}}
  31. Darian Froese, {{DarianFroese}}
  32. Aktug Ates, {{AktugAtes}}
  33. Ian Creek, {{IanCreek}}
  34. Peter Duijnmayer, {{PeterDuijnmayer}}
  35. Danial Haghgoo, {{DanialHaghgoo}}
  36. Paul Davey, {{PaulDavey}}
  37. Martijn Geerlings, {{MartijnGeerlings}}
  38. Eugene Butler, {{EugeneButler}}
  39. Andrew Babin, {{AndreyBabin}}
  40. Dean Constantinidis, {{DeanConstantinidis}}
  41. Mikhail Glazyrin, {{MikhailGlazyrin}}
  42. Vsevolod Aladyshkin - St.Petersburg Spotters, {{VsevolodAladyshkin}}
  43. Grahame Hutchison, {{GrahameHutchison}}
  44. Ercan Karakas, {{ErcanKarakas}}
  45. Alan Lebeda, {{AlanLebeda}}
  46. Eduard Marmet, {{EduardMarmet}}
  47. Roland Nussbaumer, {{RolandNussbaumer}}
  48. Les Rickman, {{LesRickman}}
  49. Tim Rees, {{TimRees}}
  50. Raimund Stehmann, {{RaimundStehmann}}
  51. Luc Willems, {{LucWillems}}
  52. Jeroen Westram, {{JeroenWestram}}
  53. Luc Verkuringen, {{LucVerkuringen}}
  54. Elisabeth Klimesch, {{ElisabethKlimesch}}
  55. Anthony Noble, {{AnthonyNoble}}
  56. Kral Michal, {{MichalKral}}
  57. JetPix, {{TorstenMaiwald}}
  58. Perry Hoppe, {{PerryHoppe}}
  59. Andreas Hoppe, {{AndreasHoppe}}
  60. Andy Kennaugh, {{AndyKennaugh}}
  61. Gleb Osokin - Russian AviaPhoto Team, {{GlebOsokin}}
  62. Ted Quackenbush, {{TedQuackenbush}}
  63. Alain Rioux, {{AlainRioux}}
  64. Andre Wadman, {{AndréWadman}}
  65. Jerome Krier, {{JérômeKrier}}
If you can find photographers that fit your needs in that list, then I can run an update from their stream. Otherwise you might want to drop them a note (or ask Russavia to add them to his list) and ask if they would like to contribute to our project by releasing their photos on a CC-BY-SA license so that others can use them for the public benefit. -- (talk) 16:09, 10 April 2014 (UTC)
@MoHasanie: some of the photos you want uploaded are OK. The only ones which aren't are those by Sam Chui -- he will not release under a free licence. If you want to contact other photographers, perhaps you can co-ordinate this with me, so that I can contact them and request permission on behalf of Commons. @: I have now categorised photographers by website. It might be worthwhile getting the 1st class membership for Airliners.net now (it's $55 a year) and doing a complete run of those photographers on airliners.net. Is that all good? Also, you will need to do an update of a lot of streams, such as {{EduardMarmet}}, because only a small fraction of images have been uploaded from those streams. russavia (talk) 08:13, 11 April 2014 (UTC)
Hi Russavia, so all the photos except those by Sam Chui are alright? I can try contacting Sam Chui and asking him to change the license. MoHasanie (talk) 06:40, 16 April 2014 (UTC)