User:Geograph Update Bot

From Wikimedia Commons, the free media repository
Bot
This user account is a pywiki-based bot operated by bjh21 (talk). It is flagged as bot. It is not a sock puppet, but rather an automated or semi-automated account for making repetitive edits that would be extremely tedious to do manually.
Administrators: if this bot is malfunctioning or causing harm, please block it.


Botboxes
This user has uploaded images to Wikimedia Commons.

To contact the operator, use his talk page.

This bot's general purpose is to operate on images from Geograph Britain and Ireland that are already on Commons, improving them by copying information from Geograph as appropriate.

It currently has three permitted tasks:

  • To replace low-resolution images from Geograph Britain and Ireland with higher-resolution ones. Many pictures from Geograph on Commons are only 640px across, but higher resolutions are often available. The bot will only upload a new image if the latest version on Commons is identical to the 640px version on Geograph. (permission)
  • Adding, amending, and removing geocoding templates on files from Geograph projects. (permission)
  • Adding credit lines to files from Geograph projects. (permission)


The bot's code is available on GitHub.

Resolution improvement

This task is currently being run occasionally when it seems like a good idea.

Background

In the beginning, Geograph only stored images up to 640px in each dimension. The Geograph software would downscale (initially badly) any larger image that was uploaded. Geograph now permits larger uploads, including uploading larger versions of existing images. The 640px image is still special, though, in that it's stored separately and is immutable, and it's what's displayed on the Geograph Web site.

Because of this, many images from Geograph had a higher-resolution version available on Geograph than was on Commons. While 40% of the images on Geograph have versions over 640px, fewer than 5% of the images from Geograph on Commons were that large before Geograph Update Bot started upgrading them. This is largely because Commons is biased towards earlier Geograph images, but even allowing for that it seemed that there might be 50,000 images on Commons that could be improved by uploading a higher-resolution version from Geograph in accordance with the guideline on overwriting existing files. The Geograph Update Bot's first task was to upload those higher-resolution versions.

Results

On its first pass through the database, the bot uploaded 8,910 images, with another 420 found on a second pass (largely from added {{geograph}} templates). That's rather fewer than expected, but still a worthwhile effort. The bot will continue to be run occasionally to pick up newly-uploaded images.

Method

The bot extracts the Geograph ID for an image from its {{Geograph}} template. It checks the image dimensions against a dump of the gridimage_size table, and copies across the Geograph full-resolution image if all of these criteria are fulfilled:

  • No version of the image has been uploaded by User:Geograph Update Bot already (since )
  • There is precisely one {{Geograph}} template in the image description.
  • The {{Geograph}} template contains a valid Geograph ID.
  • That image has a high-resolution version on Geograph.
  • The aspect ratios of the old and new images agree within 1% or are inverses within 1% (since ).
  • The current image on Commons has the same dimensions as the 640px image on Geograph.
  • The current image on Commons has the same SHA-1 as the 640px image on Geograph.
  • The attribution specified by the {{Geograph}} template is the same as the attribution specified on Geograph.
  • If the image on Commons has a {{Credit line}}, it specifies the same title as the image has on Geograph.
  • No warnings (apart from overwriting an existing file) are generated by the upload.
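A few of these checks can be sketched in Python. This is an illustration under stated assumptions rather than the bot's actual code: the function names are hypothetical, and the regular expression assumes that the {{Geograph}} template carries the Geograph ID as its first positional parameter.

```python
import hashlib
import re

# Assumed layout: {{Geograph|<numeric id>|<author>}}. The real template
# may accept other layouts that this pattern does not cover.
TEMPLATE_RE = re.compile(r"\{\{\s*[Gg]eograph\s*(?:\|([^{}]*))?\}\}")


def geograph_id(wikitext):
    """Return the ID if there is exactly one {{Geograph}} with a numeric ID."""
    matches = TEMPLATE_RE.findall(wikitext)
    if len(matches) != 1:
        return None
    first_param = matches[0].split("|")[0].strip()
    if first_param.isascii() and first_param.isdigit():
        return int(first_param)
    return None


def aspect_ratios_agree(old_size, new_size, tolerance=0.01):
    """True if the (width, height) ratios match, or are inverses, within 1%."""
    old_ratio = old_size[0] / old_size[1]
    new_ratio = new_size[0] / new_size[1]

    def close(a, b):
        return abs(a - b) <= tolerance * b

    return close(new_ratio, old_ratio) or close(new_ratio, 1 / old_ratio)


def same_sha1(commons_bytes, geograph_bytes):
    """True if the two files have the same SHA-1 digest."""
    commons_digest = hashlib.sha1(commons_bytes).hexdigest()
    geograph_digest = hashlib.sha1(geograph_bytes).hexdigest()
    return commons_digest == geograph_digest
```

The inverse-ratio branch allows for a new image that has been rotated by 90 degrees relative to the old one.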

It then compares the new and old 120px thumbnails. If they differ by more than a specified amount, it adds the file to Category:Dubious uploads by Geograph Update Bot for human attention.
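The page does not say how the thumbnails are compared or what the threshold is. As a minimal sketch, assuming both thumbnails have been decoded to equal-length sequences of greyscale values (0 to 255), a mean absolute difference could be used. Both the metric and the `threshold` value here are assumptions for illustration only.

```python
def mean_abs_difference(old_pixels, new_pixels):
    """Mean absolute per-pixel difference between two greyscale thumbnails."""
    if len(old_pixels) != len(new_pixels) or not old_pixels:
        raise ValueError("thumbnails must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(old_pixels, new_pixels))
    return total / len(old_pixels)


def is_dubious(old_pixels, new_pixels, threshold=20.0):
    """True if the upload should be flagged for human attention.

    The threshold is a made-up value, not the one the bot uses.
    """
    return mean_abs_difference(old_pixels, new_pixels) > threshold
```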

Location correction

This task is in abeyance while Bjh21 writes more code. Most uploads by GeographBot have had their location corrected.

Background

All photographs on Geograph are geolocated to some extent. Every photo has a subject location recorded, and 95% have a camera location as well. Locations in Great Britain and the Isle of Man are recorded in the British National Grid, while locations in Ireland use the Irish Grid. There are also WGS84 geodetic co-ordinates in the Geograph database, and these (roughly) correspond with the subject location, but these are not actually used by Geograph.

When GeographBot imported 1.7 M images from Geograph, it generated {{location}} templates from the WGS84 columns of the Geograph database. Since {{location}} is meant to contain the camera location, this meant that a lot of locations were incorrect.

Method

The bot operates on files uploaded by GeographBot. It constructs a new {{Object location}} template based on the subject location from the Geograph database. If a viewpoint location is recorded (or implied), it also constructs a {{Location}} template, but only if the new camera location has a precision better than 1km.
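The precision of a grid reference follows from its number of figures: a 4-figure reference identifies a 1 km square, a 6-figure reference 100 m, and so on. The sketch below derives a precision in metres from a reference string. The function names are hypothetical, and treating `use6fig` as a cap at 100 m is an assumption based on the sample templates on this page.

```python
import re


def grid_ref_precision_m(grid_ref, use6fig=False):
    """Precision in metres implied by a British or Irish grid reference."""
    compact = grid_ref.replace(" ", "").upper()
    # One letter for the Irish Grid, two for the British National Grid.
    match = re.fullmatch(r"([A-Z]{1,2})([0-9]+)", compact)
    if match is None:
        raise ValueError(f"not a grid reference: {grid_ref!r}")
    figures = len(match.group(2))
    if figures % 2 != 0 or figures > 10:
        raise ValueError(f"unsupported number of figures: {grid_ref!r}")
    precision = 10 ** (5 - figures // 2)
    if use6fig:
        # Assumed meaning: never claim better than 6-figure precision.
        precision = max(precision, 100)
    return precision


def camera_location_usable(grid_ref, use6fig=False):
    """True if the camera location is better than 1 km precision."""
    return grid_ref_precision_m(grid_ref, use6fig) < 1000
```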

The existing {{Location dec}} template is removed, and replaced by the new {{Location}}, if any, if all of the following conditions are met:

  • The file was originally uploaded by GeographBot
  • The current {{Location dec}} template is identical to that in the first revision

The new {{Object location}} is added if the file doesn't already have an {{Object location}} and either the new object location has a precision better than 1km or there is no camera location.
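The choice of which templates to emit can be sketched as a small decision function. This is an illustration, not the bot's code: the names are hypothetical, and it reads "no camera location" as "no {{Location}} template is being emitted", which is an interpretation that appears consistent with the sample templates on this page.

```python
def templates_to_add(camera_prec_m, object_prec_m, has_object_location):
    """Return (add_location, add_object_location).

    camera_prec_m is None when no viewpoint is recorded or implied.
    Precisions are in metres; smaller numbers are more precise.
    """
    add_location = camera_prec_m is not None and camera_prec_m < 1000
    add_object = (not has_object_location) and (
        object_prec_m < 1000 or not add_location
    )
    return add_location, add_object
```

For example, a 10-figure camera reference with a 4-figure subject reference gives `(True, False)`, while a file with only a 4-figure subject reference gives `(False, True)`.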

Before 22:30Z, the bot didn't treat 1km precision specially.

The update is flagged as a minor edit if it is only replacing {{Location dec}} with {{Location}} and the locations differ by less than the grid-reference precision.

Sample templates

File:Woodchester Mansion - geograph.org.uk - 4.jpg
8-figure subject and camera references; use6fig set
{{Location|51.71051|-2.2766|source:geograph-osgb36(SO80980134)_heading:292|prec=100}}
{{Object location|51.71069|-2.2773|source:geograph-osgb36(SO80930136)_heading:292|prec=100}}
File:Lake at Woodchester Park - geograph.org.uk - 5.jpg
4-figure subject reference only, but moderated as Geograph
{{Location|51.712|-2.25|source:geograph-osgb36(SO8201)|prec=1000}}
{{Object location|51.712|-2.25|source:geograph-osgb36(SO8201)|prec=1000}}
Would have had the camera location removed after 22:30Z.
File:Raised shoreline and creep terracettes - geograph.org.uk - 1803781.jpg
4-figure subject and camera references
{{Location|55.174|-4.93|source:geograph-osgb36(NX1390)_heading:225|prec=1000}}
{{Object location|55.174|-4.93|source:geograph-osgb36(NX1390)_heading:225|prec=1000}}
Would have had the camera location removed after 22:30Z.
File:Ogham stones near Baile Mhic Íre (Ballymakeery) - geograph.org.uk - 2913.jpg
6-figure subject reference only; Ireland
{{Location|51.935|-9.16|source:geograph-irishgrid(W2076)|prec=1000}}
{{Object location|51.9360|-9.152|source:geograph-irishgrid(W208765)|prec=100}}
Would have had the camera location removed after 22:30Z.
File:Captain's Pool - geograph.org.uk - 715.jpg
10-figure camera reference; 4-figure subject reference; no view direction
{{Location|52.372194|-2.22568|source:geograph-osgb36(SO8473274929)|prec=1}}
{{Object location|52.368|-2.23|source:geograph-osgb36(SO8474)|prec=1000}}
Would not have had object location added after 22:30Z.
File:Fossilised tree stumps near Lulworth Cove - geograph.org.uk - 15.jpg
4-figure subject reference only, moderated as supplemental
{{Object location|50.615|-2.23|source:geograph-osgb36(SY8379)|prec=1000}}
File:Clifton Road Bridge, Clifton - Brighouse - geograph.org.uk - 190630.jpg
Geograph has no camera location recorded.
{{Object location|53.7028|-1.775|source:geograph-osgb36(SE149229)_heading:180|prec=100}}
File:The Stotts Arms updated, Wakefield Road, Brighouse - geograph.org.uk - 924556.jpg
Geograph displays the camera as SE 149 229. Internally, it was coded as the 6 figure grid ref SE 149 229.
{{Location|53.7028|-1.775|source:geograph-osgb36(SE149229)_heading:315|prec=100}}
{{Object location|53.7028|-1.775|source:geograph-osgb36(SE149229)_heading:315|prec=100}}
File:J W Lister Ltd Wireworkers - Clifton Road - geograph.org.uk - 802290.jpg
Geograph displays the camera as SE 149 229. Internally, it was coded as the 8 figure grid ref SE14942298, but with a command to drop to a 6 figure location on display.
{{Location|53.70309|-1.7751|source:geograph-osgb36(SE14942298)_heading:292|prec=100}}
{{Object location|53.70318|-1.7756|source:geograph-osgb36(SE14912299)_heading:292|prec=100}}
http://www.geograph.org.uk/photo/3419425
has not been uploaded to Commons, but Geograph displays the camera as SE 1493 2291. It was coded as an 8-figure grid ref.
{{Location|53.70246|-1.7753|source:geograph-osgb36(SE14932291)_heading:0|prec=10}}
{{Object location|53.70255|-1.7753|source:geograph-osgb36(SE14932292)_heading:0|prec=10}}
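The layout of the samples above can be reproduced with a short formatting helper. This is a sketch only: the function name and parameters are hypothetical, and the latitude and longitude are passed as already-rounded strings because the rule for choosing the number of decimal places is not described here.

```python
def build_location_template(name, lat, lon, grid, grid_ref, prec, heading=None):
    """Format a geocoding template in the style of the samples.

    name     -- "Location" or "Object location"
    grid     -- "osgb36" or "irishgrid"
    heading  -- view direction in degrees, or None if not recorded
    """
    source = f"source:geograph-{grid}({grid_ref})"
    if heading is not None:
        source += f"_heading:{heading}"
    return "{{" + f"{name}|{lat}|{lon}|{source}|prec={prec}" + "}}"


# Usage: reproduces the first sample on this page.
print(build_location_template(
    "Location", "51.71051", "-2.2766", "osgb36", "SO80980134", 100, heading=292))
```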

Adding locations

This task has added locations to almost all of the few thousand Geograph images that lacked them.

Background

Some images from Geograph on Commons are lacking any co-ordinates at all. This task is simply to add them.

Method

The bot generates new {{Location}} and/or {{Object location}} templates as described above. It doesn't add {{Location}} templates with 1km precision.

Tagging locations

To simplify future work, the bot can add source parameters to {{Location dec}} templates that lack them where those locations came from Geograph.

Background

Geocoding templates like {{Location}} and {{Object location}} can be marked with a source parameter to indicate where the co-ordinates came from. Co-ordinates generated by Geograph since 2018, and those added by Geograph Update Bot, are already tagged with source:geograph, but there are many templates that are derived from Geograph co-ordinates but not tagged. If they were tagged, this would help the bot to recognise them in future as being eligible to be updated from Geograph.

Method

The bot currently only edits a {{Location dec}} template if it is identical to that on the first revision of the file description page, and that first revision was generated by geograph2commons. For the current OAuth-based geograph2commons, that's identified from its edit summary. The older version is assumed to be responsible for all Geograph uploads by File Upload Bot (Magnus Manske).
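The eligibility test can be sketched as follows. The function names are hypothetical, and the exact text that identifies geograph2commons in an edit summary is not given here, so `SUMMARY_MARKER` is an assumed placeholder.

```python
LEGACY_UPLOADER = "File Upload Bot (Magnus Manske)"
SUMMARY_MARKER = "geograph2commons"  # assumed marker, not confirmed


def from_geograph2commons(uploader, edit_summary):
    """Guess whether the first revision came from geograph2commons.

    The caller is assumed to have already established that the file
    is a Geograph image.
    """
    if uploader == LEGACY_UPLOADER:
        return True
    return SUMMARY_MARKER in edit_summary.lower()


def may_tag(current_template, first_template, uploader, edit_summary):
    """True if the {{Location dec}} template is eligible for tagging."""
    unchanged = current_template == first_template
    return unchanged and from_geograph2commons(uploader, edit_summary)
```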

Credit lines

Most files from Geograph Britain and Ireland now have credit lines.

Background

Geograph pictures have titles, and CC BY-SA 2.0 requires that these titles be conveyed along with them. Most Geograph pictures on Commons have their titles in their descriptions or filenames, but these sometimes get edited and there's no indication that they need to be preserved when the pictures are re-used. On Commons, the {{Credit line}} template is used to store information like this that's required to be kept with a work.

Method

The bot skips any file that already has a {{Credit line}} template on the assumption that it's correct. It only works on images with a {{Geograph}} template, since the licensing arrangements for {{also geograph}} images may be different. It only operates on images where the author name in the Geograph database matches that in the {{Geograph}} template, since otherwise there would be a danger of ending up with a title/author combination that had never actually been licensed. If all the preconditions are met then the bot adds a {{Credit line}} template to the other fields parameter of {{Information}}, adding the parameter if necessary.

The bot only makes these changes where it believes that the title on Geograph has not changed since the file was uploaded to Commons. It can establish this in two cases: first, if the file was uploaded by GeographBot and the description provided by GeographBot is consistent with the current title; second, if the last-modified timestamp on Geograph is earlier than the first upload to Commons.
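Taken together, the preconditions for adding a credit line amount to a pair of boolean checks. The sketch below is illustrative only: the function and parameter names are hypothetical, and the inputs are assumed to have been gathered elsewhere from Commons and the Geograph database.

```python
from datetime import datetime, timezone


def title_unchanged(geograph_modified, first_commons_upload,
                    uploaded_by_geographbot=False,
                    description_matches_title=False):
    """True if the Geograph title is believed unchanged since upload."""
    if uploaded_by_geographbot and description_matches_title:
        return True
    return geograph_modified < first_commons_upload


def should_add_credit_line(has_credit_line, has_geograph_template,
                           authors_match, title_is_unchanged):
    """True if every precondition for adding {{Credit line}} holds."""
    return (not has_credit_line and has_geograph_template
            and authors_match and title_is_unchanged)


# Usage: Geograph record last modified before the first Commons upload.
modified = datetime(2006, 5, 1, tzinfo=timezone.utc)
uploaded = datetime(2011, 2, 3, tzinfo=timezone.utc)
print(should_add_credit_line(False, True, True,
                             title_unchanged(modified, uploaded)))
```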