Commons:OpenRefine/Training 2023-24/Software development

From Wikimedia Commons, the free media repository

Software development

The software development component of the WMF grant towards improving the Wikimedia Commons integration of OpenRefine was provided by Wikimedia Sverige (WMSE) and focused on the following tracks.

Enabling uploads of larger files

The original implementation of Wikimedia Commons support in 2022 enabled the upload of media files, together with their structured data (SDC), using the normal file upload mechanism. It did not, however, cover larger files (> 100 MB), as these use a separate upload mechanism in which the files are uploaded in chunks. (These statements refer to uploading local files; for upload by URL the mechanism is different.)

This was quickly identified as one of the key missing features [1], of particular significance to GLAM collaborations, which often combine large high-resolution media files with very rich and structured data. For this reason, the WMSE team identified it in advance as the first improvement to implement.

During development, special emphasis was placed on minimising the risk that an interrupted upload leaves a file on Wikimedia Commons without its associated SDC. The improvement was released in August 2024 and is slated to be included in the 3.9 release of OpenRefine (it is part of the bundled Wikibase extension). It has been available through snapshot releases and has already been tested by WMSE in a GLAM collaboration.
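As an illustration of the chunked mechanism described above, the following sketch plans the chunks for a large file and builds the request parameters for each one. It assumes the `stash`, `filesize`, `offset` and `filekey` parameters of the MediaWiki `action=upload` API. The 5 MiB chunk size and the helper names `plan_chunks` and `chunk_request_params` are hypothetical, and this is not the code used in OpenRefine itself.

```python
from typing import Dict, Iterator, Optional, Tuple

# Assumed chunk size (5 MiB); not taken from the OpenRefine implementation.
CHUNK_SIZE = 5 * 1024 * 1024


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that together cover a file of file_size bytes."""
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


def chunk_request_params(
    filename: str,
    file_size: int,
    offset: int,
    filekey: Optional[str],
    token: str,
) -> Dict[str, str]:
    """Build the form fields for one stashed chunk (hypothetical helper).

    The chunk bytes themselves would be sent alongside these fields as a
    multipart file field named "chunk".
    """
    params = {
        "action": "upload",
        "format": "json",
        "stash": "1",  # keep the chunk in the upload stash instead of publishing
        "filename": filename,
        "filesize": str(file_size),
        "offset": str(offset),
        "token": token,
    }
    # The first chunk has no filekey yet; later chunks reuse the key
    # returned in the response to the previous request.
    if filekey is not None:
        params["filekey"] = filekey
    return params
```

After the last chunk has been stashed, a final `action=upload` request carrying the `filename` and the last returned `filekey` publishes the file. Any step between the first chunk and the subsequent SDC edit can be interrupted, which is why the handling of interrupted uploads matters.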

Roadmap for OpenRefine and Wikimedia Commons

Due to the nature of the software and of the communities using OpenRefine for Wikibase and Wikimedia Commons, the reported bugs and requested features touching on the Wikimedia Commons integration are spread over two software repositories and multiple forums. Additionally, two surveys had previously been conducted to pinpoint Wikimedia integration needs (some of which touch on Wikimedia Commons).

WMSE collated the bug reports and feature requests from these sources, keeping only those affecting Wikimedia Commons. This resulted in a table of prioritised tasks. Each issue was given two scores: "Development time" and "Impact". The former is how long we estimated it would take a developer to implement a solution. The latter is how large a positive impact the solution is expected to have for end users. Both are rough, relative values, meant only as a way to compare the issues with each other. The scores were provided by WMSE, informed both by using and teaching OpenRefine in GLAM collaborations and by having worked on the code while enabling the upload of larger files.

The table also includes a weighted score that combines the two scores. This score is lower for issues that have a high impact and a low development time, so a lower score suggests a higher priority. WMSE used this to prioritise its continued work. The table has also enabled volunteer developers to get engaged with the project (see e.g. [2] and [3]) and will be useful to the community beyond the end of this grant.
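The exact formula behind the weighted score is not given here. As a minimal sketch of one formula with the described behaviour, the example below divides development time by impact, so that high-impact, low-effort issues receive the lowest score. The ratio, the `Task` class and the sample tasks are all assumptions made for illustration, not the values or method used in the actual table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    title: str
    development_time: float  # relative effort estimate; larger means slower to implement
    impact: float            # relative benefit estimate; larger means more valuable


def weighted_score(task: Task) -> float:
    """Hypothetical combined score: effort per unit of impact (lower is better)."""
    if task.impact <= 0:
        raise ValueError("impact must be positive")
    return task.development_time / task.impact


def prioritise(tasks: List[Task]) -> List[Task]:
    """Return the tasks sorted from lowest to highest weighted score."""
    return sorted(tasks, key=weighted_score)


# Invented sample data, for illustration only.
tasks = [
    Task("Task A", development_time=3, impact=1),
    Task("Task B", development_time=1, impact=3),
    Task("Task C", development_time=2, impact=2),
]
ordered_titles = [task.title for task in prioritise(tasks)]
```

With this assumed formula, "Task B" (quick to build, high impact) sorts first and "Task A" (slow to build, low impact) sorts last.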

General maintenance and additional improvements

WMSE also performed the following general maintenance and improvements:

  • Implementing library upgrades and maintaining compatibility with the latest version of OpenRefine.
  • Debugging the incompatibility that arose between the Commons extension and OpenRefine 3.8, which resulted in the release of v0.1.2 of the extension.
  • Ensuring that provided media captions (part of Structured Data on Commons) respect the Wikimedia Commons restrictions on length [4].
  • Enabling the upload of new versions of already existing files. The existing media file upload feature only allowed the upload of new files. This was a requested feature, as it was not provided by any other (non-command-line) tool either. It was completed in November 2024 and is slated to be included in the 3.9 release of OpenRefine (it is also part of the bundled Wikibase extension). [5]
  • Providing additional code review and feedback on issues in OpenRefine affecting usage on Wikimedia Commons, e.g. [6] and [7].
  • Publishing a new release (0.1.3) of the CommonsExtension.
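The caption length check in the list above can be sketched as a simple validation step that runs before any edit is sent. The 250-character limit and the function name `find_overlong_captions` are assumptions made for illustration; the limit is passed as a parameter so it can be adjusted, and length is counted here in Python characters, which may differ from how Wikimedia Commons counts it.

```python
from typing import Dict

# Assumed maximum caption length; adjust if the actual restriction differs.
MAX_CAPTION_LENGTH = 250


def find_overlong_captions(
    captions: Dict[str, str],
    max_length: int = MAX_CAPTION_LENGTH,
) -> Dict[str, int]:
    """Map each language code whose caption is too long to that caption's length."""
    return {
        language: len(text)
        for language, text in captions.items()
        if len(text) > max_length
    }


# Invented sample captions, keyed by language code.
sample = {
    "en": "A short caption",
    "sv": "x" * 251,
}
problems = find_overlong_captions(sample)
```

Reporting the offending languages before the upload starts lets the user shorten the captions up front, rather than having individual edits rejected partway through a batch.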

Updating the trainers on OpenRefine changes affecting work with Wikimedia Commons

In collaboration with the OpenRefine team, we organized two webinars, on November 27, 2024, and December 10, 2024, with a total of 11 attendees, including nine trainers. The webinars were an opportunity to:

  • Present the latest changes to the Commons Extension since the release of the course on WikiLearn. The demo of OpenRefine and the Commons extension was recorded, and we plan to release it when OpenRefine 3.9 is released.
  • Open a conversation among trainers to exchange their experiences so far. This section was not recorded. Overall, each trainer completed one to three trainings in 2024. The trainings mainly took place within the trainers' own organizations and with Wikimedians in Residence. Trainers who showcased how to use OpenRefine to contribute to Wikimedia Commons and Wikidata at conferences found it hard to cover the process within the allotted 30 to 60 minutes. Conferences are more effective for generating interest and advertising than for teaching the process itself.