Commons:Structured data/Computer-aided tagging

From Wikimedia Commons, the free media repository

The computer-aided tagging tool is a feature in development by the Structured Data on Commons team to help community members identify and add depicts statements to Commons files. There are tens of millions of carefully curated files on Commons, but the structured data tool is new. With this feature, the contents of existing files can be described easily, quickly, and accurately. To contribute, editors won’t need to know how Wikidata works or speak a particular language. The feature uses a computer vision model to suggest tags, which are then presented to users for human review. Commons users will be able to visit a Special page on Commons and see suggested depicts tags, which they can confirm or ignore. Tags will never be added automatically without human involvement.

Computer-aided tagging is a stand-alone MediaWiki extension, not a core part of Commons itself. It ties into Commons through Special:SuggestedTags. On the back end, the tool will use Google Cloud Vision to generate depicts suggestions. Wikimedia already uses the Google Cloud Vision service in Wikisource OCR, and this tool will work in a similar fashion. The tool is opt-in for registered, auto-confirmed users. It is not on by default for any user group, and it is unavailable to new and unregistered users.

Google Cloud Vision

All information that passes through Google Cloud Vision will also be public. Dumps of completely anonymous data will be available, listing each Commons file, its suggested tags, and which tags were accepted. Google Cloud Vision is completely isolated from Wikimedia Commons, and the feature is separate from the core Commons experience.

Although open source computer vision platforms are available as a starting point, any such package would require resources or specialized expertise to deliver an industry-standard computer vision experience, and the Wikimedia Foundation is unable to provide these itself at this time. The team recognizes that Google Cloud Vision is not open source software. The Foundation will not write any non-free or proprietary code for this project; all contributions will remain open source. Google will not have access to any private, non-public, or personal information, and there will be no direct communication between users and Google's service.

Architecture and workflow

Design of information flow in computer-assisted image tagging. The "machine vision" provider on the far right requests and sends potential tags for images; there is no personal information exchanged and the provider is isolated from the rest of the system and Commons.

Registered, auto-confirmed users will be able to opt in through their preferences or while uploading files. After a waiting period has passed, the user will be notified that their uploads are ready for tagging at Special:SuggestedTags. Users who have opted in can visit Special:SuggestedTags at any time to view files ready for tag processing. Anonymous users, new users, and users who have not opted in will not be able to access Special:SuggestedTags.
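
The access rule described above can be sketched as a single predicate. This is only an illustration of the stated conditions, not the extension's actual code: the `User` class, its field names, and `can_access_suggested_tags` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical stand-in for a Commons account; field names are assumptions."""
    is_registered: bool
    is_autoconfirmed: bool
    has_opted_in: bool


def can_access_suggested_tags(user: User) -> bool:
    """Return True only for registered, auto-confirmed users who have opted in."""
    return user.is_registered and user.is_autoconfirmed and user.has_opted_in
```

Under this sketch, an anonymous visitor, a newly registered account, and an established account that never opted in are all denied access, which matches the three excluded groups listed above.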

The concepts that are available for tagging are ones that translate from Google Knowledge Graph IDs to Wikidata IDs. At 2.1 million triplets, the list is too long to catalog here, but is available for download as freebase-wikidata mappings.
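
As a rough illustration of how such a mapping might be applied, the sketch below loads mapping lines into a dictionary and keeps only the labels that have a Wikidata counterpart. It assumes the dump is in N-Triples form with `owl:sameAs` links between Freebase and Wikidata URIs; this page does not confirm that format. The function names are hypothetical, and the sample line in the usage example is illustrative rather than quoted from the real dump.

```python
import re

# Assumed line shape for the mapping dump (an assumption, not confirmed here):
# <freebase-uri> <owl:sameAs> <wikidata-uri> .
TRIPLE_RE = re.compile(
    r"<http://rdf\.freebase\.com/ns/(m\.[0-9a-z_]+)>\s+"
    r"<http://www\.w3\.org/2002/07/owl#sameAs>\s+"
    r"<http://www\.wikidata\.org/entity/(Q\d+)>\s*\."
)


def load_mapping(lines):
    """Build a dict from Knowledge Graph IDs (e.g. '/m/0bt9lr') to Wikidata IDs."""
    mapping = {}
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match:
            # Convert the URI form 'm.0bt9lr' to the slash form '/m/0bt9lr'.
            mid = "/" + match.group(1).replace(".", "/", 1)
            mapping[mid] = match.group(2)
    return mapping


def to_wikidata_ids(label_ids, mapping):
    """Keep only the labels that translate to a Wikidata ID, in input order."""
    return [mapping[mid] for mid in label_ids if mid in mapping]
```

A short usage example with one illustrative mapping line:

```python
sample = [
    "<http://rdf.freebase.com/ns/m.0bt9lr> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://www.wikidata.org/entity/Q144> ."
]
mapping = load_mapping(sample)
print(to_wikidata_ids(["/m/0bt9lr", "/m/unmapped"], mapping))
```

Labels without a mapping are silently dropped here, reflecting the statement that only concepts that translate to Wikidata IDs are available for tagging.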

Development stage

The tool is currently being designed; progress can be followed in the master tracking task. Initial mobile and desktop designs are available in the ticket, and a community consultation will happen soon.

Implementation and usage notes

  • No personal information is sent to the computer vision platform provider. At launch, this new feature will only use the Google Cloud Vision system, which will be accessed via a middleware layer that hides all user data. Commons images are sent to Google servers from Wikimedia Foundation servers. There will be no direct communication between the user and external services. No personal information (IP, username, etc.) is sent to Google servers. The middleware that contacts Google servers is a Wikimedia project and is open source. No part of Google's service or code will be part of Wikimedia infrastructure.
  • Suggestions from the computer vision will not be added to an image file’s structured data until a user has verified them: This service is provided as a means to augment human activity, not replace it. All suggestions from the computer vision service are stored in a separate, specialized database. Suggestions are not saved as structured data on the Commons file until a human user confirms them.
  • Users can opt in to receive notifications alerting them that their recent uploads have suggested tags. In the last step of the UploadWizard upload process, users have an option to enable notifications that will inform them when recently uploaded files have passed the waiting period and have tags available for confirmation. This option can also be found in User Preferences under Notifications.
  • User contributions that confirm suggested depicts tags are licensed as CC0. This data is equivalent to adding Wikidata to an image, and as such must be contributed under the same CC0 license that Wikidata uses. Clear license notices will inform users that all contributions made via the computer vision tool will be licensed under CC0.
  • Analysis of images on Commons: The feature will analyze only images, and provide suggested “depicts” tags based on the content of those images.
  • Certain types of images will be excluded: Some types of imagery on Commons are not well-suited for this type of system. Small images (less than 100px wide), artworks (identified via the Artwork template), book page scans, and other files will not be included.
  • Newly uploaded files will be analyzed, but not during upload: Commons users continuously monitor new files for vandalism, copyright violations, and relevance to the project. Files that don’t meet the criteria are marked for deletion. The new computer vision feature will only analyze new files after a waiting period has passed, and will not analyze files marked for deletion.
  • All tag confirmations show up as regular structured data edits with an edit summary tag that identifies their origin from the computer vision tool: This enables all the usual curation and moderation workflows so changes can be improved, edited, or reverted. It also helps us measure the revert rate and ensure that edits made using computer-aided tagging (CAT) are not reverted more frequently than the average edit.
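
Two of the rules in the list above, which files are analyzed and when a suggestion becomes structured data, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the extension's implementation: `CommonsFile`, `is_eligible`, and `SuggestionStore` are hypothetical names, the length of the waiting period is not given on this page and is therefore left as a parameter, and the unspecified "other files" exclusions are not modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class CommonsFile:
    """Hypothetical view of a file; all field names are assumptions."""
    is_image: bool
    width_px: int
    uses_artwork_template: bool
    is_book_page_scan: bool
    marked_for_deletion: bool
    uploaded_at: datetime


def is_eligible(file: CommonsFile, now: datetime, waiting_period: timedelta) -> bool:
    """Apply the exclusions named in the list; other exclusions are not modeled."""
    if not file.is_image or file.width_px < 100:
        return False
    if file.uses_artwork_template or file.is_book_page_scan:
        return False
    if file.marked_for_deletion:
        return False
    # New files are analyzed only once the waiting period has passed.
    return now - file.uploaded_at >= waiting_period


@dataclass
class SuggestionStore:
    """Keeps suggestions separate from structured data until a human confirms."""
    pending: Dict[str, Set[str]] = field(default_factory=dict)
    depicts: Dict[str, Set[str]] = field(default_factory=dict)

    def suggest(self, title: str, qids: Iterable[str]) -> None:
        """Record machine suggestions; nothing is written to structured data."""
        self.pending.setdefault(title, set()).update(qids)

    def confirm(self, title: str, qid: str) -> None:
        """Move one suggestion into structured data after human confirmation."""
        if qid not in self.pending.get(title, set()):
            raise KeyError(f"{qid} was not suggested for {title}")
        self.pending[title].discard(qid)
        self.depicts.setdefault(title, set()).add(qid)
```

The point of the two separate dictionaries is that `suggest` never touches `depicts`; only an explicit `confirm` call, standing in for a human decision, does.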