User:Arnomane/Orphan bot

From Wikimedia Commons, the free media repository

Because Wikimedia Commons admins do not want to annoy the other wikis, they currently remove the usages of deleted files in all other wiki projects by hand. In addition, the interface does not really support efficient deletion (lots of open tabs and far too many edits and mouse clicks). File unlinking in particular is a tedious chore that dramatically slows down the Commons deletion procedure and has therefore contributed to the huge backlog of files that need to be deleted as soon as possible. Commons admins therefore badly need software help:

The proposed solution consists of two tools working hand in hand: a special deletion tool for admins, similar in fashion to Commonist and codenamed "The Deletionist", and a file unlink bot running in the local wikis.

Unlink bot

Scope of the bot

The bot is only the second step after CommonsTicker. CommonsTicker enables local communities to act in advance of most deletions (with the exception of not yet tagged blatant copyvios that an admin stumbles upon and deletes on sight), as it already sends a notification when a deletion request is added to a file. CommonsTicker is the preferred method because it allows not only removal but also hand-crafted replacement of a file by an equivalent free one (no chance for bots, this is pure manual work). However, Wikimedia Commons cannot wait longer than the usual Commons deletion timeframe for communities to act on the CommonsTicker messages (simply impossible, given that we would need to wait for 640+ wikis).

The bot will be an integral part of the anticipated dramatic deletion speedup for admins (10 to 100 times faster deletion) and will free Commons admins from the tedious unlinking job. In future there will thus be fewer missing files in articles (at present you often have to skip unlinking at some point just to get at least some of the pressing copyvios done), and admins will be able to focus on the files and thereby increase the quality of the Wikimedia Commons repository for all projects.

Technical problems

In general, files can be embedded in the following ways:

  • [[Image:foobar.extension|*]] and all local translations of "Image:"
  • [[Media:foobar.extension|*]] (are there local translations of "Media:" ?)
  • In galleries like:
<gallery>
Image:foobar.extension|*
</gallery>
  • In templates:
    • General embedding via a template without a variable; these produce quite a few false-positive usages in articles.
    • Nice templates that use the usual syntax, like image=[[Image:foobar.extension]].
    • Annoying templates that use image=foobar.extension and thus build the bracket syntax for you.
    • Evil templates that use image=foobar [...] extension=extension or something even more evil.
  • "_" and " " need to be treated as the same character, and so do a lower-case and an upper-case first letter. Likewise, whitespace around the filename needs to be treated as if it were not there.
  • Embedding in write-protected pages (hello Wikinews). In such cases the bot needs to write, for example, to the talk page and say what needs to be done.
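
The equivalence rules from the list above ("_" versus " ", case of the first letter, surrounding whitespace) can be sketched as a small normalisation step. This is only an illustration; the function names normalize_file_name and same_file are made up for this page and do not come from any existing bot.

```python
import re


def normalize_file_name(raw: str) -> str:
    """Return a canonical form of a file name, for comparison only."""
    # "_" and " " are the same character; runs of them collapse to one
    # space, and whitespace around the name counts as not being there.
    name = re.sub(r"[ _]+", " ", raw).strip()
    # Only the first letter is case-insensitive; the rest stays as it is.
    return name[:1].upper() + name[1:]


def same_file(a: str, b: str) -> bool:
    """True if both spellings refer to the same file under the rules above."""
    return normalize_file_name(a) == normalize_file_name(b)
```

With this sketch, " foo_bar.jpg " and "Foo bar.jpg" compare as equal, while "Foo bar.JPG" and "Foo bar.jpg" do not.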

Furthermore:

  • The bot shouldn't replace files on talk pages of any namespace.
  • Another general problem is that the bot needs 640+ accounts or would have to run anonymously, as single login does not exist yet.
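
Skipping talk pages can be done by namespace number alone, since MediaWiki gives talk namespaces odd, non-negative numbers. A minimal sketch follows; the (namespace_id, title) pair layout is an assumption made for the example, not the output format of any existing tool.

```python
def is_talk_namespace(namespace_id: int) -> bool:
    """Talk namespaces have odd, non-negative numbers (1, 3, 5, ...)."""
    # The >= 0 check keeps the virtual namespaces (-1, -2) out of the test.
    return namespace_id >= 0 and namespace_id % 2 == 1


def pages_to_edit(usages):
    """Drop talk pages from (namespace_id, title) pairs (hypothetical layout)."""
    return [(ns, title) for ns, title in usages if not is_talk_namespace(ns)]
```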

Unlink algorithm

Commons data feed
  1. The bot watches the Wikimedia Commons Special:Log/delete feed and filters for all "Image:" deletion entries.
  2. It performs a CheckUsage for the files.

These first two jobs have already been coded by User:Duesentrieb in his CommonsTicker and need only a small adaptation to provide a feed of deleted files for unlinking.
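
As an illustration of step 1 only, the filter could look roughly like the following. The entry layout (dicts with "action" and "title" keys) is a made-up assumption for this sketch and is not how CommonsTicker represents log entries; the CheckUsage lookup of step 2 is not shown.

```python
def deleted_image_titles(log_entries, prefix="Image:"):
    """Yield the names of deleted files, without the namespace prefix.

    log_entries: iterable of dicts with "action" and "title" keys
    (hypothetical layout chosen for this example).
    """
    for entry in log_entries:
        # The deletion log also records restores; keep real deletions only.
        if entry.get("action") != "delete":
            continue
        title = entry.get("title", "")
        if title.startswith(prefix):
            yield title[len(prefix):]
```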

The bot walks through the individual wikis that use a given file
  1. It removes the usages in the target articles of a given wiki.
    1. All articles in which it could find and remove the embedding link of the given file are marked as success.
    2. All articles in which it could find the link but which are write-protected are marked as found/protected.
    3. All articles in which it couldn't find the link are marked as failed.
  2. For all articles in which it couldn't find the file link, it checks whether they embed one of the other pages as a template (you can embed every page as a template; for speed, however, try to match the templates from the template namespace first). At the first positive match it stops and marks the article as success.
  3. For all remaining found/protected and failed articles, the bot writes a specific message to the talk page of that article saying that the file link needs to be removed by hand. Articles with write-protected talk pages are skipped silently, as CommonsTicker highlights removed usages as well, so any other message log would only duplicate CommonsTicker.
  • The messages can be localised in a similar fashion as the CommonsTicker messages.

CommonsTicker can already write such attention-requesting messages to the talk pages of articles that use a file, so part of the code already exists there as well.
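
The per-article bookkeeping of the walk-through above can be sketched as follows. All names (Status, process_article, needs_talk_page_note) and the argument layout are assumptions made for this illustration; actual page reading and writing is left out.

```python
import re
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FOUND_PROTECTED = "found/protected"
    FAILED = "failed"


def process_article(text, protected, file_regex, embedded_pages, other_usage_pages):
    """Classify one article and return (status, new_text).

    file_regex: compiled pattern matching the file link (see Regular expressions)
    embedded_pages: pages this article embeds as templates
    other_usage_pages: the other pages of this wiki that use the file
    """
    if file_regex.search(text):
        if protected:
            # Step 1.2: link found, but the page cannot be edited.
            return Status.FOUND_PROTECTED, text
        # Step 1.1: link found and removed.
        return Status.SUCCESS, file_regex.sub("", text)
    # Step 2: the usage may come from another page embedded as a template.
    if any(page in other_usage_pages for page in embedded_pages):
        return Status.SUCCESS, text
    # Step 1.3: no link found and no explanation for the usage.
    return Status.FAILED, text


def needs_talk_page_note(status, talk_protected):
    """Step 3: leave a note unless done or the talk page is protected."""
    return status is not Status.SUCCESS and not talk_protected
```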

Regular expressions

Localised file links
You need the localised "Image:" namespace names in order to create safe regular expressions for finding and removing file links. Every namespace is also available via its English name in every local wiki. You could therefore request "Image:foobar.extension" in a wiki from time to time and look at which URL you get redirected to (HTTP: "permanently moved"). That way you could extract and update the database of local "Image:" namespace names automatically.
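
Extracting the local namespace name from the redirect target could look roughly like this. The sketch only parses a URL that has already been obtained; the function name and the "/wiki/" article path are assumptions, and wikis with a different URL layout would need another article_path.

```python
from urllib.parse import urlsplit, unquote


def local_namespace_from_redirect(location, article_path="/wiki/"):
    """Return the namespace part of a redirect target URL, or None.

    location: value of the HTTP Location header of the redirect
    article_path: assumed URL prefix in front of page titles
    """
    path = unquote(urlsplit(location).path)
    if not path.startswith(article_path):
        return None
    title = path[len(article_path):]
    # The namespace is everything in front of the first colon.
    namespace, colon, _ = title.partition(":")
    if not colon:
        return None
    return namespace.replace("_", " ")
```

For an illustrative redirect target such as https://example.org/wiki/Bild:Foobar.extension this returns "Bild".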
Image relinking
Image relinking (for example in the case of duplicates) needs only the file name itself, not the localised image namespace name or the position of the brackets. A search & replace operation like the following would be enough in all cases except evil templates:
i_want_to_be_replaced.extension ---> i_replace_you.sometimes_replace_extension
In the case of evil templates or protected pages, the bot should write to the talk page of the article what needs to be replaced.
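
A minimal sketch of such a search & replace, assuming the "_"/" " and first-letter rules from the Technical problems section. The function names are made up for this example; the pattern can still hit a name that happens to be the tail of a longer, multi-word file name, so it is not a safe implementation as it stands.

```python
import re


def file_name_regex(file_name):
    """Build a regex source matching all equivalent spellings of a file name."""
    name = re.sub(r"[ _]+", " ", file_name).strip()
    first, rest = name[:1], name[1:]
    # The first letter may be upper or lower case.
    if first.isalpha():
        head = "[" + re.escape(first.upper()) + re.escape(first.lower()) + "]"
    else:
        head = re.escape(first)
    # Every space in the name may be written as spaces or underscores.
    body = "[ _]+".join(re.escape(part) for part in rest.split(" "))
    return head + body


def relink(text, old_name, new_name):
    """Replace every spelling of old_name in text by new_name."""
    pattern = re.compile(r"(?<!\w)" + file_name_regex(old_name) + r"(?!\w)")
    # A function as replacement keeps backslashes in new_name literal.
    return pattern.sub(lambda match: new_name, text)
```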
Image orphaning
Orphaning needs to remove the file link, including the brackets and the localised namespace, via a regular expression that also takes special account of annoying templates. For everything else, see above.
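
One possible shape for such an orphaning expression is sketched below. It removes only bracketed links (with at most one level of nested links in the caption) and merely reports whether the bare file name still occurs afterwards, for example in a gallery or in an annoying template. The function names and the namespaces argument are assumptions for this illustration.

```python
import re


def _name_regex(file_name):
    """Regex source for all equivalent spellings of a file name."""
    name = re.sub(r"[ _]+", " ", file_name).strip()
    first, rest = name[:1], name[1:]
    if first.isalpha():
        head = "[" + re.escape(first.upper()) + re.escape(first.lower()) + "]"
    else:
        head = re.escape(first)
    return head + "[ _]+".join(re.escape(part) for part in rest.split(" "))


def orphan(text, file_name, namespaces=("Image",)):
    """Remove bracketed links to file_name; return (new_text, leftover).

    namespaces: "Image" plus its local translations (assumed input)
    leftover: True if the bare name still occurs in the text afterwards
    """
    ns = "|".join(re.escape(n) for n in namespaces)
    name = _name_regex(file_name)
    link = re.compile(
        r"\[\[\s*(?i:" + ns + r")\s*:[ _]*" + name + r"[ _]*"
        # Optional caption part, allowing one level of nested [[links]].
        r"(?:\|(?:[^\[\]]|\[\[[^\[\]]*\]\])*)?\]\]"
    )
    new_text = link.sub("", text)
    leftover = re.search(r"(?<!\w)" + name + r"(?!\w)", new_text) is not None
    return new_text, leftover
```

A True leftover value would be the cue to fall back to the talk page message described in the unlink algorithm.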

See also

thoughts

by

  • forget about templates. too complicated.
  • support the most important wikis. in most of the 640+ you talk about there's not enough content to make it worth the effort
  • this is about being nice, nothing more. if you have time and energy to be nice, well, that's nice. if you don't, you still have to get your work done.
  • considering projects using an image are notified early, they could upload images to their wiki, if it's absolutely necessary. of course someone has to keep an eye on them to stop them from hoarding copyvios.
  • is there really a need for relinking?

any comments? -- 22:52, 17 September 2006 (UTC)

by Arnomane

My deletion style is as follows:

  • I pick up a copyvio image in one of our usual deletion categories/requests.
  • I click on its uploader and look through his entire upload gallery (external tool by Duesentrieb), which is linked in the interface (one copyvio promises three not yet recognized ones).
  • I scan the entire gallery and delete all copyvios of that uploader on sight.
  • I remove about 50% of all external links embedding all these images (depends on my mood and time, as it is a very time-consuming job).

So we'd need a delete tool that directly highlights users with a high ratio of deleted to uploaded files. However, it also needs to take the absolute number of uploads into account, as one deleted image out of two is far less problematic than 50 out of 100. Arnomane 23:06, 17 September 2006 (UTC)

by Cool Cat

There already are unlinking bots operating, such as the en.wiki orphan bot. It shouldn't be too hard to implement such a bot to operate for all wikis. However, interwiki bots like this are frowned upon IIRC.

I think dealing with the problem rather than the symptom would be the prudent course of action. Newly uploaded images should be listed (either with an external tool or not) and reviewed. Generally speaking, google image search is an excellent way to S&D copyvios.

Currently there is a state of anarchy in commons, we need martial law. :P

--Cat out 16:46, 19 September 2006 (UTC)