Commons:Bots/Work requests

From Wikimedia Commons, the free media repository

Shortcut: COM:BR · COM:BWR

Bot policy and list · Requests to operate a bot · Requests for work to be done by a bot · Changes to allow localization · Requests for batch uploads


SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day.



Category:Pages using Information template with parsing errors Clean up

Hi, I populated Category:Pages using Information template with parsing errors again. Many of the issues are problems with whole batches of images and could be fixed with automatic or semi-automatic processing with AWB or Python. Please help clean up this category. --Jarekt (talk) 05:07, 11 December 2015 (UTC)

Thanks for creating this maintenance category.
I noticed that some of the errors were caused by Aschroet's bot. He may be willing to fix those. --Leyo 08:04, 11 December 2015 (UTC)
I'll do so. --Arnd (talk) 08:10, 11 December 2015 (UTC)

What caused File:Obama at New Economic School-1.jpg, File:S (set u).png and File:Sidamonidze COA.png to be included in this category? Is there anything that needs to be fixed there? --Leyo 08:23, 11 December 2015 (UTC)

Jarekt, just to estimate the effort: is this the number of broken templates generated over the last 9 months? As I remember, the last run was in March. --Arnd (talk) 12:44, 11 December 2015 (UTC)

Arnd, I thought those were templates generated over the last 9 months, but looking at some of them more closely, it seems that some of them had been broken for years. Other cases include images that did not have an information template but did have parsing errors; these were then processed by User:Dexbot, and the old parsing errors break the new information template. Leyo, sorry for those false alarms; database queries sometimes return those (in this case none of the files were supposed to transclude {{Infobox_template_tag}} or {{Information}}). I will be adding more files to Category:Media missing infobox template and subtemplates, since I just got quarry:query/2556 to work.--Jarekt (talk) 15:00, 11 December 2015 (UTC)

Users with most affected files seem to be Sailko and Rrius. --Leyo 14:41, 11 December 2015 (UTC)

I also asked User:Slowking4 for help with some of his uploads, which were really confusing. --Jarekt (talk) 15:09, 11 December 2015 (UTC)
Sailko's problems were caused by my bot. --Arnd (talk) 16:28, 11 December 2015 (UTC)
Most of them may be found using incategory:Pages_using_Information_template_with_parsing_errors -insource:/description/i. --Leyo 16:32, 11 December 2015 (UTC)

Jarekt, it would be really great if this category were populated on a daily basis. That way my bot, which already watches other maintenance categories, could quickly inform the responsible user. Is that possible, or too expensive in terms of runtime or implementation?--Arnd (talk) 19:49, 11 December 2015 (UTC)

Arnd, I think it is a good idea. User:Zhuyifei1999, this sounds very much like Commons:Bots/Requests/YiFeiBot (13) except for the query and category. Maybe you want to look at it. Currently the files in this category were added by running quarry:query/2556 and then adding Category:Pages using Information template with parsing errors if the string "{{Information" or \|\s*source\s*= was found in the metadata, or Category:Media missing infobox template otherwise. --Jarekt (talk) 04:42, 13 December 2015 (UTC)
Ok, I'll work on it tomorrow probably --Zhuyifei1999 (talk) 07:07, 13 December 2015 (UTC)
Commons:Bots/Requests/YiFeiBot (25) --Zhuyifei1999 (talk) 11:02, 15 December 2015 (UTC)
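A minimal sketch of the categorization rule Jarekt describes, as a daily bot might apply it to each file returned by the query. It assumes the "metadata" being searched is the wikitext of the file description page; fetching the list from quarry:query/2556 is left out, and `classify` is a hypothetical helper name.

```python
# Sketch of the rule described above; `classify` is a hypothetical helper.
# Assumption: "metadata" means the wikitext of the file description page.
import re

PARSING_ERRORS = "Category:Pages using Information template with parsing errors"
MISSING_INFOBOX = "Category:Media missing infobox template"

# The two signals quoted in the discussion.
INFORMATION_LITERAL = "{{Information"
SOURCE_FIELD = re.compile(r"\|\s*source\s*=")


def classify(wikitext: str) -> str:
    """Return the maintenance category for a file flagged by the query."""
    if INFORMATION_LITERAL in wikitext or SOURCE_FIELD.search(wikitext):
        # Remnants of an {{Information}} template that failed to render.
        return PARSING_ERRORS
    # No trace of an infobox template at all.
    return MISSING_INFOBOX
```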
insource:/\{\{Information/ -insource:/\<nowiki\>/ -hastemplate:Information -hastemplate:Information_field (or similar searches) yields some real-time results. --Leyo 22:53, 11 December 2015 (UTC)
There is only one file in this category. Any problem with the bots? ;) -- MaxxL - talk 11:33, 13 December 2015 (UTC)
The category has been cleaned up, but there are still plenty of files found with my search link above. --Leyo 14:26, 13 December 2015 (UTC)
Well, after repairing about 1k files by hand I thought the job was done. -- MaxxL - talk 14:40, 13 December 2015 (UTC)
The remaining ones have other problems, see sample fixes. --Leyo 15:41, 13 December 2015 (UTC)
This category only included parsing errors that caused the infobox template not to render at all, while there are no other infobox templates in the files. Leyo's excellent search finds files that have templates like {{Artwork}}, but also remnants of broken {{Information}} template. --Jarekt (talk) 02:04, 14 December 2015 (UTC)
Is anybody able to clean up the following common pattern (removing the first line)?
{{Information
{{Artwork
I struggle because of the line break. --Leyo 21:56, 14 December 2015 (UTC)
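One way to handle the line break is to match the newline explicitly and put the `{{Artwork` part in a lookahead, so only the stray `{{Information` line is removed and the `{{Artwork` line stays. A sketch using Python's `re`; the function name is hypothetical, and the pattern assumes the stray line contains nothing but `{{Information` plus optional trailing whitespace.

```python
import re

# "{{Information" alone on a line (trailing spaces/tabs allowed), its line
# break (\n or \r\n), and a lookahead requiring the next line to open {{Artwork.
# The lookahead is not consumed, so the {{Artwork line is left intact.
STRAY_INFORMATION = re.compile(
    r"^\{\{Information[ \t]*\r?\n(?=\{\{Artwork)", re.MULTILINE
)


def drop_stray_information(wikitext: str) -> str:
    """Remove a stray '{{Information' line that directly precedes '{{Artwork'."""
    return STRAY_INFORMATION.sub("", wikitext)
```

Pages where `{{Information` is followed by anything other than `{{Artwork` are returned unchanged.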

Thank you MaxxL (and maybe some others) for cleaning up that category so quickly. I hope that in future there will be ways to reduce the amount of cleaning work. Maybe we could somehow guesstimate this effort to have arguments to push Commons:Structured data. Have a nice evening, --Arnd (talk) 19:03, 13 December 2015 (UTC)

Thanks everybody for quickly fixing all those files. --Jarekt (talk) 02:04, 14 December 2015 (UTC)

Some additional errors are found using insource:/\{\{Information/ incategory:Media_missing_infobox_template. --Leyo 14:16, 16 December 2015 (UTC)

All but 23 (mostly false-positives) done, but the search link further above still yields 149 hits. --Leyo 23:13, 3 January 2016 (UTC)

Strangely, insource:/\{\{Information/ -insource:/\<nowiki\>/ -hastemplate:Information -hastemplate:Information_field now even yields 264 hits. --Leyo 01:08, 24 January 2016 (UTC)
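To track hit counts like these over time, the same query can be sent to the MediaWiki search API, which reports the total in `query.searchinfo.totalhits`. A sketch that only builds the request URL (nothing is sent); the helper name is hypothetical, and restricting the search to namespace 6 (File) is an assumption about which pages matter here.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

# Leyo's CirrusSearch query from the thread, unchanged.
QUERY = (r"insource:/\{\{Information/ -insource:/\<nowiki\>/ "
         r"-hastemplate:Information -hastemplate:Information_field")


def search_url(query: str, namespace: int = 6) -> str:
    """Build a list=search request whose response includes the hit count
    in query.searchinfo.totalhits (no request is sent here)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": str(namespace),  # 6 is the File namespace
        "srinfo": "totalhits",
        "format": "json",
    }
    return API + "?" + urlencode(params)


print(search_url(QUERY))
```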

Removing inadvertent signatures

Every now and then, I stumble upon such inadvertent signatures on file description pages. Is there any way to find and remove them automatically? --Leyo 17:43, 3 January 2016 (UTC)

Not sure. SignBot detects signatures by checking for a link to a user page, but those links are also often found in author= fields. "(UTC)" is also found in malformed date= fields. Maybe a dump scan for the general format (without any customization: [[User:Username|Username]] ([[User talk:Username|<span class="signature-talk">talk</span>]]) HH:MM, (D)D Month YYYY (UTC)) is good enough? --Zhuyifei1999 (talk) 09:24, 4 January 2016 (UTC)
Author fields may cause many false-positives. I tried insource:/\}\} *\-* *\[\[User\:.+\(UTC\)/ and insource:/\-* *\[\[User\:.+\(UTC\)/. The latter gives quite many hits, too. --Leyo 11:10, 4 January 2016 (UTC)
Is it possible to ignore author fields? Poké95 02:45, 17 January 2016 (UTC)
AFAIK, a parser would be required to safely determine whether a signature is in an author field or not, as many signatures contain pipes (|) themselves, which may screw up regexes. --Zhuyifei1999 (talk) 06:02, 17 January 2016 (UTC)
There are some users who added inadvertent signatures several times. I cleaned them up for a few of these users. --Leyo 13:05, 24 January 2016 (UTC)
The hard thing is, you can't find "inadvertent" signatures; you can only find all signatures. You can use a parser to make sure a signature is not inside a template, but a lot of the pages on the list I've compiled for this are like this, or this. This would all have to be done as manually assisted edits, because a bot can't tell the difference between an "inadvertent signature" and someone using a file page as a talk page (which a lot of people are doing). Riley Huntley (talk) 23:53, 28 January 2016 (UTC)
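A sketch combining the two ideas in this thread: a regex for the uncustomized signature format Zhuyifei1999 quotes, plus a crude `{{`/`}}` depth counter so that matches inside templates (such as an author= field) are skipped regardless of the pipes they contain. It is a heuristic, not a wikitext parser: it ignores nowiki and comments and misses customized signatures. As Riley Huntley notes, its output would still need manual review. All names are hypothetical.

```python
import re
from typing import List

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Uncustomized signature:
# [[User:X|X]] ([[User talk:X|<span class="signature-talk">talk</span>]]) HH:MM, D Month YYYY (UTC)
DEFAULT_SIGNATURE = re.compile(
    r"\[\[User:(?P<user>[^|\]]+)\|[^\]]+\]\] "
    r"\(\[\[User talk:(?P=user)\|"
    r'<span class="signature-talk">talk</span>\]\]\) '
    r"\d{2}:\d{2}, \d{1,2} (?:" + MONTHS + r") \d{4} \(UTC\)"
)


def template_depth_at(wikitext: str, offset: int) -> int:
    """Count how many {{...}} templates are still open at `offset`.

    Heuristic only: nowiki and comments are not treated specially.
    Pipes are irrelevant here, so signatures containing '|' do not
    confuse it the way a parameter-splitting regex would be confused.
    """
    depth = 0
    i = 0
    while i < offset:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth = max(depth - 1, 0)
            i += 2
        else:
            i += 1
    return depth


def signature_candidates(wikitext: str) -> List[str]:
    """User names of default-format signatures found outside any template."""
    return [
        m.group("user")
        for m in DEFAULT_SIGNATURE.finditer(wikitext)
        if template_depth_at(wikitext, m.start()) == 0
    ]
```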

Creator templates

Hi, we can now get Wikidata information into Creator templates. So there are 3 things a bot could help with:

  1. Update existing Creator templates;
  2. Create them when missing (see Special:WantedPages). This probably needs some planning.
  3. Update description pages accordingly (e.g. in File:Horse Middleton cantering, saddled with rider (rbm-QP301M8-1887-619).jpg, replacing "Muybridge, Eadweard, 1830-1904" by Creator:Eadweard Muybridge).

Regards, Yann (talk) 14:50, 26 January 2016 (UTC)
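For item 3, a sketch of turning a catalogue-style author string into a candidate Creator page name plus the life dates. Those dates could be checked against the Creator template or Wikidata before anything is replaced. It assumes the exact "Surname, Given name, YYYY-YYYY" shape of the Muybridge example; the function and tuple names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ImplicitCreator(NamedTuple):
    page: str   # e.g. "Creator:Eadweard Muybridge"
    birth: int
    death: int


# "Surname, Given name(s), YYYY-YYYY", the shape of the Muybridge example.
IMPLICIT = re.compile(
    r"^\s*(?P<surname>[^,]+),\s*(?P<given>[^,]+),\s*"
    r"(?P<birth>\d{4})\s*-\s*(?P<death>\d{4})\s*$"
)


def parse_implicit_creator(author: str) -> Optional[ImplicitCreator]:
    """Return the candidate Creator page and life dates, or None if the
    string does not have the expected shape."""
    m = IMPLICIT.match(author)
    if m is None:
        return None
    name = f"{m['given'].strip()} {m['surname'].strip()}"
    return ImplicitCreator(f"Creator:{name}", int(m["birth"]), int(m["death"]))
```

A bot would then replace the string only if that Creator page exists and its dates agree. This is similar in spirit to the date matching BMacZero describes below, although how that bot actually works is not described here.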

A Creator template does not make sense in many of the “wanted” Creator templates. --Leyo 15:11, 26 January 2016 (UTC)
Yes, the list needs checking first. That's why I say above "planning needed". Yann (talk) 15:18, 26 January 2016 (UTC)
How to get rid of those where a Creator template does not make sense? --Leyo 15:20, 26 January 2016 (UTC)
Not a definitive answer, but I can see several possibilities: if an entry exists in Wikidata/Wikipedia, creation is needed; if the author matches the uploader, creation is not needed. Regards, Yann (talk) 15:28, 26 January 2016 (UTC)
Well, I was referring to entries such as Creator:Infrogmation of New Orleans, Creator:Michiel1972 or Creator:Photo: Andreas Praefcke. --Leyo 15:36, 26 January 2016 (UTC)
It seems that Special:WantedPages lists as missing all Creator pages where the author doesn't match the uploader. That's a different issue from creating missing Creator pages. Regards, Yann (talk) 16:34, 26 January 2016 (UTC)
About updating existing Creator templates: I would rather wait for the phabricator:T49930 "arbitrary access" feature and rewrite Creator templates so they pull the needed information straight from Wikidata if it is not provided. --Jarekt (talk) 16:55, 26 January 2016 (UTC)
I have a bot that can do this widely based on Category:Creator templates to be created by a bot and matching birth and death dates, but we decided not to run it for this reason (better to wait for arbitrary access, supposedly coming within a few months). See Commons:Bots/Requests/BMacZeroBot 3. My bot can also replace "implicit" creators in {{Information}} templates, and has done a lot of them. It doesn't catch many cases, but I'm planning on working on it further. BMacZero (talk) 17:19, 27 January 2016 (UTC)

Throttle limits

I keep bumping into housekeeping jobs for Faebot where the default throttle limits make no sense. My run-of-the-mill reports and uploads are fine, but having a housekeeping task that takes weeks, and is likely to drop out and need rebooting several times in that period, is a bit silly when removing the throttle means it could run in an hour.

Could someone who runs faster jobs advise whether it's worth me creating a separate special fast bot account just for these odd jobs (normally GLAM related) and applying for an unthrottled new account, or whether I should just request that Faebot is unthrottled and I'll tack on -putthrottle to my jobs to manage their good behaviour more directly? Thanks -- (talk) 14:08, 4 February 2016 (UTC)

What exactly is the goal of throttling? If it is to keep users' watchlists from flooding, I'd argue that a one-time burst of 1000 edits in a few minutes (which can be skipped) is better than weeks of 10 edits an hour on my watchlist. The same argument holds for recent changes. If the edits are made under a bot flag (given it's a bot), it's even less of an issue. If the goal of throttling is to be able to fix mistakes, then a slow start with careful checking of the edits, followed by full speed, may be the best idea: given the sheer number of edits, they are not more likely to be checked thoroughly just because they are made at a lower speed. The main thing Commons:Bots says is that non-urgent tasks should run at a maximum of 12 edits/minute. Given the number of edits by Fae and his bots, I would argue that such a speed limits the number of improvements that can be made. Relatedly, when using VisualFileChange or HotCat (on an account without a bot flag), edits are usually made much faster; I've seen (and done) hundreds of edits with those in just 1-5 minutes. All these edits are generally unflagged (which is not per se a bad thing; it is good if edits are checkable), so a limit of 12 edits/minute for flagged bot accounts makes no sense when these gadgets reach speeds above 100/minute. Thus I'm in favor of letting those who know what they are doing with their bots, and are ready to clean up (or mass revert) if something goes wrong, increase their bot speeds quite a bit. Basvb (talk) 16:43, 4 February 2016 (UTC)
Thanks for the perspective, encouraging. Any advice on where I ought best to request a change to throttle limits? I'm unsure if this would mean a full request at Bots/Requests, or I can drop a note somewhere else. -- (talk) 17:02, 4 February 2016 (UTC)
Hmm, the main argument against seems to be server load (see the meta policy, which was the reason the commons policy was formed with this limit). However seeing the visualfilechange and hotcat edits and the fact we are quite a few years after this rate has been discussed on meta maybe it is something worth looking into (are the same limits still important or can they be upped a bit). Given this I would want to lower my encouragement a bit. But I'm wondering how the bot limits relate for example to the semi-automatic gadget edits (hotcat, visualfilechange). Basvb (talk) 17:09, 4 February 2016 (UTC)
I asked a question at meta regarding technical issues with high speed limits. As it seems that we frequently hit high numbers of edits/minute on Commons (at least I do), I'm wondering whether these bot limits really need to be this strict. Using VisualFileChange, it seems that I go up to 250 edits/minute (see here). If that affects the servers, I should stop, and I would like to know whether it does (it might be different when one is doing 100,000 edits rather than a maximum of a few thousand, but it is still a lot). Basvb (talk) 17:24, 4 February 2016 (UTC)
Well, my next step is probably to raise a request on Bots/Requests then rather than go informal. Just changing to an account limit of 1 edit/second or 100 per minute would be a great improvement, keeping in mind that my reports, tests and run of the mill regular things that work today I would happily throttle to the mediawiki suggested default of 1 every 10 secs and I'm not imagining creating parallel processing threads for any one job, so I'm never going to run like the clappers in the way that VFC does. -- (talk) 19:15, 4 February 2016 (UTC)
Now added to Commons:Bots/Requests, refinements to the rationale, or suggestions for good practice welcome there in discussion. -- (talk) 19:56, 4 February 2016 (UTC)
I also believe throttle limits of around 1 edit/second are perfectly acceptable. No limit is very handy when working with VFC, but VFC is limited in how many images you can easily select, so those jobs never last long. I just looked at some edits I did with cat-a-lot today and counted 43 files that were edited at 9:01; 43 edits/sec is equivalent to 2580 edits/min. That speed might be OK for a second or two, but is probably not sustainable for longer periods. --Jarekt (talk) 03:41, 5 February 2016 (UTC)
@Jarekt: Are you sure those are all in one second? I believe that the edit times always show up in minutes, there's no indication of seconds and in my experience when I did some edits (with visualfilechanger or cat-a-lot) somewhere around 250 edits/minute seems to be the limit of those. With visualfilechange it is possible to go over quite a lot of files (ca 25k I've done in the past), loading the images in is then the main issue, meaning that it is unlikely somebody will start more than one or two of those bigger VFC runs in an hour. Basvb (talk) 14:42, 5 February 2016 (UTC)
Basvb, you are right. I do not know how I mistook "9:01" for a time accurate to the second. So cat-a-lot's speed was 43 edits/min, which seems much more reasonable. Those are the speeds any editor can reach without any bot flag, so a bot should be able to go as fast too. --Jarekt (talk) 19:44, 5 February 2016 (UTC)
I agree as well, I'm running around 80 edits per minute with my bot right now; it has 10,000 files in queue; I can't imagine trying to complete this task at a rate of six edits per minute like Commons:Bots recommends. Riley Huntley (talk) 04:54, 5 February 2016 (UTC)
I believe the current maximum rate listed is 12 per minute, or one every 5 seconds. Basvb (talk) 14:42, 5 February 2016 (UTC)
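The rates in this thread are easier to compare once they are converted. Here is a small sketch of the arithmetic, with hypothetical helper names. At 12 edits/minute the bot makes one edit every 5 seconds, and at 6 edits/minute one every 10 seconds. A 10,000-file queue therefore takes about 28 hours at 6/minute and about 2 hours at 80/minute. The corrected cat-a-lot reading shows the same conversion: 43 edits in one second would have meant 2580 per minute.

```python
def seconds_between_edits(edits_per_minute: float) -> float:
    """Delay between saves that yields the given edit rate."""
    return 60.0 / edits_per_minute


def hours_to_finish(n_edits: int, edits_per_minute: float) -> float:
    """Wall-clock hours for a queue of n_edits at a constant rate."""
    return n_edits / edits_per_minute / 60.0


def per_second_to_per_minute(edits_per_second: float) -> float:
    return edits_per_second * 60.0


# Compare the rates mentioned in the discussion for a 10,000-edit queue.
for rate in (6, 12, 60, 80):
    print(f"{rate:>3}/min: one edit every {seconds_between_edits(rate):.1f} s, "
          f"10,000 edits in {hours_to_finish(10_000, rate):.1f} h")
```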

I raised the request but removed it again after it was explained that Faebot was not throttled; the rate was something I could change in Pywikibot if I wanted to edit at 1 edit per second. I had assumed the behaviour I saw was related to account throttling, but it's down to the Pywikibot defaults, though my confusion came from the -putthrottle:nn parameter not being sufficient. It's probably a topic that the guidance in the MediaWiki manual should spell out more clearly. -- (talk) 09:02, 5 February 2016 (UTC)
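Because the limit turned out to be on the client side, here is a sketch of what an override in Pywikibot's user-config.py could look like. It assumes the `put_throttle` setting (minimum seconds between saves) and `maxlag`. Check the configuration documentation of the installed Pywikibot version before relying on either value; these are illustrative, not recommendations.

```python
# user-config.py (fragment): a sketch, not a recommended value.
# Assumption: put_throttle is the minimum number of seconds Pywikibot waits
# between two saves, so 1 aims at roughly 60 edits/minute.
put_throttle = 1

# maxlag is sent with API requests; the servers refuse them while database
# replication lag exceeds this many seconds, and the client should wait and retry.
maxlag = 5
```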

The other thing to consider, in addition to server usage, is the effect that the flooding has on IRC for CVN channels. I ran my bot at full speed, unthrottled, and it delayed my channel by at least two hours; I'm not sure how much #cvn-commons was delayed. Riley Huntley (talk) 03:04, 8 February 2016 (UTC)
Sorry I do not use IRC and never heard CVN mentioned before. Why would it be delayed? --Jarekt (talk) 03:20, 8 February 2016 (UTC)
irc.wikimedia.org (read only) is basically a recent changes feed, just on IRC. IRC bots that track recent changes can "snatch" from there and "snitch" into channels on freenode that CVN (Counter-Vandalism Network) users review. The bots obviously only snitch when needed, like an abuse filter, but the process can slow down severely if there is a bot editing at high speed (I'd say at least 80+ edits per minute). That is the best explanation I can give. :) Riley Huntley (talk) 05:16, 8 February 2016 (UTC)
@Riley Huntley: I think the solution for that is to snatch edits from this recent changes view without bot edits, so they won't have to snitch edits from bots. I think IRC bots are configured to snatch all changes (including bots'), so by snatching from the modified recent changes above, the IRC channels won't be delayed. If I'm mistaken, tell me. I am not good at programming though.... Poké95 02:53, 13 February 2016 (UTC)
I don't believe that'd work. The IRC bots are only snatching what matches their interests, I believe the delay is just from changes pouring in to irc.wikimedia.org. Riley Huntley (talk) 03:00, 13 February 2016 (UTC)
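A sketch of Poké95's idea at the API level: the MediaWiki recent changes API can exclude bot-flagged edits with `rcshow=!bot`. This only shows the filter. As Riley Huntley points out, it would not reduce the volume arriving on irc.wikimedia.org, which is where the delay comes from. The helper name is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def recent_changes_url(limit: int = 50, hide_bots: bool = True) -> str:
    """Build a list=recentchanges request; rcshow=!bot drops bot-flagged edits."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp|flags",
        "rclimit": str(limit),
        "format": "json",
    }
    if hide_bots:
        params["rcshow"] = "!bot"
    return API + "?" + urlencode(params)
```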

Touching

At Commons:Administrators' noticeboard/Archive 56#Category:Non-empty disambiguation categories I mentioned (caching?) problems of disambig categories. As I am not familiar with pywikibot, could someone please perform a null-edit touch run on (now 639) category pages listed here: quarry:query/7250, thank you. --Achim (talk) 15:38, 12 February 2016 (UTC)
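A sketch of such a touch run with Pywikibot, where `Page.touch()` performs the null edit. It assumes the Quarry result was exported as a plain text file with one `page_title` per line. MediaWiki stores these without the namespace prefix and with underscores. The file name and helper names are hypothetical. The null-edit call is passed in as a function so the batching logic can be exercised without a wiki connection.

```python
from typing import Callable, Iterable, List


def category_titles(rows: Iterable[str]) -> List[str]:
    """Turn raw page_title values (no namespace, underscores) into full titles."""
    return ["Category:" + row.strip().replace("_", " ")
            for row in rows if row.strip()]


def touch_all(titles: Iterable[str], touch: Callable[[str], object]) -> List[str]:
    """Null-edit each title; collect failures instead of aborting the run."""
    failed = []
    for title in titles:
        try:
            touch(title)
        except Exception:  # report at the end rather than stopping the batch
            failed.append(title)
    return failed


def run_on_commons(path: str) -> List[str]:
    """Real run: needs an installed, logged-in Pywikibot."""
    import pywikibot  # imported here so the helpers above work without it

    site = pywikibot.Site("commons", "commons")
    with open(path, encoding="utf-8") as fh:
        titles = category_titles(fh)
    return touch_all(titles, lambda t: pywikibot.Page(site, t).touch())
```

Calling `run_on_commons("quarry-7250.txt")` (a hypothetical export file) would return the titles that could not be touched, so they can be retried or checked by hand.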