Commons:Bots/Requests/YiFeiBot (19)

YiFeiBot (talk · contribs) (19)

Operator: Zhuyifei1999 (talk · contributions · Number of edits · recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Remove space(s) before file extensions (affecting 48970 files as of today). Requested at Commons:Bots/Work_requests#Spaces_before_file_extensions

Automatic or manually assisted: Automatic unsupervised

Edit type (e.g. Continuous, daily, one time run): One-time run (I hope the rest will be fixed manually; User:Steinsplitter made a list at https://tools.wmflabs.org/steinsplitter/leerz.php)

Maximum edit rate (e.g. edits per minute): 3 moves per minute (originally 6)

Bot flag requested: (Y/N): N

Programming language(s): python: pywikipedia

Zhuyifei1999 (talk) 10:48, 25 April 2014 (UTC)

Discussion

  • Test run ✓ Done at [1], with 2 files failing to move. The File:File: issue was fixed just after the test run. (Do I need another one?) --Zhuyifei1999 (talk) 10:59, 25 April 2014 (UTC)
    Looks OK to me. How about fixing double dots before the extension too? Maybe the same for quotes?
    Will the bot fix file usage? Maybe redirects should not be left for unused files?
    EugeneZelenko (talk) 14:22, 26 April 2014 (UTC)
    1. Double dots will be a different run (I'm not sure whether I can handle both tasks in a single script).
    2. File usage: the bot will dump delinker commands to User:YiFeiBot/sandbox/1 every 1000 attempts (~3 hours, to avoid the page getting too large), as in the test run. Admin assistance is required to check and move the commands to User:CommonsDelinker/commands.
    3. Redirect: deleting/suppressing redirects is impossible without admin assistance. ;) And it may break things if delinker doesn't process the images before the deletion of redirects. --Zhuyifei1999 (talk) 14:34, 26 April 2014 (UTC)
    4. I suggest not deleting redirects. --Steinsplitter (talk) 14:37, 26 April 2014 (UTC)
  • On hold per IRC discussion: 1. delinker can do max 500 images at a time. 2. A WMF storage issue may cause image loss. --Zhuyifei1999 (talk) 09:46, 27 April 2014 (UTC)
    Maybe you should set a bigger delay between renames? --EugeneZelenko (talk) 14:30, 28 April 2014 (UTC)
    That can fix #1 (maybe 1 rename per minute? ~= 1 month). As for #2, files going missing after a move are still commonly seen. --Zhuyifei1999 (talk) 10:22, 29 April 2014 (UTC)
    Yes, one per minute is perfect per delinker. --Steinsplitter (talk) 10:38, 29 April 2014 (UTC)
    @Steinsplitter: dump global replace commands? --Zhuyifei1999 (talk) 12:31, 29 April 2014 (UTC)
    Or maybe Eugene can flag your bot (+sysop) for some days to directly edit the delinker page? --Steinsplitter (talk) 12:33, 29 April 2014 (UTC)
    Would the commands need checking? (What if the space is necessary?) Anyway, IMO an admin bot is evil. --Zhuyifei1999 (talk) 13:00, 29 April 2014 (UTC)
    I think User:CommonsDelinkerHelper could be used, and User:Zhuyifei1999 could be added to the account owners, or could at least find a way of communicating with the bot. --EugeneZelenko (talk) 14:06, 29 April 2014 (UTC)
    This is simply not possible. The bot should be flagged as +sysop for one day (during the file moves) to edit the delinker page. --Steinsplitter (talk) 08:30, 30 April 2014 (UTC)
    We have a dedicated bot for this purpose. I suggested sharing ownership with Zhuyifei1999 or finding a way for the two bots to communicate. I don't think that either method is impossible to implement. --EugeneZelenko (talk) 14:16, 2 May 2014 (UTC)
    No, there is no possibility at the moment. I don't grant other bots access to delinker on Tool Labs, for various reasons. --Steinsplitter (talk) 14:27, 2 May 2014 (UTC)
    ┌─────────────────────────────┘
    Or an alternative way: admins check everything and the bot runs slowly (1 move per minute ~= 1 month to run), so there won't be a huge backlog. --Zhuyifei1999 (talk) 14:34, 2 May 2014 (UTC)
    Operations has told me on IRC that 2 moves per minute are okay, and two requests to delinker per minute are fine too. But I am not sure whether moving 48970+ tables around breaks the server backend. --Steinsplitter (talk) 14:37, 2 May 2014 (UTC)
    Was there a mass-rename bot previously (> 10000 moves)? --Zhuyifei1999 (talk) 08:27, 30 April 2014 (UTC)
    No. At the moment we are talking about 48970. --Steinsplitter (talk) 14:30, 2 May 2014 (UTC)
  • Just another idea for cleanup: leading dot(s) and other punctuation marks (comma, semicolon, colon, etc.) at the beginning of file names. --EugeneZelenko (talk) 14:31, 29 April 2014 (UTC)
    Double dots before the extension affect 75186 files; a space followed by a dot affects 47412 files; a dot at the beginning: 934; comma: 50; semicolon: 5; colon: 0 (impossible in wiki syntax). So I will do double dots before the extension in a separate script. The others can be done manually or with JavaScript. --Zhuyifei1999 (talk) 08:27, 30 April 2014 (UTC)
    Oppose, such a mass move has never been done. We need to test first (with the " ") how it works. --Steinsplitter (talk) 08:31, 30 April 2014 (UTC)
    Per WMF Operations, three moves per minute are okay. I think this task can be approved now. The bot needs no +sysop; I will process the replacements with delinker. --Steinsplitter (talk) 11:23, 13 May 2014 (UTC)
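
The "~= 1 month" estimate quoted in this thread can be checked with a few lines of Python. This is only a sketch: it assumes the bot runs continuously with no pauses or failed moves, and the function name `run_days` is made up for illustration.

```python
FILES = 48970  # number of affected files stated in the request


def run_days(files: int, moves_per_minute: float) -> float:
    """Days of uninterrupted running needed to move `files` files."""
    minutes = files / moves_per_minute
    return minutes / (60 * 24)


# Rates mentioned in the discussion: 1, 2 and 3 moves per minute.
for rate in (1, 2, 3):
    print(f"{rate} move(s)/min: about {run_days(FILES, rate):.1f} days")
```

At 1 move per minute this comes to roughly 34 days, which matches the "about one month" figure; at 3 moves per minute it is a little over 11 days.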
Question: What happens if the target file name already exists? --McZusatz (talk) 17:12, 13 May 2014 (UTC)
That's a skip. (That was the case for one of the two files that failed to move. I forgot the other, however.) --Zhuyifei1999 (talk) 12:07, 14 May 2014 (UTC)
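
A minimal sketch of the rename rule and the "skip if the target exists" behaviour described above. It is not the bot's actual code: the function names are hypothetical, the set `existing` stands in for a real existence check against the wiki, and the pattern is adapted from the `_\.[a-zA-Z]+$` regexp quoted elsewhere on this page so that it accepts both spaces and underscores.

```python
import re
from typing import Optional, Set

# One or more spaces/underscores directly before the final ".ext".
_SPACE_BEFORE_EXT = re.compile(r"[ _]+(\.[A-Za-z]+)$")


def target_name(name: str) -> Optional[str]:
    """Return the cleaned file name, or None if nothing needs to change."""
    new_name = _SPACE_BEFORE_EXT.sub(r"\1", name)
    return new_name if new_name != name else None


def plan_move(name: str, existing: Set[str]) -> Optional[str]:
    """Return the move target, or None when the file should be skipped."""
    new_name = target_name(name)
    if new_name is None or new_name in existing:
        return None  # nothing to fix, or the target is already taken: skip
    return new_name
```

For example, `plan_move("Example .jpg", set())` yields `"Example.jpg"`, while the same call with `{"Example.jpg"}` as the set of existing names yields `None`, i.e. a skip.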
Question: As we had some files disappearing after moves for no apparent reason and were forced to delete them entirely, how does your bot ensure consistency? Is there any mechanism ensuring that the raw file is not a 404 and the description is not empty after the move? --McZusatz (talk) 17:12, 13 May 2014 (UTC)
There's no way to be sure that a 404 doesn't appear, but according to Steinsplitter's post above ^^, WMF Operations say three moves per min are okay (I'd rather do 1 per min, however)
  • Bureaucrat note: I'm inclined to approve this request now that we're sure there will not be any problems server-side, but let's give people at least 24 more hours to voice their concerns (if any). On that subject, I'm okay with how the task is planned. odder (talk) 12:34, 13 May 2014 (UTC)
    I still think it would be a good idea for the proposed bot to work with User:CommonsDelinkerHelper, either through the API or by providing a list of tasks via email, wiki page, etc. --EugeneZelenko (talk) 14:21, 14 May 2014 (UTC)
    @EugeneZelenko: Reading Steinsplitter's message above, it is my understanding that's exactly what's going to happen here. Perhaps @Steinsplitter and @Zhuyifei1999 might provide us with more detail about what the process is going to look like, so as to make sure we are all on the same page. odder (talk) 17:16, 14 May 2014 (UTC)
    @Steinsplitter, Odder, EugeneZelenko: My understanding of the above and the related IRC discussions:
    • Every 20 seconds: move a file (any error will prevent the move)
    • Every 100 moves, successful or failed (~30 mins): dump replace requests to User:CommonsDelinker/commands/filemovers (failed moves will not appear, as in the last test run)
    • Every ? minutes: Steinsplitter or some other admin process will process the requests
    • No checking of 404s as I don't want to flood the storage with double requests
    Am I right with this? --Zhuyifei1999 (talk) 10:23, 15 May 2014 (UTC)
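
The schedule in the list above can be sketched as a small loop. This is an illustration under stated assumptions, not the script that was actually run: `move`, `dump` and `sleep` are hypothetical callables supplied by the caller (no real wiki API is used here), and only the command text is taken from the template quoted on this page.

```python
import time
from typing import Callable, Iterable, List, Tuple

COMMAND = ("{{universal replace|%s|%s|reason=Robot: Removing space(s) "
           "before file extension}}")


def run(
    pairs: Iterable[Tuple[str, str]],
    move: Callable[[str, str], bool],      # hypothetical: returns True on success
    dump: Callable[[List[str]], None],     # hypothetical: writes commands to a page
    delay: float = 20.0,                   # one move every 20 seconds
    batch: int = 100,                      # dump every 100 attempts
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Move files one at a time; dump replace commands every `batch` attempts."""
    commands: List[str] = []
    attempts = 0
    for old, new in pairs:
        try:
            ok = move(old, new)
        except Exception:
            ok = False  # any error counts as a failed move
        if ok:
            commands.append(COMMAND % (old, new))  # failed moves are not dumped
        attempts += 1
        if attempts % batch == 0 and commands:
            dump(commands)
            commands = []
        sleep(delay)
    if commands:
        dump(commands)  # flush whatever is left at the end of the run
```

Passing `sleep` and the two callbacks in as parameters keeps the loop testable without touching the wiki; no 404 check is performed, in line with the last bullet.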
The problem is that you cannot ensure that someone will process the requests every 2.5 hours. Even if someone actually did, there is still a chance that more than 500 files will hit CommonsDelinker at once. (Other file movers will contribute as well.) Furthermore, I would feel more comfortable if CommonsDelinker did the replacements in real time/at constant speed. --McZusatz (talk) 11:38, 15 May 2014 (UTC)
I don't see a problem... (@MCZ: I don't understand the problem... Then we move 250 and 250. You only comment once a bureaucrat wants to approve the request... Why only now?) --Steinsplitter (talk) 11:50, 15 May 2014 (UTC)
If you are willing to keep track of it manually, I won't have a problem with it. --McZusatz (talk) 12:30, 15 May 2014 (UTC)
(@Steinsplitter: Since the bot's task was not really settled (at least for me), I thought a follow-up question was quite appropriate. All the more urgent if the bot is supposed to start soon. --McZusatz (talk) 19:54, 15 May 2014 (UTC))
Is the CommonsDelinker queue size accessible via the API or as a wiki page? If it is accessible, YiFeiBot should just wait if the queue is full. --EugeneZelenko (talk) 14:19, 15 May 2014 (UTC)
Do I get this right: If User:CommonsDelinker/commands/filemovers is too large (50000 bytes?), pause the bot? --Zhuyifei1999 (talk) 14:37, 15 May 2014 (UTC)
I think the number of entries on the page is a better criterion than the size in bytes. --EugeneZelenko (talk) 14:07, 16 May 2014 (UTC)
Eugene, it is easier for the bot operator to work with bytes. I don't see the need to count templates. --Steinsplitter (talk) 14:17, 16 May 2014 (UTC)
Sure, page size is easier to check. But the page size limit should be based on the minimal rename record size and have a margin. --EugeneZelenko (talk) 14:28, 17 May 2014 (UTC)

┌───────────────────────────────────────────┘
Let's make use of the average. According to the query select avg(length(img_name)), min(length(img_name)), max(length(img_name)) from image where img_name regexp "_\.[a-zA-Z]+$";, the average length of the filenames to be renamed by this bot is ~ 61.2997 (max 246, min 7). With the Python expression 61.2997 * 2 - 1 + len("{{universal replace|||reason=Robot: Removing space(s) before file extension}}"), the average command length is ~ 198.5994. With 400 requests, the page size is ~ 400 * 198.5994 = 79439.76 (without the header). --Zhuyifei1999 (talk) 15:10, 17 May 2014 (UTC)
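
The estimate can be reproduced as follows. This is a sketch only: the function names are made up, the `- 1` assumes exactly one character is removed per name, and line breaks between commands and the page header are ignored.

```python
TEMPLATE = ("{{universal replace|||reason=Robot: Removing space(s) "
            "before file extension}}")
AVG_NAME_LEN = 61.2997  # average from the SQL query quoted above


def avg_command_length(avg_name_len: float = AVG_NAME_LEN) -> float:
    """Old name + new name (one character shorter) + fixed template text."""
    return avg_name_len * 2 - 1 + len(TEMPLATE)


def page_size(requests: int, avg_name_len: float = AVG_NAME_LEN) -> float:
    """Expected size of a page holding `requests` average-length commands."""
    return requests * avg_command_length(avg_name_len)


print(len(TEMPLATE))                   # fixed part of each command
print(round(avg_command_length(), 4))  # average command length
print(round(page_size(400), 2))        # 400 average commands
```

The fixed template text is 77 characters long, so an average command is about 198.6 characters and 400 of them come to roughly 79,440 characters.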

Why not just count all "{{" on the raw page if size>10k? --McZusatz (talk) 11:38, 18 May 2014 (UTC)
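
One way this suggestion could look in code, as a hedged sketch: the function name, the 400-command limit and the idea of passing in the raw wikitext as a string are assumptions for illustration, and fetching the page is left out entirely. Any other template on the page (for example in a header) would also be counted.

```python
def should_pause(raw_text: str,
                 max_commands: int = 400,       # assumed limit, below delinker's 500
                 size_threshold: int = 10_000   # the "10k" from the suggestion
                 ) -> bool:
    """Return True when the command page looks too full to add more moves."""
    if len(raw_text.encode("utf-8")) <= size_threshold:
        return False  # small page: not worth counting templates
    # Each pending command starts with "{{", so count the openers.
    return raw_text.count("{{") >= max_commands
```

With the minimum name length of 7 quoted above, a single command is at least 7 * 2 - 1 + 77 = 90 characters, so a page under 10,000 bytes cannot hold anywhere near 400 commands and the size shortcut is safe.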
I'll do this on Friday --Zhuyifei1999 (talk) 08:29, 19 May 2014 (UTC)
No time, tomorrow. --Zhuyifei1999 (talk) 08:56, 24 May 2014 (UTC)
✓ Done Does User:YiFeiBot/~/pywikibot/com_end_space.py look okay? --Zhuyifei1999 (talk) 09:31, 25 May 2014 (UTC)
I don't understand py perfectly, but it seems you implemented everything that was discussed above. --McZusatz (talk) 14:05, 29 May 2014 (UTC)
Question: Is there anything else that needs doing (implementing) or discussing, or can we close this request as successful? odder (talk) 08:52, 30 May 2014 (UTC)
No, everything should be clear now and there was enough time to raise concerns. --McZusatz (talk) 09:17, 30 May 2014 (UTC)
+1 :) --Steinsplitter (talk) 09:19, 30 May 2014 (UTC)
Bureaucrat note: Closing as ✓ approved, then. odder (talk) 10:29, 31 May 2014 (UTC)