In case of Toolforge Grid failure
If the continuous fileprotectionsync process gets stuck within the Toolforge Grid (wikitech:Help:Toolforge/Grid), follow the instructions below to get things back on track.
First, log in to the server.
$ ssh tools-login.wmflabs.org
$ become krinklebot
Then, inspect the logs. Were they recently modified?
krinklebot$ ls -l fileprotectionsync.*
-rw-rw---- 1 tools.krinklebot tools.krinklebot 48719 Oct 14 22:52 fileprotectionsync.err
-rw-rw---- 1 tools.krinklebot tools.krinklebot 82040 Oct 14 22:52 fileprotectionsync.out
If so, view the tail of the log files.
krinklebot$ tail fileprotectionsync.*
==> fileprotectionsync.err <==
Updating page [[Commons:Auto-protected files/wikipedia/fr]] via API
[Tue Oct 14 22:58:50 2014] there is a job named 'fileprotectionsync' already active
[Tue Oct 14 22:58:52 2014] there is a job named 'fileprotectionsync' already active
==> fileprotectionsync.out <==
http://en.wikipedia.org/w/api.php?action=query&prop=images&titles=Main+Page&imlimit=500&redirects&format=json
http://en.wikipedia.org/w/api.php?action=query&prop=images&titles=Wikipedia%3AMain+Page%2FTomorrow&imlimit=500&redirects&format=json
If the recent log entries contain only "job named .. already active" messages, then this is a case of the grid getting stuck. We'll delete the continuous process from the grid and create a new one.
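The check described above can be sketched as a small shell helper. This is only an illustration, not part of the bot: the function name only_already_active is hypothetical, and it assumes that "recent" means the last few lines of the log (10 by default).

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed when the last N lines of a log file
# consist only of "already active" messages.
only_already_active() {
    # $1 = log file, $2 = number of trailing lines to inspect (default 10)
    log=$1
    n=${2:-10}
    # A missing or empty log tells us nothing, so treat it as "not stuck".
    [ -s "$log" ] || return 1
    # Count trailing lines that do NOT contain the message.
    other=$(tail -n "$n" "$log" | grep -vc 'already active')
    [ "$other" -eq 0 ]
}
```

For example, `only_already_active fileprotectionsync.err 2 && echo "looks stuck"` would print the message when the two most recent entries are both "already active" lines.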
$ qstat
------ 0.00000 fileprotec tools.krinkl ** 00/00/0000 00:00:00
------ 0.00000 fileprotec tools.krinkl ** 00/00/0000 00:00:00
Run "jstop fileprotectionsync" once for each entry listed. There should never be more than one entry, given that we use jsub -once, but in practice, whenever the job gets stuck, there also seem to be multiple instances (somehow).
$ jstop fileprotectionsync
$ jstop fileprotectionsync
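Counting the entries and repeating jstop by hand can also be scripted. The following is a minimal sketch, not a tested part of this procedure: the helper names count_jobs and stop_all are hypothetical, and it assumes that the job name is the third column of the qstat output and is truncated to "fileprotec", as in the listing above.

```shell
#!/bin/sh
# Sketch (hypothetical helpers) for stopping every stuck instance.

# Read qstat-style output on stdin and print how many rows have a job name
# (third column) that starts with the given prefix.
count_jobs() {
    # $1 = job name prefix as displayed by qstat, e.g. "fileprotec"
    awk -v p="$1" 'index($3, p) == 1 { c++ } END { print c + 0 }'
}

# Run jstop for the given job name the given number of times.
stop_all() {
    # $1 = full job name, $2 = number of entries to stop
    i=0
    while [ "$i" -lt "$2" ]; do
        jstop "$1"
        i=$((i + 1))
    done
}
```

On the login host this might be used as `stop_all fileprotectionsync "$(qstat | count_jobs fileprotec)"`. Running qstat again afterwards confirms whether any entries remain.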
The bot should now resume automatically within 15 minutes (per the crontab schedule).