Commons:Bots/Requests/OgreBot 2


OgreBot (talk · contribs)

Operator: Magog the Ogre (talk)

Bot's tasks for which permission is being sought: Formally requesting additional functionality not associated with original functionality. The bot will comb the list of all the uploads for a day, and then place them in a gallery based on categorization. Currently, there are requests for a gallery with every image in Category:Aviation and Category:Watercraft. There may be similar or related requests in the future (e.g., a gallery of uncategorized images).

I have already begun the project, but I've limited it to my userspace as a trial (User:OgreBot/Aviation, User:OgreBot/Watercraft). I'm requesting permission here so that I can do it outside my userspace, and add some additional functionality: namely a blacklist page to ferret out subcategories that don't apply (these are causing all sorts of weird images, like File:Palais de justice de Saint Calais.jpg, to appear in the watercraft gallery).
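Editorial illustration (not the bot's code): the request above describes walking a category tree, skipping blacklisted subcategories, and keeping the day's uploads that fall in the remaining categories. The sketch below shows that idea in Python rather than the bot's PHP/Peachy code. The category graph, the blacklist, the upload list, and the function names `collect_categories` and `build_gallery` are all hypothetical; a real bot would fetch this data from the wiki.

```python
from collections import deque


def collect_categories(root, subcats, blacklist):
    """Return root plus every subcategory reachable from it.

    Blacklisted categories are skipped, and so is everything that is
    only reachable through them.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcats.get(current, ()):
            if child in seen or child in blacklist:
                continue
            seen.add(child)
            queue.append(child)
    return seen


def build_gallery(uploads, wanted):
    """Keep the uploads that sit in at least one wanted category.

    uploads is a list of (filename, set_of_category_names) pairs.
    """
    return [name for name, cats in uploads if cats & wanted]


# Hypothetical data, for illustration only.
SUBCATS = {
    "Watercraft": ["Ships", "Ports"],
    "Ships": ["Sailing ships"],
    "Ports": ["Buildings in port cities"],
}
BLACKLIST = {"Ports"}
UPLOADS = [
    ("File:Schooner.jpg", {"Sailing ships"}),
    ("File:Courthouse.jpg", {"Buildings in port cities"}),
]

wanted = collect_categories("Watercraft", SUBCATS, BLACKLIST)
gallery = build_gallery(UPLOADS, wanted)
print(gallery)
```

Because the traversal never descends into a blacklisted category, a single blacklist entry prunes the whole branch beneath it, which is how one entry could keep unrelated images such as a courthouse photo out of a watercraft gallery.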

I would also like to reserve the right to put this under another bot name if desired at a future time.

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): daily

Maximum edit rate (eg edits per minute): Each page updated once per day (currently, that's a total of 2 per day)

Bot flag requested: (Y/N): N/A (already has flag)

Programming language(s): PHP, Peachy framework

Magog the Ogre (talk) 05:38, 14 December 2011 (UTC)

Discussion

Re Rillke 1: that should be fairly simple. Should we worry about vandalism, though? If someone replaced the page with a 2 MB list of categories, it would probably crash the bot.
Re Rillke 2: I hadn't realized that can even be done. But at this point, the expensive part (in terms of server time and queries) is gathering the list of categories, not downloading the list of new files. It takes 4-5 minutes of server time to download and process the categories each day; I am considering keeping a local cache, but then it wouldn't catch changes in categorization until the cache is updated. (As an aside, can I point out that PHP's hashing algorithm with associative arrays sucks; I kid you not, I had to write my own hash table to make the program viable.)
Re: EugeneZelenko: I don't quite understand your first question, but I'll try to interpret based on your second question. This bot fetches a list of all images in a category and its subcategories. The list of subcategories can be huge (thousands), so it's impractical to search through them for new files. Magog the Ogre (talk) 17:54, 14 December 2011 (UTC)
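Editorial illustration (not the bot's code): the vandalism concern raised in this reply, an on-wiki blacklist page being replaced with a huge list, could be handled by refusing input beyond fixed limits before parsing it. The sketch below is in Python, not the bot's PHP. The limits, the one-category-per-line page format, and the function name `parse_blacklist` are assumptions made for the example.

```python
MAX_BYTES = 64 * 1024   # assumed upper bound on the page size
MAX_ENTRIES = 500       # assumed upper bound on the number of categories


def parse_blacklist(page_text, max_bytes=MAX_BYTES, max_entries=MAX_ENTRIES):
    """Parse a blacklist page with one category per line.

    Raises ValueError when the page exceeds either limit, so the caller
    can fall back to the last known good list instead of processing it.
    """
    if len(page_text.encode("utf-8")) > max_bytes:
        raise ValueError("blacklist page is larger than the allowed size")

    entries = set()
    for line in page_text.splitlines():
        # Accept plain lines and "* item" list lines.
        name = line.strip().lstrip("*").strip()
        if not name:
            continue
        if name.startswith("Category:"):
            name = name[len("Category:"):]
        entries.add(name)
        if len(entries) > max_entries:
            raise ValueError("blacklist page has too many entries")
    return entries


print(parse_blacklist("* Category:Ports\n\n* Harbours"))
```

Checking the size before parsing keeps the cost of a vandalized page small and constant; whether the bot then skips the run or reuses the previous list is a separate policy choice.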
Thanks for your reply and sorry for the long delay (users tend to eat each other up instead of working). Can your bot have some limits? Anyway, I could create an abuse filter preventing users with few edits, or who are not autopatrolled or autoconfirmed, from adding such a template to a page. I hope we'll find the time to talk about the specific implementation so that it starts smoothly.
Was just a suggestion. Maybe toolserver has more powerful ways (like directly querying the category table: SELECT * FROM categorylinks WHERE cl_to = 'category_name' AND cl_timestamp BETWEEN startDate AND endDate ORDER BY cl_timestamp). But I am talking about something I never tried myself (don't have a tools-account). RE rillke questions? 13:56, 2 January 2012 (UTC)
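Editorial illustration: the query suggested in this comment can be tried against a small stand-in table. The sketch below uses Python's built-in `sqlite3` module with an in-memory table that mimics only three columns of `categorylinks` (`cl_from`, `cl_to`, `cl_timestamp`). The sample rows, the timestamp format, and the helper name `links_added_between` are assumptions; the real toolserver database and its schema details may differ. The `WHERE` clause comes before `ORDER BY`, as SQL requires, and the values are passed as parameters rather than pasted into the query string.

```python
import sqlite3

# In-memory stand-in for the categorylinks table (simplified, hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [
        (1, "Watercraft", "2011-12-13 23:50:00"),
        (2, "Watercraft", "2011-12-14 08:15:00"),
        (3, "Aviation", "2011-12-14 09:00:00"),
        (4, "Watercraft", "2011-12-14 06:30:00"),
    ],
)


def links_added_between(conn, category, start, end):
    """Return page ids added to `category` between start and end, oldest first."""
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks "
        "WHERE cl_to = ? AND cl_timestamp BETWEEN ? AND ? "
        "ORDER BY cl_timestamp",
        (category, start, end),
    )
    return [row[0] for row in rows]


recent = links_added_between(
    conn, "Watercraft", "2011-12-14 00:00:00", "2011-12-14 23:59:59"
)
print(recent)
```

A query like this finds pages added directly to one category within a date range; covering a whole tree would still need the list of subcategories, which is the expensive step described earlier in the thread.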
With CatScan2 you can do some selections yourself. -- Docu at 19:04, 4 January 2012 (UTC)
Personally I use User:OgreBot/Watercraft and find it useful. If the crawling takes only 4-5 minutes per day, the load should be acceptable.
BTW, as User:OgreBot/Watercraft has grown quite large by now, maybe the output should be archived after a week or so. -- Docu at 19:04, 4 January 2012 (UTC)

If there are no objections, I think the task should be approved. --EugeneZelenko (talk) 15:25, 10 January 2012 (UTC)