User:CommonsDelinker/Suggestions

From Wikimedia Commons, the free media repository

Bugs:

  • All threading is done without locking. This is very dangerous: threads can easily write to the same variable at the same time, which will almost certainly cause unexpected behaviour, such as spontaneous crashes or data corruption.
    I haven't seen any errors because of this, nor have any been reported.
  • Too many threads are spawned. The bot may currently spawn more than 100 threads in a matter of seconds. Eventually the kernel will refuse to spawn more threads, and an exception will be thrown.
    Yep. I've had "could not create thread" errors.
  • Connections to MySQL are created for each image, in a non-persistent manner. As a result, the connection limit will be reached very soon.
    Yep. That sucks.
  • The query API is queried in a totally wrong way. This will cause getlog() to return wrong results for images with &, < and > in their titles.
    If you say so. I don't really know.
  • Connections to Wikimedia are made in a non-persistent manner. This is very bad for performance and also for the servers.
    I thought as much. Can you improve it?
  • Checkusage is called in a very bad manner. See the above three items.
    As we discussed earlier, we should most probably find a way to start using your Python version of CheckUsage, with one persistent connection to host sql and one to en.wp.
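
The first three bugs (no locking, unbounded thread creation, one MySQL connection per image) share one remedy: a fixed pool of workers fed from a queue. The following is only a minimal sketch of that idea, not the bot's actual code. The names `run_pool` and `NUM_WORKERS` are hypothetical, the pool size is an assumption, and `sqlite3` stands in for the MySQL driver so that the example runs without a server.

```python
import queue
import sqlite3
import threading

NUM_WORKERS = 4  # assumed pool size; the page does not specify one


def run_pool(image_names, num_workers=NUM_WORKERS):
    """Process image names with a fixed number of worker threads."""
    tasks = queue.Queue()          # queue.Queue is itself thread safe
    lock = threading.Lock()        # guards the shared state below
    results = []
    opened = [0]                   # number of database connections created

    def worker():
        # One connection per worker, reused for every image it handles.
        # sqlite3 is a stand-in for the real MySQL connection.
        conn = sqlite3.connect(":memory:")
        with lock:
            opened[0] += 1
        try:
            while True:
                name = tasks.get()
                if name is None:   # sentinel: no more work
                    break
                row = conn.execute("SELECT upper(?)", (name,)).fetchone()
                with lock:         # never touch shared data without the lock
                    results.append(row[0])
        finally:
            conn.close()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for name in image_names:
        tasks.put(name)
    for _ in threads:              # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(results), opened[0]
```

The number of threads and connections stays at `num_workers` however many images are queued, and every write to shared data happens while holding the lock.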

Fixes:

  • Implement a thread-safe thread pool that allows one MySQL connection per thread. This is less expensive for the server and better for performance.
  • The query API should be read with simplejson, which will guarantee correct handling of special characters.
  • Checkusage can be run directly against the toolserver database instead of relying on Duesentrieb's. This is better for performance and more reliable. <http://tools.wikimedia.de/~bryan/checkusage.py>
  • Connections to Wikimedia should be persistent. Building a connection is a lot of overhead. At least the query API should use persistent HTTP connections. Unfortunately, pywikipedia did not support this the last time I checked. (Side note: This is the reason why I created my own bot framework. It works for editing and basic image functions, but unfortunately not for other purposes.)
    I could not agree more.... Siebrand 00:09, 26 May 2007 (UTC)
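
As an illustration of the query API fix, here is a rough sketch that URL-encodes the title and reads the reply as JSON. It uses the standard library `json` module (the descendant of simplejson) and assumes the `list=logevents` query with a `letitle` parameter. The helper names `build_log_query` and `parse_log_events` are made up for this example and are not part of the bot, and the sample reply in the usage note is invented.

```python
import json
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_log_query(title):
    """Return a query URL in which the title is safely percent-encoded."""
    params = {
        "action": "query",
        "list": "logevents",
        "letitle": title,      # may contain &, < and >
        "format": "json",      # ask for JSON instead of scraping XML
    }
    return API_URL + "?" + urlencode(params)


def parse_log_events(body):
    """Extract the list of log events from a JSON reply (empty if absent)."""
    data = json.loads(body)
    return data.get("query", {}).get("logevents", [])
```

For a title such as `File:Tom & Jerry <1>.jpg`, `urlencode` turns the `&` into `%26`, so it can no longer be mistaken for a parameter separator, and the JSON parser returns the title unchanged with no entity decoding needed.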

Log

  • I did UPDATE u_orgullo_logs SET newimg = NULL WHERE newimg = "NULL"; on the database. Due to some broken code, the bot inserts the string "NULL" into the database instead of the NULL value. -- Bryan (talk to me) 11:34, 26 May 2007 (UTC)
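
One likely way to avoid the "NULL" string in future is to pass values as query parameters, so that Python's `None` is stored as a real SQL NULL. The sketch below is an assumption about how the insert could look, not the bot's code. It uses `sqlite3` as a stand-in for MySQL (MySQLdb uses `%s` placeholders instead of `?`), and the `img` column and the `log_replacement` helper are hypothetical. Only the table name and `newimg` come from the log entry above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and newimg column as in the log entry; the img column is hypothetical.
conn.execute("CREATE TABLE u_orgullo_logs (img TEXT, newimg TEXT)")


def log_replacement(conn, img, newimg=None):
    """Insert a log row; None is bound as SQL NULL, never as the text 'NULL'."""
    conn.execute(
        "INSERT INTO u_orgullo_logs (img, newimg) VALUES (?, ?)",
        (img, newimg),
    )


log_replacement(conn, "Old.jpg")                # no replacement image
log_replacement(conn, "Old2.jpg", "New2.jpg")   # with a replacement image

null_rows = conn.execute(
    "SELECT COUNT(*) FROM u_orgullo_logs WHERE newimg IS NULL"
).fetchone()[0]
string_rows = conn.execute(
    "SELECT COUNT(*) FROM u_orgullo_logs WHERE newimg = 'NULL'"
).fetchone()[0]
```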