Commons:Bots/Requests/DrTrigonBot

From Wikimedia Commons, the free media repository

DrTrigonBot (talk · contribs)

Operator: DrTrigon (talk)

Bot's tasks for which permission is being sought:

  1. Categorization (by Computer Vision) according to User:Multichill/Using OpenCV to categorize files
  2. "SubsterBot" according to w:en:Wikipedia:Bots/Requests for approval/DrTrigonBot

(part 1 is new and part 2 runs already on a few other wikis, please see SUL)

Automatic or manually assisted: automatic (categorization is still in development and some libraries are missing on the toolserver, so part 1 is "manually assisted" at the moment, but this will change eventually)

Edit type (e.g. Continuous, daily, one time run): daily (on toolserver)

Maximum edit rate (eg edits per minute): pywikipediabot default

Bot flag requested: (Y/N): Y

Programming language(s): python (pywikipediabot framework), C++

Source code available: https://fisheye.toolserver.org/changelog/drtrigon

DrTrigon (talk) 20:43, 20 April 2012 (UTC)

Discussion

Some more description for the 2 different bot parts:

  1. Is in development and the main reason for this request; I would like to start (slowly! a few pages, e.g. 5-20) editing pages in order to test the bot further. Some preliminary results can be seen at User:DrTrigon/Category:Unidentified people (bot tagged). At the moment just "people" are categorized, by face and eye recognition. The "reliable" mode would categorize only the images marked green there; the "hint" mode would categorize all of them ("is may be a person"). I would like to first remove Template:Uncategorized and then add Template:Check categories as well as the two categories Category:Unidentified people and Category:Categorized by bot. Furthermore I would add the detection results, e.g. "face at position ..." and "eye at ...", to Template:Information by using Template:Information field. Does this make sense / is it ok?!?
  2. Runs already on some other wikis and could be useful here once, e.g. if there is a request by someone... (is not important for me at the moment - but should cause no harm too ;)

Thanks a lot for all feedback and greetings --DrTrigon (talk) 20:43, 20 April 2012 (UTC)

Please make a test run with adding actual categories to images. --EugeneZelenko (talk) 14:42, 21 April 2012 (UTC)
I thought Multichill was working on this. Unfortunately we have to use categories like Category:Unidentified people (bot tagged) because our search is not capable of recognizing the image's contents.
I wonder whether it would also be useful to add color-categories; once the file is downloaded, we should attempt to extract as much information as possible.
I have to admit that OpenCV sounds promising, but I haven't read enough about it yet. It would also be good if it could distinguish photos from diagrams, ....
On the result-page it would be handy if the faces were framed somehow, e.g. with ImageAnnotator (but this is not really important for categorizing) or by putting a <div style="position:relative;">[[File:...]]<div class="face-marker" style="position:absolute; left:$1*ratio; top:$2*ratio; width: ...; height:; border:2px solid yellow;"></div></div> container around the image :-) -- RE rillke questions? 15:03, 21 April 2012 (UTC)
@EugeneZelenko: I am doing the first test runs that actually change image pages and categorize them - please stay tuned... ;)
@Rillke: Yes, in fact Multichill suggested this on the mailing list. Since I had been working with OpenCV a little bit too, I found it interesting, and I think Python is a very good choice since it makes it easy to embed the pure C/C++ examples and code as well, so I started to work on this a little bit. More categorization based on the BoW (Bag of Words) algorithm is planned for the future, but at the moment this is not stable enough (I can do a test run if you like, or you can check out some results from earlier runs).
Shall I use the category Category:Unidentified people (bot tagged) instead of Category:Unidentified people?
I agree completely with you; the bot should gather as much information as possible. So what do you exactly think of by mentioning "color-categories"? The idea with distinguishing photos from diagrams will be covered with BoW once this works (but needs training with a good dataset).
The detected faces and eyes are now framed on the result-page. What would be the best and easiest way to do this on the file description page as well?
Thanks so far and greetings!! --DrTrigon (talk) 20:36, 21 April 2012 (UTC)
  • I think it would be a good idea to use regular wiki-markup for tables instead of a template.
  • Text on a dark green background is definitely not good for reading. Maybe a small icon could be used instead of changing the background of the entire cell?
  • Please use work category instead of cat.
  • Please list added categories in edit summary.
I think Category:Unidentified people (bot tagged) should be used to distinguish human guesses from bot ones :-)
EugeneZelenko (talk) 14:51, 22 April 2012 (UTC)
  • As far as I can see "regular wiki-markup" won't work because of the Information template needed. "HTML syntax" should work too, but is somehow ugly... ;) What do you think?
  • The other 3 points are implemented now, thanks for your hints!
  • Regarding the category to use, my intention was to use 2 categories: Category:Unidentified people and Category:Categorized by bot instead of Category:Unidentified people (bot tagged). In future there may be more categories for the bot to work with - then we would have to create an additional "(bot tagged)" category for each of them. By using the additional category "Categorized by bot" we would only have to create one more category (for all bot categorizations) instead of doubling the number of categories, and could thus handle all combinations that might appear in future more flexibly. Furthermore the "Unidentified people" are not split up into several categories but kept together... What do you think? Greetings --DrTrigon (talk) 18:42, 23 April 2012 (UTC)

Hello DrTrigon, thanks for marking the faces and eyes; it's now easily visible where the bot fails and where it detects faces properly.

  • If the face takes a big portion of the image, it's perhaps not wrong to add Category:Portraits and if the face fills the image, Category:Faces.
  • I know that our color-categories are everything but bot-friendly, e.g. Category:Black, red, yellow; Root seems to be on Category:Colors by name. Perhaps you could find a working system... Google also offers some colors in their image-search engine.
  • I think the information about face-positions on file-description pages should be added using a dedicated template (or templates) to allow easy reading of the data and custom markup. Perhaps we want to hide them later, or a JavaScript will hide them and transform them into image-notes.
  • {{check categories}} should either contain the date or look like {{Check categories|year=<year>|month=<month>|day=<day>|category=[[Category:Categorized by OpenCV-bot]]}}. This way, if the user clicks on check categories now, either in HotCat or in the template, the bot-category (which isn't important in this case anymore) is automatically removed. In my view it's preferable to use the specific bot-technology instead of the generic [[Category:Categorized by bot]].
  • Suggestion for the edit-summary on file-description pages: Bot: +[[Category:]] detected using OpenCV
  • All above text is only a suggestion. -- RE rillke questions? 18:18, 26 April 2012 (UTC)
Hello Rillke!
You don't have to thank... especially for such good suggestions from your side! In fact the frames to mark help a lot with debugging the code and was something I was considering already. ;))
  • I implemented the usage of Category:Portraits, Category:Faces now - in fact the current algorithm does face detection thus Category:Faces should be the first category to add. Category:Portraits is added if the face covers more than 25% of the image area.
  • The color categories found at Category:Colors by name are assigned to RGB colors. Then, using w:en:Color difference#CMC l:c (1984) in 1:1 (typically used to model human perception more closely), the color closest to the average image color (given by the histogram) is chosen. Hope this is ok?
  • The idea to use an extra template to add such information is new to me. Could someone more familiar with Commons (and with what would be a good design considering future usage) initiate something here?!
  • The one about {{check categories}} is something I was considering already, and it is done now.
  • The edit summary is again something I was considering already ;)) but I try to avoid the term 'openCV' since then I am somehow forbidden to use other methods. OpenCV definitely is a very important part, but might not be the only one in future... (and in fact color categorization is now done by plain PIL routines and another module called 'colormath', which I should also give credit to here!!)
A few new bot runs were done in order to show the changes and the actual format.
Thanks A LOT for all your suggestions!!! Thanks again and Greetings --DrTrigon (talk) 21:52, 27 April 2012 (UTC)
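The category rule described in this comment (Category:Faces whenever a face is detected, Category:Portraits in addition when a face covers more than 25% of the image area) could be sketched roughly as follows. This is only an illustration and not the bot's actual code: the function name `face_categories`, the `(x, y, w, h)` rectangle format and the `portrait_ratio` parameter are assumptions made for the example.

```python
def face_categories(image_size, faces, portrait_ratio=0.25):
    """Return the categories suggested by a list of detected faces.

    image_size -- (width, height) of the image in pixels
    faces      -- list of (x, y, w, h) rectangles (assumed format)
    """
    img_w, img_h = image_size
    image_area = img_w * img_h
    categories = []
    if not faces:
        return categories  # nothing detected, nothing to suggest

    # Face detection succeeded, so Category:Faces is added first.
    categories.append("Category:Faces")

    # Category:Portraits only if some face covers more than 25% of the image.
    if any(w * h > portrait_ratio * image_area for (_x, _y, w, h) in faces):
        categories.append("Category:Portraits")
    return categories
```

For example, a 711x711 face in a 2000x1500 image covers about 17% of the area, so only Category:Faces would be suggested.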
To be specific instead of the generic what about [[Category:Categorized by DrTrigonBot]]? Greetings --DrTrigon (talk) 16:19, 28 April 2012 (UTC)
ImageAnnotator support added now. The detected features like face and eye(s) are annotated by the bot. Is there a way to use different color for different annotations? Or just one color per page? Greetings --DrTrigon (talk) 10:03, 1 May 2012 (UTC)
Hello DrTrigon,
Nice to see this implemented so fast. A few remarks:
{{facedetection
| face1 = {{face
|  left   =
|  top    =
|  height =
|  width  =
|  eyes   ={{eye|left=|top=|height=|width=}}
}}<!-- End face1 -->
| face2 = {{face
|  left   =
|  top    =
|  height =
|  width  =
|  eyes   ={{eye|left=|top=|height=|width=}}
}}<!-- End face2 -->
}}
Again, thanks for working on this important topic.
-- RE rillke questions? 00:08, 3 May 2012 (UTC)
Hello Rillke!
  • Yes, you mentioned above that Category:Faces contains even more face than portraits, which is very non-intuitive for me ;) but anyway that is something that can be changed easily.
  • Now the thing with the color categorization as you mentioned will become a lot more complex; my intention was to get the images average color for use e.g. in a mosaic (see example). If you like to get the major patches (as recognised by human eye!!) and multiple of them, this will get a lot more complicated. If you insist on having this function I have to switch this categorisation mode off instead (until I found a solution, which can take an arbitrarily long time... ;)... sorry for that.
  • Concerning the template: the use of Image-annotations does solve this problem too and is crucial to me. Now if the bot is not allowed to add such annotations that would be a major drawback and a big mess!!! How should the bot be able to give more information than "face" or "eye" which is in fact not trivial to get (for human eye yes, but even then you would have to specify the borders...) so this is a big problem for me (or even more than just one)! I will have to talk to User:Lupo...
  • Concerning the Category:Categorized by DrTrigonBot I would prefer to specify and explain in the documentation what techniques were used in order to do the task. May be add another Category:OpenCV or something like this to specify, but the bot should be free to use whatever it needs.
What about the bot flag? Too early to ask at this point? ;)
Greetings and thanks for the reply --DrTrigon (talk) 11:44, 3 May 2012 (UTC)

┌─────────────────────────────────┘
From my experience the bureaucrat (there is only one dealing with bots currently (EugeneZelenko)) will first close this discussion, wait a few days and then assign the bot-flag. This, of course is often less convenient for the bot-operator but better than if nothing would happen ;-) -- RE rillke questions? 17:52, 3 May 2012 (UTC)

I'd like to see template usage for bot information in the test run. Color categorization looks useless to me. --EugeneZelenko (talk) 14:49, 4 May 2012 (UTC)
  • Regarding color categorization, I was lucky and found Automatic Categorization of Image Regions using Dominant Color based Vector Quantization, which mentions JSEG and GLA. Then I was lucky again to find the JSEG project (which I hopefully get permission to use here... *fingers-crossed* ;) and wrapped this C code into Python. In short: the bot is now able to do image segmentation and then assign colors to each segment instead of to the whole image. Second, the colors detected (in the segments) are accepted only if they fulfill some criteria: they have to be close enough to the color, and the segment has to cover at least a given percentage of the whole image. This should come very close to what you suggested - let's see how it works... else I will shut it off.
  • Then the template; I contacted Lupo to work this out. I think it would be elegant to have 1 template that does annotations and gives a structure that can be parsed. Concerning "see template usage for bot information in test run" - despite some increased complexity because of the structure - it is the same as the annotation templates already used e.g. in this (at Line 37), or am I missing the point?
    The Information field value parameter could be a template itself. Right now it doesn't even use wiki-table mark-up. --EugeneZelenko (talk) 15:01, 5 May 2012 (UTC)
  • Meanwhile I will run some tests with reports to the result-page only (once we have agreed on the format on file description pages, I will go over those the bot edited already and update them to look all the same - by hand). I would be very happy if we can find the final format quite soon... ;))
Greetings --DrTrigon (talk) 10:57, 5 May 2012 (UTC)
Addendum: Lupo mentioned something interesting; what about adding the image annotations to the file's talk page in a thumb? It is a clever idea in my opinion. ;) Greetings --DrTrigon (talk) 13:09, 5 May 2012 (UTC)
Thanks for improving the color categorization. Since MediaWiki search engine won't support this, I think it's useful, especially when we'll be allowed to search incategory (limit search result to a category). Google image search has this feature and I've used this multiple times. (I need a flower that is red.)
As for the image notes: Why not using one template that has the following features:
  1. Expose a similar HTML (hidden) like ImageAnnotator. An opt-in JavaScript could run before ImageAnnotator is loaded and transform it into what image-annotator will expect, thus image-annotator will add the notes. This way, you get the annotations on the image only on an optional basis.
  2. Expose machine-readable HTML for the faces/eyes and the color and all other stuff your bot finds. We are currently doing this with HTML nodes (div, span, p) having class or id attributes. An example is the license: Commons:Machine-readable data#License information
  3. Expose the visible interface in an information-field as requested/suggested by EugeneZelenko. Having special classes there allows users to style this field or turn it on/off.
  4. Scalable for future use: A special template for your bot / algorithm allows you later, if the software and possibilities change, to adjust the template.
I would be glad if you would allow me to help with implementing 1.
Kind regards -- RE rillke questions? 15:59, 5 May 2012 (UTC)
Thanks for working on this. I like the general idea.
Personally, I don't think it's a problem that the bot adds "face"/"eye" annotations to images. They seem more explicit than the mere template.
Btw, is it feasible to work through Category:Ships by name and sort images into Category:Red ships (bot categorized), Category:Green ships (bot categorized) etc? --  Docu  at 13:43, 5 May 2012 (UTC)
@Docu: Regarding the question about ship categorization - technically speaking "yes, this is possible, but needs some changes to the bot code" (no big deal though). Now the "but" part: I think this would involve a new bot request, or at least this one being finished... :))
@Rillke: It sounds like you sat down and thought quite a lot about this - thanks! To me your ideas sound good, but I think I do not understand them all and in full beauty - thus I will need your help with all 4 points, not just point 1 (sorry for that). I think it could be quite useful if we could create an example page showing how this should look - we have discussed a lot here, but we need an example everybody can comment on and finally agree with...
Thanks to everybody involved and greetings --DrTrigon (talk) 20:44, 5 May 2012 (UTC)
Please consider User:DrTrigon/Template:BotCatNote. Please modify it as you wish, to show me the format desired/needed. --DrTrigon (talk) 15:33, 7 May 2012 (UTC)
I will. Also started {{FileContentsByBot}} and will continue soon. Everything will be documented. -- RE rillke questions? 17:10, 7 May 2012 (UTC)
Now the functionality is fully implemented. I am going to document it. Example at File:2011 Helene Xuan Chaire Forum transitions demographiques economiques Fondation du risque.jpg, File:2011-2012 SUSU Officers Notice Board.jpg. Feel free to adjust it to your needs / tell me what I should change. We should definitely add i18n; this could be done with "tag"-templates (translation of one word served by one template that is using {{LangSwitch}}) but let's wait to see whether this is accepted. I also saw the new files on User:DrTrigon/Category:Unidentified people (bot tagged) and was wondering why there is no guess for blue or red, e.g. for File:205 pintura.jpg, brown was checked on RGB(206, 49, 44). And perhaps you manage to emphasize the colors in the center over those on the borders. The bot is right: On File:2067 副本.jpg there is more black than Turquoise/green but a human would do it differently. Perhaps a higher threshold for black would
But now a big thanks! The color-categorization has been improved a lot. -- RE rillke questions? 19:25, 8 May 2012 (UTC)
✓ Done. I hope I didn't miss anything. Please let me know if this is the case. And I understood now why the guess was brown (150,75,0) Δ126, not red (255,0,0) Δ142. But perhaps something to "improve" in future. -- RE rillke questions? 21:56, 10 May 2012 (UTC)
This looks great! Thanks a lot for your big effort!!! :) I had some time to understand the template (and to be honest I am not sure if I succeeded... ;) but I like it; there are no annotations (as wished) but they can be optionally switched on (which I like even more! :) - good work!
So I have to ask this question now; did you have a look at User:DrTrigon/Template:BotCatNote? From the bot's technical side, I would prefer a template syntax that is more generic like
{{FileContentsByBot|id=1|cat=Unidentified people|bot=DrTrigonBot}}

{{FileContentsByBotValue|name=Confidence|val=0.75}}
{{FileContentsByBotValue|name=Eyes|val=[(1047, 795, 188, 188)]}}
{{FileContentsByBotValue|name=Face|val=(877, 625, 711, 711)}}

{{FileContentsByBotEnd|id=1}}
or something similar (if possible?) - there should be one generic way to handle all possible kinds of entries. That way I would not need to add new templates every time I introduce a new categorization mode in the bot.
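To illustrate why such a generic syntax is convenient from the bot side, here is a minimal sketch of how a block like the one above could be produced from plain name/value pairs. It is not the bot's actual code; the function name `render_bot_block` and its arguments are hypothetical, and only the template names are taken from the example above.

```python
def render_bot_block(block_id, category, bot, values):
    """Return the wikitext of one generic block.

    values -- list of (name, value) pairs; each value is written via str().
    """
    lines = [
        "{{FileContentsByBot|id=%s|cat=%s|bot=%s}}" % (block_id, category, bot),
        "",
    ]
    # One generic value template per entry, independent of the entry's kind.
    for name, val in values:
        lines.append("{{FileContentsByBotValue|name=%s|val=%s}}" % (name, val))
    lines.append("")
    lines.append("{{FileContentsByBotEnd|id=%s}}" % block_id)
    return "\n".join(lines)


wikitext = render_bot_block(
    1, "Unidentified people", "DrTrigonBot",
    [("Confidence", 0.75),
     ("Eyes", [(1047, 795, 188, 188)]),
     ("Face", (877, 625, 711, 711))],
)
```

A new categorization mode would then only add new name/value pairs. One caveat of this simple approach: values containing "|" would need escaping, since that character separates template parameters.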
Regarding the color assignment issue (let's call it like this ;) We have a very restricted set of color categories here in commons. I took them and assigned "the most" simple color I could think of (which might be not optimal) according: Black: RGB( 0, 0, 0), Blue: RGB( 0, 0, 255), Brown: RGB(165, 42, 42), Green: RGB( 0, 255, 0), Orange: RGB(255, 165, 0), Pink: RGB(255, 192, 203), Purple: RGB(160, 32, 240), Red: RGB(255, 0, 0), Turquoise: RGB( 64, 224, 208), White: RGB(255, 255, 255), Yellow: RGB(255, 255, 0). Confer this table (the lowercased ones). So it could be worth tuning (re-defining) this color table and in a further step enhance the number of color categories (and thus the coverage of the color space) here in commons.
Greetings --DrTrigon (talk) 15:20, 11 May 2012 (UTC)
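The color assignment with the table above can be sketched as a nearest-neighbour lookup. Note the assumption: the bot is described as using the CMC l:c color difference (via the 'colormath' module), while this sketch swaps in a plain squared Euclidean distance in RGB space just to stay self-contained, so its results can differ from the bot's. The names `COLOR_TABLE` and `nearest_color` are made up for the example.

```python
# Reference colors as listed in the comment above.
COLOR_TABLE = {
    "Black": (0, 0, 0),
    "Blue": (0, 0, 255),
    "Brown": (165, 42, 42),
    "Green": (0, 255, 0),
    "Orange": (255, 165, 0),
    "Pink": (255, 192, 203),
    "Purple": (160, 32, 240),
    "Red": (255, 0, 0),
    "Turquoise": (64, 224, 208),
    "White": (255, 255, 255),
    "Yellow": (255, 255, 0),
}


def nearest_color(rgb):
    """Return the name of the table color closest to `rgb`.

    Uses squared Euclidean RGB distance as a simplified stand-in
    for the CMC l:c difference mentioned in the discussion.
    """
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, COLOR_TABLE[name]))

    return min(COLOR_TABLE, key=squared_distance)
```

Even with this simplified metric, RGB(206, 49, 44) ends up closer to the Brown entry than to pure Red, which shows how much the outcome depends on the chosen reference values.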

┌─────────────────────────────────┘
Hello DrTrigon, having generic templates is nice when having a lot of changes but is not nice to provide a stable interface (machine readability) or when attempting to change the markup. Then, it is likely to have situations like "I can't change this because this would break if generic parameter val is ...". If we would for example decide to pass dimX and dimY also to each face position, there could be more magic:

(example image, 500px wide, not preserved)

This just for example. Passing single parameters instead of whole strings enhances the possibilities. But we can use what we basically have and create more generic templates for this:

  • Areas: x,y,w,h(perhaps also passing full-size dimensions to have more possibilities)
  • Colors: R,G,B
  • Generic table rows for further development.

Do you think this would be ok? Just add if something is missing.

As for creating new color-categories, perhaps a note at Commons:Categories for discussion could be useful to get input of people who often deal with them.

And finally a question: If Multichill's bot categorized a file with data from common sense, will your bot run on this file? I think bots using different techniques to find categories should not exclude each other (that's just one reason why thinking of a category containing the used bot-technique). I know that, for example User:MGA73 also categorized some files by his bot using the file usage. -- RE rillke questions? 19:15, 11 May 2012 (UTC)

There are at least 2 problems I have with it:
  1. I'd like to have a flat hierarchy: an object for face, one for eyes, ... Making a tree-like hierarchy, e.g. eye is a child of face, is nice because of the context, but not for writing bot code, since it increases the complexity without gaining that much information
  2. It is not machine friendly at all, e.g. not for creating and writing the structure - it increases the complexity and thereby greatly increases the chance of bugs appearing
  3. Having a lot of templates to manage makes me feel I will write template docs most of the time instead of caring about a working bot algorithm
  4. I did not talk to User:Multichill yet, but sent a notice now: User talk:Multichill#OpenCV (and others) automatic bot categorization
To be honest I do not have that much time. So I am thinking about withdrawing this bot request. If you have such high quality requirements, I think I won't be able to satisfy your expectations. On other wikis I was used to starting to write and use a bot in a very careful, decent and modest way and then improving it step by step. This is possible with small efforts from time to time. But now I will spend hours and hours creating parser code, docs and templates... a big effort just for the output of some info that appears by the way (I mean it is interesting but not the main job of the bot at all) ... so at the moment I am baffled... Greetings --DrTrigon (talk) 09:21, 12 May 2012 (UTC)
Then, there must be a misunderstanding. My comments were suggestions, not requirements. Simply create the template(s) you need and launch it slowly. (Multichill also does not care about this bot-approval process at all.) But you will have to stop it if people come and complain or simply disagree. Documentation can follow later on but at least there should be a name/value-pair in a custom template that allows scripts to evaluate the contents. Doing it properly or even perfectly once can save time later and prevent trouble, right? In the end it's your decision whether you have the time or not. Thanks for your valuable efforts. -- RE rillke questions? 09:43, 12 May 2012 (UTC)
Indeed... The point is I'd like to fix all the stuff in order to apply one unique style to all pages (also the ones the bot already processed) - thus I would like to get a good and clean (but has not to be perfect and fully sophisticated yet) solution now in order to go on with testing, checking and improving.
  • Is there a chance to get the same template structure e.g. as annotator as block with {{FileContentsByBotEnd}} and {{FileContentsByBot}} templates? What are the drawbacks, reasons for not doing this?
One solution might be having a 'value=' in {{FileContentsByBot}} and the "generic" class mentioned below. Then multiple templates in the 'value=' entry and multiple {{FileContentsByBot}} on a page have to be allowed also. In this way the block structure would be possible without needing the {{FileContentsByBotEnd}}. I added an example in order to understand what I try to explain... ;)
  • Is there a chance to get rid of the hierarchy and have e.g. eyes and face on the same "level"? What are the drawbacks, reasons for not doing this?
  • Regarding the argument of "single parameters instead of whole strings" is there a chance to have a "generic" class that can be used with whole strings as I mentioned and then later (e.g. if the test show it to be useful) make a specialized template for this param, as you mentioned? E.g. the generic template could include a hint with link in order to instruct all users reading it to create the specialized template? ("this template does not exist yet. please help to improve commons...")
So it would be cool if we could separate between must have now (a minimal and extendable interface) and nice to have (all functions this interface should offer in the end). I would like to stick with the "must have now" part at the moment - once this is done and we can all enjoy it working (or complain about bugs ;), we can go a step further. Thanks again for all your help and greetings --DrTrigon (talk) 13:28, 12 May 2012 (UTC)
I am trying to get this working on User:DrTrigon/sandbox could you please help me there? Thanks and greetings --DrTrigon (talk) 08:49, 17 May 2012 (UTC)
I will be busy with POTY at least till tomorrow and then, let's see :-) -- RE rillke questions? 10:47, 17 May 2012 (UTC)
I desperately need your help... ;) I made some changes and added a few templates in order to approximate the concepts I have in mind, please have a look at File:2011 Helene Xuan Chaire Forum transitions demographiques economiques Fondation du risque.jpg and User:DrTrigon/sandbox. Now the structure and templates I have created and modified have to be cleaned up in order to work with javascript, annotation and machine readable code... I don't think I am able to do this by myself... Despite this I think it should work and it should be possible to adapt all your templates and concepts too! One problem I see is that we might have multiple annotations for the same face and eye - is this a problem to solve/handle? Greetings --DrTrigon (talk) 19:17, 19 May 2012 (UTC)
...the other important question (maybe addressed to EugeneZelenko) is whether this template (system) syntax fulfills the needs, and what else would be needed in order to get the flag? All other (cool!) hints will then be added one by one as future improvements... greetings --DrTrigon (talk) 19:15, 21 May 2012 (UTC)
I think it would be a good idea to run a test again to see how good the changes are. The last bot edits are from May 1. --EugeneZelenko (talk) 14:42, 22 May 2012 (UTC)
Thanks for that feedback! I will do more edits as soon as the template in the current state works as it should, there are 2 minor issues at the moment:
  1. the width of the 1st column is not always the same as in {{Information}} (15%) which is not that nice ;)
  2. the javascript and/or template should be fixed in order to work together again (which I am not able to do by myself...)
After those fixes I will enjoy running the bot again!
Additionally I created User:DrTrigonBot/ToDo as an attempt to keep track of the key points discussed here. From my point of view the important points are TD-001 and TD-002 in order to be ready for the bot flag. As soon as possible I will then do TD-003 and then I will be very happy setting up the bot to run from the toolserver and resolve all other TD-???s in order to converge towards a nice, cool and powerful bot-code (thanks to all your help here!) I think the best would be to focus here on TD-001 and TD-002 and discuss the other stuff on the ToDo-page. What do you think? Does that sound reasonable? Thanks and greetings --DrTrigon (talk) 13:20, 23 May 2012 (UTC)

Hopefully POTY can live without me now ;-) so I find some time this evening. Just want to let you know I found some positive feedback and an Idea for new feature requests: Commons:Help desk#just a compliment. -- RE rillke questions? 15:50, 28 May 2012 (UTC)

Cool, thanks!! ... (but) how is this related to the bot? Sorry, I just don't see the relation... :)
So did you manage to spend some time trying to solve my issues? Thanks for any feedback! Greetings --DrTrigon (talk) 18:03, 30 May 2012 (UTC)
Today I made quite a lot of changes to the bot code and the templates... finally converging towards the grouping given by Rillke and the syntax given by my bot's needs... ;) There is still a lot to do... I think in a few days I should be able to run the bot with a complete and fully working template. I was also able to get the javascript working again, but I think some changes are needed there too. We will also have to agree on all label names (create some consistent naming) and go over all templates to double check them... That's it for now! Greetings --DrTrigon (talk) 20:05, 4 June 2012 (UTC)
Now I did the last changes and started to update the docs in order to catch up with the developments. Now the bot is doing a new test run (finally! ;) in order to provide some results to check-out. Any comments are very welcome! Thanks a lot for your patience! Greetings --DrTrigon (talk) 16:00, 6 June 2012 (UTC)
I like that it found black and white at File:215988 122079207869325 100002017402194 152849 7637472 n.jpg.
Maybe for further development:
  1. Before running the color-detection, crop 25% of the image in order to give the bot a more human eye.
  2. I didn't look at how it computes the color, but at File:241297 912292247137 27609511 42477872 5166953 o.jpg, both the confidence and the coverage are low. I guess your bot is using a threshold for each single value; maybe adding a threshold for the product of both could also be helpful.
If you wish changes in the JavaScript, just tell me what to do. Thanks for your hard work. -- RE rillke questions? 16:53, 6 June 2012 (UTC)
In fact the 'crop' idea sounds very reasonable - this is why I put it on the todo list as TD-017 - thanks for this!
In short the 'Confidence' is calculated from 'Coverage' and 'Delta_E' as follows:
  • (Coverage >= 0.40) and (Delta_E <= 5.0): Confidence = 1.0
  • (Coverage >= 0.20) and (Delta_E <= 10.0): Confidence = 0.75
  • (Coverage >= 0.10) and (Delta_E <= 20.0): Confidence = 0.5
  • otherwise: Confidence = 0.1
then a threshold for 'Confidence' (of 0.75) is applied only. Any ideas here?
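The rule above translates almost directly into code. The following is a minimal sketch, not the bot's actual implementation: the names `color_confidence`, `accept_color` and `CONFIDENCE_THRESHOLD` are hypothetical, and treating the threshold as "greater than or equal to 0.75" is an assumption.

```python
CONFIDENCE_THRESHOLD = 0.75  # threshold mentioned above; ">=" is assumed


def color_confidence(coverage, delta_e):
    """Map segment coverage (0..1) and color difference Delta_E to a confidence."""
    if coverage >= 0.40 and delta_e <= 5.0:
        return 1.0
    if coverage >= 0.20 and delta_e <= 10.0:
        return 0.75
    if coverage >= 0.10 and delta_e <= 20.0:
        return 0.5
    return 0.1


def accept_color(coverage, delta_e):
    """Return True if the color would pass the confidence threshold."""
    return color_confidence(coverage, delta_e) >= CONFIDENCE_THRESHOLD
```

With these values, a segment covering 25% of the image at Delta_E 8 is accepted (confidence 0.75), while one covering 15% at Delta_E 15 is not (confidence 0.5).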
Regarding the JavaScript, we have to agree on all label names (create some consistent naming) before changing them, to avoid a big mess... ;) I think we might choose shorter ones, e.g. all the 'detected-by-bot' ones look unwieldy... Maybe we should create some tree with 'detected-by-bot' as the root, but avoid having it re-re-appearing all over... :)
What about the syntax, the look and feel, usage, ... any serious mistakes or other issues? Any ideas for improvement? Is it good enough to be used in a productive environment (despite improvements that will definitely take place)? Here I have to thank Rillke again - it took some time but I finally got your concept and you convinced me... ;) thanks for your work! Greetings --DrTrigon (talk) 20:21, 6 June 2012 (UTC)
{{FileContentsByBot/Properties}} just repeats EXIF/MediaWiki information. Aren't image format and dimensions supposed to be available in the MediaWiki database?
I still could not understand purpose of adding color category in cases like File:241297 912292247137 27609511 42477872 5166953 o.jpg, File:2392x3296 Malay Rice Dumpling.jpg, File:24 1926June large edited.jpg, File:21eCIE.jpg. I think leaving such files uncategorized is much more useful.
EugeneZelenko (talk) 14:48, 8 June 2012 (UTC)
It's available in the database, but not immediately for JavaScript and not in a machine-readable format (just change the language and you get totally different formatting in the UI). This is, by the way, the reason why ImageAnnotator also adds parts of this information. The dimensions are required, at least. -- RE rillke questions? 15:08, 8 June 2012 (UTC)
The idea mentioned by Rillke earlier (to give the bot a more human eye, by cropping 25%) immediately solved the first and last categorizations; then, by applying a slightly more restrictive value for coverage (0.25 instead of 0.20), the 2nd could also be solved. But images like File:24 1926June large edited.jpg will still remain a "problem"... (since white covers at least 95% of the image). What do you think? Is this a tradeoff you can accept? Greetings --DrTrigon (talk) 16:45, 8 June 2012 (UTC)
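As an illustration of the cropping idea, here is a minimal Python sketch that computes a central crop box. The thread does not say how the 25% is measured, so this assumes that 25% of each dimension is removed in total, split evenly between opposite edges; the function name `central_crop_box` is hypothetical and not taken from the bot.

```python
# Hypothetical sketch: central crop box, assuming 25% of each dimension is
# removed in total (half of it on each side). Not the bot's actual code.
def central_crop_box(width, height, fraction=0.25):
    """Return (left, top, right, bottom) of the central region in pixels."""
    dx = int(width * fraction / 2)   # pixels dropped on the left and on the right
    dy = int(height * fraction / 2)  # pixels dropped at the top and at the bottom
    return (dx, dy, width - dx, height - dy)
```

The resulting box could then be passed to whatever image library does the actual cropping, so that color coverage is measured on the central region only and borders or backgrounds near the edges weigh less.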
Sounds good. Do we have to be "better" than Google? -- RE rillke questions? 17:08, 8 June 2012 (UTC)
I also like that the bot adds the color mode (RGB). This information is AFAIK nowhere available and should be definitely kept. I think we are all aware of the CMYK-bug of ImageMagick and therefore of MediaWiki. The template could detect, if the bot sets CMYK and link to Help:JPEG#color mode, .... -- RE rillke questions? 20:15, 9 June 2012 (UTC)
'Mode=CMYK' is detected in the template now and linked... Thanks for the hint! ;) --DrTrigon (talk) 16:25, 12 June 2012 (UTC)
@Rillke: Thanks for pointing this out - good point...! ;)
@Eugene: Regarding color categorization: I did another run yesterday and I think you will be even more unhappy (look e.g. at the 'White' categorizations)... So my question is: what exactly is it that you dislike? Do you want to have categorized objects at the center of the images only? I do not get your wishes/ideas/concept so far... can you be more specific?
My personal view is that the bot just gives hints about which categories could apply to which images. This is the reason why a {{Check categories}} is also added. Therefore the bot tends to be more inclusive than exclusive (in fact it drops a huge number of color regions, there are more than 10 in most cases).
I also did some tests today that additionally include position info for the color regions, but I doubt the results will satisfy you. Thus I can switch this bot part off for the moment, if you like (until I am able to improve it to a point that passes your criteria :). As far as I understand, the face recognition part is ok?! What do you think? ;)
Greetings --DrTrigon (talk) 19:30, 9 June 2012 (UTC)
The face categorization looks fine to me, and I Support approval, at least for this task. I suggest we split off color categorization immediately to Commons:Bots/Requests/DrTrigonBot2 (for bureaucratic reasons :P ) -- RE rillke questions? 20:15, 9 June 2012 (UTC)
In my opinion color categorization is misleading and useless, at least for the images I looked at. I don't think it's a good idea to complicate the life of those who will review categories and will then need to delete the color ones. --EugeneZelenko (talk) 14:39, 10 June 2012 (UTC)
Yes, that is what I understood, but what would you like to have in color categorization? Just look at the very central region for an object of any color, since the background is not important? Or just have a color that closely matches the reference color? What exactly is it that you do not like? I will never be able to improve it if you cannot specify what you would like to get... In the meantime the categorization based on colors is switched off. Would you still like to get the information about which colors were found? Or shall I switch this off too? Thanks and Greetings --DrTrigon (talk) 15:31, 10 June 2012 (UTC)
(...and what about color categorizations "by request" as e.g. the ones for User:Docu in Category:Ships by name...? Would that be ok too?) --DrTrigon (talk) 23:11, 10 June 2012 (UTC)
It's hard to tell what color categories are good for, but I don't think that this bot (in its current state) could improve things there. A more exact definition of the problems (if the examples were not enough): the color chosen for the category was not even predominant in the image, or was that of the background instead of the object. Anyway, color is not the main property of the content. It may be nice to have color categories in some cases, but only after subject ones. --EugeneZelenko (talk) 13:56, 11 June 2012 (UTC)
(Most of) the colors were in fact predominant, but at the same time belonged to the background (that is correlated - one of the bad things there ;). So as I understand it, you are talking about color categories in general (not related to this bot only). As far as I can see, the existing categorization is heavily based on human perception; compare e.g. File:Leucopogon esquamatus 2.jpg, which has just a very small part containing white, but that part is valued as the ROI. Another one is File:2008 Taipei IT Month Day9 Besta CD-868 in white.jpg with a ROI containing a lot of black... So it is heavily based on the context the file is used in... maybe a task for User:CategorizationBot...?

arbitrary break

Concerning the label names, IDs must be unique, otherwise it's invalid HTML. Class names can reoccur but we should also avoid a clash with totally different classes, e.g. from MediaWiki. But shortening should be possible. And consistent and intuitive naming is, of course, important.

I am happy with the look of the templates and the use of them in Wikitext.

I will definitely spend some time to look into the open questions. Regards -- RE rillke questions? 10:50, 7 June 2012 (UTC)

Very, very, very cool!! Thank you! That is good news to me!! :)
Concerning the uniqueness of labels in ID vs. class - I was not aware of this - thanks for the hint!
Concerning the shorter, consistent and intuitive (good point!) label names; I would propose:
  • replace "detected-by-bot" by a simple "bot" and place it in front, e.g. change "color-region-detected-by-bot" to "bot-color-region" (avoid splitting up e.g. "color-region-name" or else)
  • in general use the same labels as the bot uses and as the templates are called (except the indexed ones), e.g. change "bot-color-region" to "bot-color-regions" (but keep "R", "G", "B" instead of 00-02)
  • "fileContentsByBotDimX" should maybe become "fileContents-ByBot-DimX" or, more consistently, "Bot-fileContents-DimX" if that does not conflict with MediaWiki labels or labeling concept... (maybe we should also "respect" those of ImageAnnotator ...?)
What about other comments? What else to consider? Eugene? ;)
Thanks again and greetings --DrTrigon (talk) 14:50, 7 June 2012 (UTC)
So the face-detection-based part is ready to use and to check e.g. 100 pages a day, every day. Once the admins on the TS have installed all libraries needed to run the bot, it will do so from there. Color detection is used to gather information only, not (!) for categorization.
Now we have to finish the minor improvements in the template, e.g. javascript could mark the color regions too... ;))
Greetings and have a nice day! --DrTrigon (talk) 16:25, 12 June 2012 (UTC)
Feel free to shorten the class names. I will adopt this in JavaScript. Color regions are now also marked, although a rectangle is not really a good choice. -- RE rillke questions? 20:33, 12 June 2012 (UTC)
Can be confusing... I am aware of that... ;) What else (instead of rectangles) would you suggest? What else can you do with javascript? Polygon (e.g. with max. 10 corners)? Else?
I like the colored notes you have added very much! Thanks a lot for all your fast and great work!
Btw.: regarding the point with 'id' and 'class'; why did you use 'id' at all? Is there any deeper reason? Here I think having them to be unique is more a drawback than an advantage... What do I miss here? Greetings --DrTrigon (talk) 21:55, 12 June 2012 (UTC)
Older versions of IE (including IE8) don't natively support getElementsByClassName, while ID-lookup is possible. Also, ID-lookup is faster (more cycles per second), I think. The script is doing the following:
  1. Retrieve the node with id fileContentsByBot
  2. Work from this node through its children instead of from the document root.
I used the ID also to enforce a design-spec/requirement: Don't place duplicate templates "FileContentsByBot" on a single file description page.
If there are IDs that aren't required (nested IDs) they can be simply dropped.
As for Polygons, I found only this one. Of course SVG (generated by JavaScript) would be also possible. -- RE rillke questions? 08:10, 13 June 2012 (UTC)
Ok, thanks for the lessons! ;) I have to look at all IDs and classes and try to fix any messes I've generated there... :)
Regarding the ColorRegion marking; I've introduced a new detection: people (not only faces) and would like to mark them too, if possible... ;) At the same time I think it would be a good thing if we could configure what marks to show... A global default and a per-user configuration, as you already introduced for the whole template, would be good here too... what do you think? ;) Thanks a lot and greetings --DrTrigon (talk) 14:14, 17 June 2012 (UTC)
Some people like it if they can configure things (maybe the script could add checkboxes to the description column of the table). If it has appropriate classes, I think it's no problem to tell the script what to show. I am also thinking about not using image-annotator anymore and using custom drawing instead. Shouldn't be too difficult. In this case, you can specify as many points as you like :-)
Identifying people sounds interesting but should be improved (File:A&V (7).jpg) -- RE rillke questions? 20:02, 17 June 2012 (UTC)
Yeaa... yesterday was somewhat a bad day for the bot; a lot of troublesome examples... and indeed people detection needs some fine-tuning... ;)
Checkboxes sound interesting if there is still some permanent default every user can set. The ImageAnnotator has the advantage of looking familiar and the drawback of suggesting the 'edit' option, which does not in fact work (in that case). Being able to draw polygons might be an advantage, but it will need a lot of point data (2 times # of points) to be added to the template... ;) So a simple rect needs 8 instead of 4 values. --DrTrigon (talk) 12:24, 18 June 2012 (UTC)
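To illustrate the "8 instead of 4 values" point, here is a small Python sketch that converts a rectangle given as position plus size into a polygon corner list. The function names and the (x, y, width, height) layout are assumptions made for the example; the template's actual parameter format is not specified in this thread.

```python
# Hypothetical sketch: a rectangle stored as (x, y, width, height) has 4 values;
# the same shape stored as a polygon needs 2 values per corner, i.e. 8 values.
def rect_to_polygon(x, y, width, height):
    """Return the four corners, starting top-left and going around the rectangle."""
    return [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]


def flatten(points):
    """Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    return [value for point in points for value in point]
```

In general a polygon with n corners needs 2n values, so a 10-corner polygon as mentioned earlier would add 20 numbers per region to the template.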

File:A young Coyodog. Coyote- Husky Mix Costa Rican "Avellana".JPG - hehe. -- RE rillke questions? 20:02, 17 June 2012 (UTC)

Something for the category "is it a feature or a bug?" and deleted anyway in the meantime... (a pity somehow ;) --DrTrigon (talk) 12:24, 18 June 2012 (UTC)

Bot flag for edits?

Stupid thought I came up with yesterday: would it make sense to use the bot flag for edits at all? Flagging the bot is a good thing in order to show the community's support for the bot (request here was passed and so on... ;), but the bot makes edits that should be noticed by human users, since they should be checked (at least with a brief glimpse). So after flagging the bot it should still make its edits without the flag. What do you think? Greetings --DrTrigon (talk) 17:56, 21 June 2012 (UTC)

I agree that they don't have to be flagged as bot edits. CategorizationBot also didn't flag its edits. -- RE rillke questions? 12:25, 22 June 2012 (UTC)
So... what's up? What is the current status? Everybody satisfied? Currently I am... :)) Some modifications to the template and the JavaScript still have to be done (as soon as I have some time for this)... But that should be all for the moment... Greetings and have a nice weekend! --DrTrigon (talk) 09:36, 29 June 2012 (UTC)
As far as I can see from the contributions, the bot doesn't add color categories. That was my remaining objection. --EugeneZelenko (talk) 14:40, 29 June 2012 (UTC)
Yes, this is correct, I turned off the categorization based on colors (information about color regions is still added) according to your objections... ;) ...and I started using the w:en:RAL (color space system) and w:en:Pantone in order to be able to give better color descriptions. Greetings --DrTrigon (talk) 09:11, 30 June 2012 (UTC)

If there are no objections, I think the bot should be approved and bot status granted. --EugeneZelenko (talk) 14:53, 30 June 2012 (UTC)