User:Inductiveload/Sarang bot work


This page is for discussion of a bot request from de:User:Sarang. Anyone else should leave messages at my talk page.


Hi Inductiveload

Hi. I am very happy that you gave me an answer. The work I want done by a bot is rather simple, repetitive editing, and the first two requests need only about 2 to 4 hours. But as someone who appreciates the advantages of data processing, I find it less error-prone to leave such simple editing to a bot.

My English is a bit poor, but I hope you will understand everything; that may be better than talking in German, even if you can understand that as well.
For that I am looking for a bot:

1st request
Creation of categories: As you can quickly see, I did the first half, and need the second half, from "Category:Radical 108-0" to "Category:Radical 214-0" (130 and 184 already exist, but it does not matter whether they are ignored, overwritten, or deleted and recreated). I can deliver the text for each of these categories in whatever form suits you best: either as whitespaced text on this page like this example:
or as a txt file attached to an e-mail, or whatever; and in any form you require.
2nd request
Changing the contents of categories: the 214 categories "Category:Radical 001" to "Category:Radical 214" already exist, but we need them in another form. Either they can all be deleted and regenerated, or their content overwritten, whichever is less work. Regardless of their previous content, each category is to be initialized with a template invocation like {{Radicalw|''character''|''number''}}. You can see it in the first three categories (some categories need additional interwiki parameters; I can add these later, or deliver the complete 214 Radicalw text strings to you). The character is the Chinese radical of that number (such as 一 丨 丶 丿 乙 亅 二 亠 人 儿 ...), and the number (it seems rather difficult to extract it from the PAGENAME) runs from 001 to 214, with or without leading zeroes. Whichever you prefer, you can get the 214 text strings, or just the 214 characters, in any form. But:

For this and for more future work, it would be the best if your bot will be able to work from two tables (or two elements in one table), one giving the name of the category to be created, the other their initial text, somehow like that:

cat name cat content
{{CJK category|211|0|U+9F52}}
齿 {{CJK category|211|0|U+9F7F}}
{{CJK category|212|0|U+7ADC}}
{{CJK category|212|0|U+9F8D}}
{{CJK category|212|0|U+9F99}}
{{CJK category|213|0|U+4E80}}
{{CJK category|213|0|U+9F9C}}
{{CJK category|213|0|U+9F9F}}
{{CJK category|214|0|U+9FA0}}

Once your bot has proved its usability, there are about 1000 more categories needed. We can discuss this later. If you tell me whether this category generator has trouble when it tries to create a category that already exists, I can think it over.

Hoping that we make good progress, best wishes to you and to your bot -- sarang사랑 08:45, 1 February 2010 (UTC)

Request 1 (Radical0)

That sounds quite possible. If you could just paste the text into Lists I can get on that right away. Having a template add the categories is generally not a good thing, because the categories don't show up in the source code of the pages, making it hard for bots and scripts to act on the page. While I'm doing it with a bot, I can add the categories alongside the template with no extra effort, giving the following code:

{{Radical0|皿|108|5|mǐn|76BF}}

[[Category:Radical 108]]
[[Category:Radical-0]]

Explicit categories can be added to radicals 1-107 in the same way, and then the template can have its categorization removed. Since the radicals are fixed, I don't see how this will need changing once done. Of course, these categories can also be added later in the same manner using the same bot, if you plan to change the category structure. I'm just saying I could do it now, and save 214 changes later.

OK, the bot is ready to start work on this one. Once I have the list of contents of the pages and permission, I can go. Inductiveload (talk) 04:04, 2 February 2010 (UTC)
I just need the same data you gave me already, but for the range 1-214, then I can use the same data for both Request 1 and 2. My bot rips the radical number out of that template string. Take a look at Lists for what I want. Cheers, Inductiveload (talk) 19:35, 2 February 2010 (UTC)
Update: I see you gave the data already. That data is fine, the bot is ready to run over all these categories, pending permission. Inductiveload (talk) 19:46, 2 February 2010 (UTC)
Test edit: Category:Radical 003-0 and Category:Radical 004-0. Please check it out so I know I'm doing it right! Thanks Inductiveload (talk) 14:17, 6 February 2010 (UTC)
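
The step described above, where the bot pulls the radical number out of the template string, could look roughly like the following. This is only a sketch under assumptions: the function name `parse_radical` and the regular expression are illustrative and are not taken from the actual bot; it assumes the character is always the first template parameter and the number the second, as in the examples on this page.

```python
import re

# Matches e.g. {{Radical0|皿|108|5|mǐn|76BF}} or {{Radicalw|乙|005}}:
# group 1 = template name, group 2 = character, group 3 = radical number.
TEMPLATE_RE = re.compile(
    r"\{\{\s*(Radical0|Radicalw)\s*\|([^|}]+)\|\s*(\d{1,3})\s*[|}]"
)


def parse_radical(template_text):
    """Return (template name, character, radical number as int).

    Hypothetical helper; raises ValueError if no known template is found.
    """
    match = TEMPLATE_RE.search(template_text)
    if match is None:
        raise ValueError(f"no Radical0/Radicalw template in: {template_text!r}")
    name, character, number = match.groups()
    return name, character.strip(), int(number)


if __name__ == "__main__":
    print(parse_radical("{{Radical0|皿|108|5|mǐn|76BF}}"))
```

Because the number is converted with `int`, it does not matter whether the list supplies it with or without leading zeroes.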

Request 2 (Radicalw)

This also sounds perfectly easily doable, especially if you give me the relevant text for each category (paste it into the page above). Are you sure I can overwrite every one of those pages? I want to make sure I'm not going to stomp on something that shouldn't be stomped on. The page will then be:

{{Radicalw|''character''|''number''}}

[[Category:Chinese radicals]]
Actually, if you give me a full list (1-214 rather than 108-214) of the radicals for Request 1, I can just use that data since I only need the character itself and the number. If you already have that data on hand, it would be easier for both of us. That would then be only a tiny modification of the script to do Request 1. Inductiveload (talk) 04:14, 2 February 2010 (UTC)
For this one, I'd prefer to leave the interwiki links in the template for now, as they are not (as far as I know) so problematic when added by templates (and not used for the ordering and searching of files on Commons). Especially if more interwikis are needed in future, the template can be used to easily add them to all the pages at once, but the category structure won't change like that, so it can be explicit now. The data you gave is fine, and I will do number 1 and 2 for you to check once I get permission. Inductiveload (talk) 19:53, 2 February 2010 (UTC)
Then I will leave everything with the interwikis as it is now.
Sorry, my mistake — the first three radicals are done; better do number 4 and 5! -- sarang사랑 22:54, 2 February 2010 (UTC)
Test edit: Category:Radical 003 and Category:Radical 004. Please check it out so I know I'm doing it right! Thanks Inductiveload (talk) 14:17, 6 February 2010 (UTC)
I saw it with joy, and answered at your talk page. -- sarang사랑 17:21, 6 February 2010 (UTC)
The interwiki linkage of the Category:Radical ### pages is very sparse, like this:

(whitespaced)

Request 3 (others)

For the other cats, tables should be perfectly fine to read data out of. I'd like to do the other requests first and get the structure clear before starting on that, but I don't anticipate it being too hard to do.

OK, it's done. Please check it is correct. And thanks for the tip about CODE2000. That format you gave me by the way is perfect. If you use that for future requests like this, I will be able to do them in seconds! − Inductiveload (talk) 02:11, 10 February 2010 (UTC)
Everything looks good. As far as I can see, the 'bulk creations' are now done, and my work of crosschecking and rectification can start. Thank you for your help!
Once I am reasonably ready, the changes from built-in to explicit cats will come. For now I do not know whether your tools are the best choice for such a task: adding text to categories. I can offer a list structure similar to the one used for the creation of new cats.

It seems that you wrote & own the only bot that can create categories? When I accidentally looked into Special:WantedCategories I saw that there are billions of categories needed for "Images from KIT" (nl:Koninklijk Instituut voor de Tropen). It could be that somebody would be happy to know of your tool.... It is really simple and easy to generate a Wikipedia output list, extract the data from it, edit it with Office (Winword, Excel, Notepad) or other tools and create a list suitable for your bot. -- sarang사랑 12:14, 10 February 2010 (UTC)

General stuff

I haven't got permission to use my bot for category work yet, so it may still be a while before I can start, but if you supply the text, I can write and test the program.

I will then do the first 3 or so categories of each request, and you can OK them before I continue. Then I will do all the others. If you could add comments about each request to the sections above, that will help me keep track of progress! Thanks, happy to be of service, and I hope we can do some good work together! (I study Chinese and I would like to help this project sort out these categories) − Inductiveload (talk) 01:00, 2 February 2010 (UTC)

Generals

The categorization as you intended is a next step after all that. As long as I am checking and cleaning up the different categories for the Chinese characters, I need some more temporary categories (a kind of private maintenance cats) for cross-checking. You may look up Template_talk:Radical0. When all the checking and cleanup is done, I will remove these additional cats from the templates, and issue the delete requests for the cats that are no longer needed.
I chose that way, the creation of the temp cats by templates, because I have no bots for that; it may surely be the better way if such a tool can be used to add and remove cats; adding will be an easy action, removing needs a bit more programming.

I know that cats-by-temps cause some trouble for tools that analyze the cat structure, and it may be a good idea to replace these hidden cats-by-temps with the explicit notation you described above.
A lot of templates for radicals and CJK-characters work with hidden categories. With your bots, we can think about changing the cats from hidden to explicit.

For both requests the cats are to remain permanently, so you may create them explicitly. Just let it be

{{Radical0|皿|108|5|mǐn|76BF}}
[[Category:Radical 108| 0]]
[[Category:Radical-0| 108]]

{{Radical0|目|109|5|mù|76EE}}
[[Category:Radical 109| 0]]
[[Category:Radical-0| 109]]

(sorting option "space"), because it would look silly to have all 214 entries filed under the letter "R". I repeat: the 1st request means that the category does not yet exist, so please provide an error exit if you find the category already existing. The 2nd type of request means that the category should already exist. I suggest clearly defining what your bot expects, so that any mismatch leads to an error.
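
The rule suggested above (request 1 expects the category to be missing, request 2 expects it to be present, and any mismatch is an error) could be sketched as follows. The names `check_expectation` and `page_exists` and the request labels "create" and "overwrite" are hypothetical; `page_exists` stands in for whatever existence lookup the bot framework really offers.

```python
def check_expectation(title, request_type, page_exists):
    """Raise an error when the wiki state does not match the request type.

    request_type is "create" (page must not exist yet) or "overwrite"
    (page must already exist). page_exists is any callable taking a title
    and returning True or False; it is an assumed stand-in, not a real API.
    """
    if request_type not in ("create", "overwrite"):
        raise ValueError(f"unknown request type: {request_type!r}")
    exists = page_exists(title)
    if request_type == "create" and exists:
        raise RuntimeError(f"{title} already exists, refusing to create it")
    if request_type == "overwrite" and not exists:
        raise RuntimeError(f"{title} does not exist, nothing to overwrite")


if __name__ == "__main__":
    # Toy stand-in for the wiki: a set of titles that already exist.
    existing = {"Category:Radical 005"}
    check_expectation("Category:Radical 005", "overwrite", existing.__contains__)
    check_expectation("Category:Radical 005-0", "create", existing.__contains__)
    print("both expectations hold")
```

Failing loudly instead of silently skipping or overwriting means a wrong list entry is noticed before the bot edits anything it should not.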

For the 2nd request, it would be fine when the categorization to "Chinese radicals" is done explicitly; no sorting option is required.

Update: it is the same as above, with all 214 entries filed under the letter "R". So it makes more sense to set a sorting option, the number preceded by a space: «[[Category:Chinese radicals| 001]]» to «[[Category:Chinese radicals| 214]]». Please tell that to your bot! Thanx -- sarang사랑 16:57, 4 February 2010 (UTC)
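
Put together, the page text with explicit categories and space-prefixed sort keys could be assembled like this. It is a sketch only: the function names are made up, and zero-padding the number to three digits in the category name and sort key is an assumption based on the "001" to "214" examples above.

```python
def radical0_page(template_line, number):
    """Text for Category:Radical NNN-0: the template plus two explicit cats.

    The sort keys start with a space so the entries are not all filed
    under the letter "R".
    """
    padded = "%03d" % number
    return "\n".join([
        template_line,
        "[[Category:Radical %s| 0]]" % padded,
        "[[Category:Radical-0| %s]]" % padded,
    ])


def radicalw_page(character, number):
    """Text for Category:Radical NNN: the template plus one explicit cat."""
    padded = "%03d" % number
    return "\n".join([
        "{{Radicalw|%s|%s}}" % (character, padded),
        "[[Category:Chinese radicals| %s]]" % padded,
    ])


if __name__ == "__main__":
    print(radical0_page("{{Radical0|皿|108|5|mǐn|76BF}}", 108))
    print(radicalw_page("一", 1))
```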

Another thing is the interwiki linkage. Is the explicit mode better here as well? All 214 radicals should link to their de, ja and zh pages; en, fr, it and others may come later.

example for "Category:Radical 005"
[[de:Radikal 5]]
<!-- [[en:Radical 5]]  not ready -->
<!-- [[fr:Radical 5]]  not ready -->
<!-- [[it:Radicale 5]] not ready -->
[[ja:乙部 (部首)]]
[[zh:乙部]]

Because only a few radicals have more interwiki pages, I intend to make these entries manually afterwards; there is no need for a bot.

Because there will be no more than 214 Radical0 and 214 Radicalw entries, the server load for rendering is low, and it does not matter whether the templates are changed before or after.

Regarding the lists of entries for your bot, each entry may consist of two parts,

  1. the full name, i.e.
    1. Category:Radical 005
    2. Category:Radical 005-0
    3. Category:乙
  2. the content, i.e. (leading zeroes are never necessary)
    1. {{Radicalw|乙|005}}
    2. {{Radical0|乙|005|1|yǐ|4E59|乚|乛}}
    3. {{CJK category|005|0|U+4E59}}

and the parts separated by any character or medium you want (a space is often part of the category name); even fixed lengths are easy for me. Just tell me whether such a list design will be fine for you, to avoid too many changes for further requests. Then I will soon load the data into your Lists.
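
A two-part list like the one proposed above could be read as follows. This is a sketch under assumptions: a tab is picked as the separator because spaces occur inside category names, and the name `parse_entries` is illustrative rather than part of any existing bot.

```python
def parse_entries(lines, separator="\t"):
    """Split each non-empty line into (category name, initial content).

    Only the first separator splits the line, so the content itself may
    contain further separators. Malformed lines raise ValueError.
    """
    entries = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines between entries
        name, found, content = line.partition(separator)
        if not found or not name.strip() or not content.strip():
            raise ValueError(f"line {line_number}: expected name, separator, content")
        entries.append((name.strip(), content.strip()))
    return entries


if __name__ == "__main__":
    sample = [
        "Category:Radical 005\t{{Radicalw|乙|005}}\n",
        "\n",
        "Category:Radical 005-0\t{{Radical0|乙|005|1|yǐ|4E59|乚|乛}}\n",
    ]
    for name, content in parse_entries(sample):
        print(name, "->", content)
```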

It is very interesting to learn that you study Chinese. I know only a little about Chinese, but I understand that categorization needs some work. Sometimes I have some friendly struggles with User:Aotake (of Japan), but I hope we will work it out. When you get some helpful ideas about these matters, they are welcome. -- sarang사랑 13:27, 2 February 2010 (UTC)

Update

Hi, please consider my last update. Cheers -- sarang사랑 08:26, 5 February 2010 (UTC)

Category:Radical 004 looks good! -- sarang사랑 16:15, 6 February 2010 (UTC)
But what is the use of the built-in link on itself?
More comment: The category tells above (by its title line), right (in the box) and below (with the subcat ###–0), what it is; the box also shows the main character in an enlarged size. The link to its supercat is, as always, at the foot of the page not far away. The variations are not shown there but in the next step, the subcat ###–0.
Sure, it is a fine thing to generate a good description automatically. But in this special case it seems to be of little use and not helpful to anybody. I suggest removing the description in this special case. Elsewhere it may be useful, although it would be better to generate descriptions by the template: easy to insert/maintain/change/remove at one central point, and a bit space saving for Wikipedia.
By the way, I removed the categorization from the Radicalw template. Cheers -- sarang사랑 17:03, 6 February 2010 (UTC)
OK, it was just a trial for your consideration! I'll remove it in the next run. Still waiting for permission though... Inductiveload (talk) 19:52, 6 February 2010 (UTC)
Not knowing which you prefer, I will continue here; if you have our other communication pages on your watch list, you may prefer it there instead of here?
  1. If you think it a good thing and worth the small effort, I can compose lists of all the cats which should be explicitly categorized. Bots for that are ready and permitted, so you need only tell me the structure for those lists.
  2. Is it better to insert the interwiki params into Bot list 2? It depends on whether you have your bot run already prepared; in that case it takes just some minutes to insert them manually afterwards. But if you work from a list, and take any string from it, preparing the list is less effort (for me, and for the server).
Just let me know sometime. -- sarang사랑 05:57, 7 February 2010 (UTC)
Something went wrong with the categorization of Radical ###–0; I repaired it for 000 to 004, where it was just the wrong way round.
Another problem: I put some work into setting up the box for browsing so that the arrows are always at the same position for the cursor. Now the box is jumping up and down, but I cannot find the reason. I see that when I enter Category:Radical 004-0 and browse back: at 001 and 003 the box is lower. Do you know why? -- sarang사랑 06:25, 7 February 2010 (UTC)
I fixed that category error, sorry about that. A simple mistake on my part. As for why one is lower than the other, I have no idea. I've modified the template to have transcluded documentation as I thought you may have a stray line-return, but it hasn't fixed it. Inductiveload (talk) 16:47, 7 February 2010 (UTC)
Update: It's caused by an invisible line return at the start of the page. I've modified the bot to not put this in. As far as interwiki links go, it's a significant change to insert, it may be better to do it by hand given the relatively low number of edits required. Inductiveload (talk) 16:53, 7 February 2010 (UTC)
OK, the bot has been run over all of requests 1 and 2 - seems to be in order! Please check it! I have also changed the big radical table to a transcluded template, changed the arrows in template:Radical0 to ones that render better (the ones before were squares rather than arrows) and removed autocats in template:Radical0. Cheers, Inductiveload (talk) 19:14, 7 February 2010 (UTC)
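
The fix mentioned above, not writing an invisible line return at the start of the page, amounts to trimming leading line breaks before saving. A minimal sketch, assuming the page text is held as a plain string (the function name is hypothetical):

```python
def strip_leading_line_returns(page_text):
    """Remove line returns at the very start of the page text.

    Only "\r" and "\n" are stripped; leading spaces are kept, since a
    leading space is significant in wikitext.
    """
    return page_text.lstrip("\r\n")


if __name__ == "__main__":
    text = "\n{{Radical0|皿|108|5|mǐn|76BF}}\n[[Category:Radical 108| 0]]"
    print(repr(strip_leading_line_returns(text)))
```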

Wow, it looks really great, and we have reached a good state of progress. By hand it would have been many hours of work, plus some errors (which always happen with stupid tasks).
Fine also that you stopped the jumping of the box; sounds difficult to find & repair such a hidden effect.
You also improved the templates: documentation and more. I liked the square-like arrows more, but I can live with the new ones.
To edit the interwikis manually is no problem, I will do that within the next days.

Now it is possible to continue with crosschecking categorization and fixing inconsistencies. Thank you for all your help, and congratulations on your success. I shall not trouble you for the next few days, but I am collecting data for the next bot requests. In some cases built-in cats can be replaced by adding explicit ones; do you have wishes regarding the data structure for these requests? But there is no hurry at all.

Whenever you like you may perform Request 3 (not yet 3b); and the data of the other requests can now be deleted. -- sarang사랑 11:12, 8 February 2010 (UTC)

Hi! Glad you like it! The squarelike left arrow didn't show for me, and I have a generally "permissive" font set, so an average reader might struggle. If you like you can use an image with a link: like this: [[File:Pfeil links.svg|10px|link=Category:Radical 001-0]].
As for request 3, I'm not sure what you want, as many of the category names are blank. I assume that you want me to place that content on the page for the character, but the bot won't be able to see that character as it's not in its source file! If you could copy down from the other list or something, that would be good, then I can do it as soon as I have time. Thanks -- Inductiveload (talk) 16:08, 8 February 2010 (UTC)
I have now tried to specify RQ3 more clearly (I first had to make clear to myself what the matter is): 71 cats are needed; many of them have files like .
I did not think carefully enough that such exotic arrows give problems. It is a good idea to take an image.
Maybe it is the same problem of visibility with the category names? Characters such as 2E80-2EF3 are invisible in many fonts; I had to install CODE2000 to work with them. Your bot won't care if your monitor does not show you the characters, but if you have any trouble I can define the few cats by hand. Please just tell me whether you can perform the first part of request 3 as it appears now. -- sarang사랑 11:19, 9 February 2010 (UTC)
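
Where a category name in the list is blank or invisible, one possible workaround (an assumption, not something agreed in this discussion) is to derive the name from the U+ code point that the {{CJK category}} content already carries. In the examples on this page, 齿 is paired with U+9F7F and Category:乙 with U+4E59. The helper name below is hypothetical.

```python
import re

# Finds a code point written as U+XXXX (4 to 6 hex digits) in the content.
CODEPOINT_RE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def category_name_from_content(content):
    """Build "Category:<character>" from the U+ code point in the content.

    Hypothetical helper; raises ValueError if no code point is present.
    """
    match = CODEPOINT_RE.search(content)
    if match is None:
        raise ValueError(f"no U+ code point in: {content!r}")
    return "Category:" + chr(int(match.group(1), 16))


if __name__ == "__main__":
    print(category_name_from_content("{{CJK category|005|0|U+4E59}}"))
```

This works regardless of whether the installed fonts can display the character, since only the code point is read.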