Commons:Bots/Work requests

From Wikimedia Commons, the free media repository

Shortcut: COM:BR· COM:BWR

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day.

heading= → heading:

I noticed that I mistakenly used "heading=" instead of "heading:" (an equals sign instead of a colon) in the {{location}} template many times. This wrong format does not work. As we can see, it is not only my problem: there are hundreds of files (1367 pages found) that contain {{location}} or {{location dec}} with a faulty heading parameter. Could somebody correct this with a bot? (P.S. Some other occurrences concern the {{Depicted place}} template, which doesn't support the heading parameter at all yet.) --ŠJů (talk) 00:56, 19 November 2014 (UTC)

Other attributes of this template (region:, scale:, dim:), missing underscore characters between parameters, etc. could also be checked. --ŠJů (talk) 00:56, 19 November 2014 (UTC)

Hello! Is anybody here? --ŠJů (talk) 23:42, 4 December 2014 (UTC)

I created Category:Media using Location template with incorrect parameter and will try to clean it up once it fills up a little. --Jarekt (talk) 06:00, 5 December 2014 (UTC)
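
The correction requested in this thread could be sketched roughly as follows. This is only an illustration in Python, not the code of any existing bot: the function name `fix_heading` and both regular expressions are assumptions, and the sketch only handles {{location}} and {{location dec}} calls that contain no nested templates. Other templates, such as {{Depicted place}}, are deliberately left alone.

```python
import re

# Hypothetical pattern: a {{Location ...}} or {{Location dec ...}} call
# without nested templates inside it.
LOCATION_CALL = re.compile(
    r"\{\{\s*location(?:[ _]dec)?\s*\|[^{}]*\}\}", re.IGNORECASE
)
# "heading=" not preceded by a letter, so "region:CZ_heading=NW" is caught too.
HEADING_EQ = re.compile(r"(?<![A-Za-z])heading\s*=\s*", re.IGNORECASE)


def fix_heading(wikitext: str) -> str:
    """Replace 'heading=' with 'heading:' inside Location templates only."""

    def repair(match: re.Match) -> str:
        return HEADING_EQ.sub("heading:", match.group(0))

    return LOCATION_CALL.sub(repair, wikitext)
```

A real run would still need a human to look at a sample of the diffs first, since the sketch does not check the other attributes (region:, scale:, dim:) mentioned above.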

Adding the Information template to files that don't have it

Hi again :) As part of the File metadata cleanup drive, I'm working to add the {{Information}} template to the ~700,000 files that don't have it, so that the information can be accessed easily. This is a complex undertaking, but there are small tasks we can take on to make incremental progress.

An easy group of files to start with are those like this one, whose description page basically consists of:

== {{int:filedesc}} ==

< Some description >

== {{int:license-header}} ==

{{Self| < some licence(s) > }}

< categories >

In this case, it's relatively easy to add the {{Information}} template:

  • add the information template under == {{int:filedesc}} ==
  • move the existing description to the Information template's Description field
  • add the name of the uploader as the author (since it's their own work)
  • add {{own}} as the source
  • add the date from EXIF data, if available, otherwise leave blank.

This will work for only a subset of the files missing the {{Information}} template, but we have to start somewhere :) (Pinging Multichill, MGA73, Amir and Keegan per previous discussions.) Guillaume (WMF) (talk) 00:50, 2 December 2014 (UTC)
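
The steps listed above could be sketched like this in Python. It is a minimal sketch under stated assumptions, not the script any bot actually runs: the names `SIMPLE_PAGE` and `add_information` are hypothetical, the uploader name and EXIF date are assumed to be fetched elsewhere, and any page that does not match the simple layout exactly is skipped rather than guessed at.

```python
import re
from typing import Optional

# Hypothetical pattern for the "simple" layout described above:
# a filedesc heading, free text, a license heading, then {{Self|...}}.
SIMPLE_PAGE = re.compile(
    r"\A\s*==\s*\{\{int:filedesc\}\}\s*==\s*\n"
    r"(?P<description>.*?)\n\s*"
    r"(?P<rest>==\s*\{\{int:license-header\}\}\s*==\s*\n\s*\{\{self\|.*)\Z",
    re.IGNORECASE | re.DOTALL,
)


def add_information(wikitext: str, uploader: str,
                    exif_date: Optional[str] = None) -> Optional[str]:
    """Return the page with {{Information}} added, or None to skip the page."""
    if re.search(r"\{\{\s*information\b", wikitext, re.IGNORECASE):
        return None  # already has the template
    match = SIMPLE_PAGE.match(wikitext)
    if match is None:
        return None  # not the simple pattern: leave it for a human
    description = match.group("description").strip()
    if not description or "==" in description:
        return None  # empty text or an unexpected extra section
    info = (
        "{{Information\n"
        f"|description={description}\n"
        f"|date={exif_date or ''}\n"  # left blank when no EXIF date is known
        "|source={{own}}\n"
        f"|author=[[User:{uploader}|{uploader}]]\n"
        "}}"
    )
    return "== {{int:filedesc}} ==\n" + info + "\n\n" + match.group("rest")
```

Returning None for anything unexpected keeps the bot conservative: pages without {{Self}} or with extra sections are simply not edited.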

I will fix this. Amir (talk) 12:05, 2 December 2014 (UTC)
I have been adding Category:Media missing infobox template and thinking about this issue. I also tried to discuss it at VP, see here. I think we should use a divide-and-conquer approach. I would propose the following:
  1. Mark the files by adding them to Category:Media missing infobox template, which will allow everybody to see the files.
  2. Some files likely have an Information template but with syntax errors; those I try to place in Category:Pages using Information template with parsing errors.
  3. I would propose to first give the original uploaders a chance to fix the files. We can do that by writing a standard message which, without any threat of deletion, asks for help with bringing those files up to current standards. We should have one message per uploader with a list of all the files that need infoboxes. Many of the images without infobox templates are from the early days of Commons and many of those people might not be around anymore. We should also advise them on the use of the VisualFileChange gadget or on requesting specific tasks to be done by bots at Commons:Bots/Work requests.
  4. Many files have all the info, just not in the right form, for example File:Orchis militaris flowers.jpg or File:St Germain des Prés fenêtre.jpg. We might be able to recognize some of the patterns used and fill {{information}} based on that.
  5. Some images were moved from Wikipedia, like File:St michaelis.jpg, and have no information about the photographer. We would need to look the information up on EN-WP to find the name of the original uploader.
  6. Some images imply "own" work by the uploader, like File:MaisonHonfleur1.jpg or File:Pinus pinaster female.jpg, but do not actually say it. If the files have EXIF data and templates like {{PD-self}}, {{GFDL}} or {{self}}, I think it might be OK to fill the {{Information}} with {{own}}, the name of the first uploader and the EXIF date.
  7. Some files have home-brewed infobox templates that are not maintained or recognized.
  8. Many {{PD-old}} files should use {{Artwork}} instead of {{Information}}, for example File:Leonardo da Vinci Grotesque Heads.jpg.
  9. I do not know what to do with files like File:Ruins at Delfi.JPG. The user should have been advised that he needs to send the permission to OTRS, but 10 years ago, when he uploaded the image, OTRS mostly dealt with handling emails from the public, not permissions.
Once we deal with a lot of "easy" cases we can assess what is left. --Jarekt (talk) 16:46, 2 December 2014 (UTC)
Thanks, Jarekt! That's a great plan. Should we discuss the details elsewhere or is here ok? Guillaume (WMF) (talk) 18:35, 2 December 2014 (UTC)
I would just keep the discussion here. I was trying to have this discussion on VP and Commons talk:Structured data, but nobody wanted to talk about it, so this place seems better. By the way, the files in Category:Items with OTRS permission missing infobox template seem distinctive enough to warrant a separate category. --Jarekt (talk) 18:58, 2 December 2014 (UTC)
My approach is to fix easy cases and evolve the script as we handle more complex cases. Amir (talk) 19:49, 2 December 2014 (UTC)
Amir, I am slowly working on step #1, adding Category:Media missing infobox template and more specific subcategories; so far I am ~5% done. You can start with those files or use your own way of generating the list of files with no infoboxes. It should not be hard, as I added {{Infobox template tag}} to all infobox templates (other than {{Information}}), so any file that does not have {{Infobox template tag}} or {{Information}} is likely not to have an infobox. So maybe you want to tackle cases where the author, source and possibly the date and description are present and unambiguous to a human reader (case #4); then you can develop regexp rules to detect them and place them in the correct fields. Some of those rules can be "borrowed" from toollabs:add-information. But the bot should skip unusual cases. Many of the uploads are by the same users, who might follow the same pattern, so we could process a few more prolific users with a custom set of rules. --Jarekt (talk) 20:37, 2 December 2014 (UTC)
Thank you for your hints, I'll use them and probably work on case #4. Amir (talk) 21:01, 2 December 2014 (UTC)

It would also be helpful to have toollabs:add-information fixed. Currently, it occasionally destroys section headers and other parts of the code. --Leyo 17:14, 2 December 2014 (UTC)

Agreed; I'll reach out to Magnus and follow up here. Guillaume (WMF) (talk) 18:35, 2 December 2014 (UTC)

I finished the script that fixes cases that consist only of language templates (example 1, example 2). Is it okay to start with them? Amir (talk) 09:01, 6 December 2014 (UTC)

That is good; however, are you going to be able to skip cases which are clearly not "own work", like File:FSO ok 1974r.jpg? Also, files with only language templates might have a date, author and source which are not the same. Do you attempt to recognize those? --Jarekt (talk) 09:38, 6 December 2014 (UTC)
It skips the file when Template:Self is not used, and also when the language template spans several lines (instead of one). Is that enough? Amir (talk) 11:06, 6 December 2014 (UTC)
I think that limiting it to files using {{Self}} is enough. Could you also remove Category:Media missing infobox template, in case the file has it? (you might be doing it already). Thanks --Jarekt (talk) 17:11, 6 December 2014 (UTC)
I don't remove the self template. Should we remove it? Amir (talk) 21:23, 6 December 2014 (UTC)
I am sorry, I forgot the leading colon, so Category:Media missing infobox template did not show up. I meant removing that category. --Jarekt (talk) 03:30, 7 December 2014 (UTC)
Yes, it does remove them. Amir (talk) 08:46, 7 December 2014 (UTC)

Next step: One line long descriptions: Commons:Bots/Requests/Dexbot_5 Amir (talk) 02:18, 25 December 2014 (UTC)

Btw, I found that there might be some uploads with "self" templates which are not "self" by the uploader because they were transferred from other wikis. See this one for an example. Some big uploaders (bots and users) were busy transferring files in the early days and should at least be exempt when using this method to add Information templates. Mvg, Basvb (talk) 12:26, 25 December 2014 (UTC)

An example pattern

@Ladsgroup: @Guillaume (WMF): Last month, under my "normal" account, I went through and cleaned up a couple hundred file pages by hand, looking for such patterns. Here's an easy(ish) test case for a bot to take on:

RHaworth has/had a bunch of old (2005/6ish) uploads that need formatting. They're pretty easy to do by hand, but even so there are still 61 files left that need to be completed; I did the other half by hand. The list is on this labs page. I can copy the file names over if need be. Keegan (WMF) (talk) 21:16, 11 December 2014 (UTC)

@Keegan (WMF): on File:All_Saints,_Beeston_Regis.jpg why did you put User:RHaworth as the author, when the text was clear that it is actually User:Stavros1 (Mark Hobbs)? --99of9 (talk) 23:22, 11 December 2014 (UTC)
@99of9: because I made a mistake there. I've fixed it, thanks for pointing it out :) Keegan (talk) 03:54, 12 December 2014 (UTC)

Dutch wiktionary pattern

There are 130,000 pronunciation uploads from Wiktionary on Commons; a few thousand (my estimate would be around 7,000-15,000) don't have Information templates. Most of these were uploaded with the same pattern, although there are uploads by different uploaders with different patterns. In this edit I changed the file description of a file by GerardM (whose files have similar patterns); the description was created by me (but could be generated based on the title). The words "eigen opname" (meaning: own recording) are added in slightly different formats; besides that, there is not really a lot of info on the files. This might be an easy one to add templates to, and also a pretty big one. Mvg, Basvb (talk) 00:13, 14 December 2014 (UTC)

In smaller numbers this holds for other languages as well (en, pt I've seen). Mvg, Basvb (talk) 12:04, 14 December 2014 (UTC)
I will fix this for Dutch pronunciations by the weekend. Amir (talk) 02:23, 18 December 2014 (UTC)

9154 files of Dutch pronunciations didn't have an Information template. Now 8588 more files have it (so 566 still need to be fixed; I'll do that too). Amir (talk) 05:37, 19 December 2014 (UTC)

Nice work, with those numbers we can work very well! Mvg, Basvb (talk) 11:28, 19 December 2014 (UTC)


A list of books (with more than 100 files) which have no infobox template and could probably use some automated adding of the book-template. Most books have a few hundred pages and thus we are looking at a few hundred files per listed book. Basvb (talk) 00:17, 14 December 2014 (UTC)

I can finish the Book categories. I have a system going that adds book templates with unique page numbers so you can page through the files. The only slow down is that I am creating book templates and often creator templates as I go. --Jarekt (talk) 20:28, 19 December 2014 (UTC)
✓ Done That was a good find and since all the files already use Template:LA2-NSRW, it was also easy to fix. --Jarekt (talk) 05:02, 18 December 2014 (UTC)

I can do those, since I have some script for adding page numbers so one can page through the book, but I would appreciate help with the book templates, like {{L'Odyssée}} or {{Nietzsche's Werke, III}}, since they are the most time consuming especially since I do not speak any of the languages. --Jarekt (talk) 05:04, 14 December 2014 (UTC)

Thanks, I can do the templates in a few days. Mvg, Basvb (talk) 12:01, 14 December 2014 (UTC)
Btw, is there any fix for the fact that all (or a lot of) books with book templates end up in Category:Files with no machine-readable source and Category:Files with no machine-readable author? Basvb (talk) 12:03, 14 December 2014 (UTC)
Template:L'Ile des Pingouins is done (I'll add them to the relevant lines of the books from here on). Basvb (talk) 14:05, 14 December 2014 (UTC)
Thank you --Jarekt (talk) 18:13, 14 December 2014 (UTC)
Thank you! I've also left a message to the French Wikisource community to see if they can help to create the book templates. Guillaume (WMF) (talk) 18:21, 15 December 2014 (UTC)
@Guillaume (WMF): Thank you, although my French is limited exactly to the understanding of book covers. I have a question: is it possible to generate, from the data about files without an infobox, a list of users who have uploaded a lot of such files (let's say over 100) or a lot of files in one category? This would help a lot in finding books like these and other patterns which can be fixed easily. Mvg, Basvb (talk) 21:13, 15 December 2014 (UTC)
Basvb: Sorry for the delay on this; I'm still new to SQL queries, so it took me a little time. The list you asked for is now available and I'll set it up to be refreshed every day. You might want to download the file to your computer to avoid encoding errors if you open it in your browser. I see familiar names in the list, like MarcBot (used for many of the books discussed above) and G.dallorto (that you mentioned below), so I'm reasonably confident that it's what you're after. Let me know if I can do anything else! Guillaume (WMF) (talk) 01:22, 17 December 2014 (UTC)
Guillaume (WMF): Thank you very much, that makes it much easier to search for the big fish, which will save the people who work on this file by file a lot of work. If, for example, we fix all files of uploaders with over 1000 uploads (without infobox), then we have the first 200k done. About the "anything else": I indeed had another idea. Depending on how hard it is, a good way to find similar uploads would be to make the lists sortable by upload date, but I can be busy with this uploader list for a while. Mvg, Basvb (talk) 09:08, 17 December 2014 (UTC)
Basvb: I've made another query. You can now download a list of the files missing machine-readable metadata, grouped by user, and with the timestamp. Warning: this is a ~40 MB text file so some browsers may have issues with it. I suggest you download it to your computer and open it with a spreadsheet application, so you can reorder the content more easily. For example, I imagine that you could select all the files from a given user, and reorder them by upload date to see if there are patterns. The file isn't being updated for now, but I can set it up if you think it would be useful. Guillaume (WMF) (talk) 19:08, 17 December 2014 (UTC)
Guillaume (WMF): Thank you, now I can get to the regexfixing. Update of the file is not really needed (until a big chunk is done). Mvg, Basvb (talk) 20:31, 17 December 2014 (UTC)

When I saw Guillaume's message on WS, I came here to see what I could do to help. I haven't checked all these books, but for the first one I looked at (Category:Lettres de mon moulin), there is already a DjVu file of this same book edition: s:fr:Fichier:Daudet - Lettres de mon moulin.djvu. What is the usual procedure when we come across this kind of thing on Commons? Is it considered a duplicate? Thanks. --Ernest-Mtl (talk) 02:23, 16 December 2014 (UTC)

The DjVu file of Lettres de mon moulin is misplaced on Wikisource and has to be uploaded to Commons. There are a number of JPG book pages, such as Gustave Flaubert's Category:Bouvard et Pécuchet, Category:L'Éducation sentimentale and Category:Madame Bovary, that are duplicates of DjVu files; Wikisource now uses the DjVu files and the JPGs are no longer useful. --Wuyouyuan (talk) 13:47, 16 December 2014 (UTC)
I do not think we have a policy on that but I am inclined to let the old files stay and the book template we use for them can be reused for the DjVu files, as I did with files in Category:Encyclopédie – Planches V1–9 (pages assemblées, DJVU). --Jarekt (talk) 03:19, 16 December 2014 (UTC)
I just realized you are talking about the case of some files on Commons which are the same as files on French Wikisource. It is quite puzzling: why is French Wikisource hosting local files? Either way, we are not going to delete our copy just because one of the projects has a local copy, and we would still try to add metadata to our copy. --Jarekt (talk) 03:27, 16 December 2014 (UTC)
The reason why I was asking the question is that we are actually moving all those files to Commons. Some files from the early days of the projects were simply uploaded to WS. Furthermore, the quality of the DjVu scans is a lot better than that of the JPGs in the case of Lettres de mon moulin, and I would personally find it a waste of space to keep individual JPGs of a book that can be accessed in DjVu, especially as the DjVu format here on Commons allows people to save individual pages as JPGs on their computer if they can't open a DjVu file. That's the reason why I was asking what the procedure was here, before doing something that would not have been considered correct. If someone is to waste time on these 200-some JPG files, it won't be me, as I consider these files useless duplicates. --Ernest-Mtl (talk) 15:07, 16 December 2014 (UTC)
In such a case once the DjVu file is copied you should nominate the jpegs for deletion as poorer quality duplicates. --Jarekt (talk) 15:16, 16 December 2014 (UTC)
Yes, except when the JPEGs are not easily available. That was (is?) the case for the Encyclopédie files. Regards, Yann (talk) 17:46, 16 December 2014 (UTC)

One thing that I noticed is that most of the files here are uploads by User:MarcBot for French Wikisource, and all (as far as I noticed) were unused and replaced by DjVu files. The book images are often incomplete, concentrating on the pages with "text" and skipping the title pages, tables of contents, etc. We usually do not remove files which are not identical duplicates, but in this case these are truly unusable files, since better versions uploaded later exist. --Jarekt (talk) 04:51, 22 December 2014 (UTC)

Yes. The fact that most files are from MarcBot is because I looked at all the uploads from this bot. I don't know what's the best plan for the images. Mvg, Basvb (talk) 13:21, 22 December 2014 (UTC)

Over 500 images by G.dallorto

There are over 500 images by G.dallorto without information templates which have some basic pattern, dates in exif-data and mainly have a self-license. Seems like a pattern which could be matched. Example edit: here (media missing information template cat should also be removed). Basvb (talk) 00:30, 14 December 2014 (UTC)

My bot will fix this pattern and similar. I'm waiting for approval. Amir (talk) 18:58, 14 December 2014 (UTC)
✓ Done Amir (talk) 12:29, 16 December 2014 (UTC)
A lot of files in Category:Società Umanitaria (Milan) aren't done yet. Mvg, Basvb (talk) 19:56, 16 December 2014 (UTC)
Their pattern was a little bit different; my bot now fixes them too and will finish them soon. Amir (talk) 19:48, 17 December 2014 (UTC)
I see now that this user has images with a lot of different patterns (he was just heavily active), so processing all of those by bot isn't really suitable; they'll just be part of other bot batches where they fit. So let's close this request. Mvg, Basvb (talk) 22:32, 17 December 2014 (UTC)

over 500 images by Lalupa

Over 500 architectural shots with a one-sentence text description,[1] some with a move table.[2] Slowking4Farmbrough's revenge 23:15, 23 December 2014 (UTC)

Category work needed

Category:Images from the Veikkos Archive – needing category checks has a backlog of some 13,000 files. I have come across a number of them that have a redundant category. Is a bot able to do the following:

  • If a file is in both Category:Sealing stamps of LOCATION and Category:LOCATION, then remove Category:LOCATION (LOCATION is a variable)
  • If the preceding is true, then also remove Category:Images from the Veikkos Archive – needing category checks

This would save a lot of time for us human editors. Regards. Alan Liefting (talk) 19:00, 20 December 2014 (UTC)
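
The two rules above could be sketched as follows in Python. This is only an illustration under assumptions: `prune_categories` and `CATEGORY_LINK` are hypothetical names, and the sketch works on the category links found in a page's wikitext, so categories added by templates would not be seen.

```python
import re

BACKLOG = "Images from the Veikkos Archive – needing category checks"
PREFIX = "Sealing stamps of "
# Hypothetical pattern: [[Category:Name]] or [[Category:Name|sortkey]],
# including the newline that follows it.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]\n?", re.IGNORECASE
)


def _name(match: re.Match) -> str:
    # Treat underscores and spaces in category names as equivalent.
    return match.group(1).replace("_", " ").strip()


def prune_categories(wikitext: str) -> str:
    """Drop Category:LOCATION (and the backlog category) when
    Category:Sealing stamps of LOCATION is also present."""
    names = [_name(m) for m in CATEGORY_LINK.finditer(wikitext)]
    stamped = {n[len(PREFIX):] for n in names if n.startswith(PREFIX)}
    redundant = stamped & set(names)
    if not redundant:
        return wikitext  # rule does not apply: keep the page as it is
    redundant.add(BACKLOG)
    return CATEGORY_LINK.sub(
        lambda m: "" if _name(m) in redundant else m.group(0), wikitext
    )
```

Pages that lack the "Sealing stamps of" category are returned unchanged, so they stay in the backlog for a human check.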

Fixing double parameter errors

Hi all,

The maintenance category Category:Pages using duplicate arguments in template calls contains a lot of images (x out of 45000) which are quite easy to fix by bot when we look at two simple patterns. The first pattern is a parameter given twice, where one occurrence is empty and the other is filled. The empty one can just be removed. Examples: A1, A2. The second pattern is a parameter given twice with exactly the same content; one occurrence can just be removed. Examples: B1, A3 and B2. Is it an idea to fix those by bot? I expect something like 10,000-30,000 files will be affected. In the examples given, only named parameters of the Information template were affected; there are, however, also other templates which could cause the issue, often without named parameters.

Mvg, Basvb (talk) 16:10, 26 December 2014 (UTC)
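
The two patterns described above could be handled roughly like this. It is a hedged Python sketch, not an existing bot: `dedupe_parameters` is a hypothetical name, and it only accepts a single flat template call, returning anything with nested templates or links unchanged because splitting on "|" would be unsafe there. Duplicates with conflicting non-empty values are also left alone for a human to decide.

```python
from collections import defaultdict


def dedupe_parameters(template_call: str) -> str:
    """Remove duplicate named parameters that are empty or identical."""
    body = template_call.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    inner = body[2:-2]
    if "{{" in inner or "[[" in inner:
        return template_call  # nested markup: needs a real wikitext parser
    name, *params = inner.split("|")
    seen = defaultdict(list)  # parameter name -> [(position, stripped value)]
    for position, param in enumerate(params):
        if "=" in param:
            key, value = param.split("=", 1)
            seen[key.strip()].append((position, value.strip()))
    drop = set()
    for entries in seen.values():
        if len(entries) < 2:
            continue
        filled = [(pos, value) for pos, value in entries if value]
        if len({value for _, value in filled}) > 1:
            continue  # conflicting values: leave for a human
        keep = filled[0][0] if filled else entries[0][0]
        drop.update(pos for pos, _ in entries if pos != keep)
    kept = [p for position, p in enumerate(params) if position not in drop]
    return "{{" + "|".join([name] + kept) + "}}"
```

Keeping the filled occurrence, whichever position it is in, follows the request above that the empty duplicate is the one to remove.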

We should probably finish working on the templates in that category first, since a lot of files can be affected by one or two templates. Otherwise I think this is a great idea. --Jarekt (talk) 23:06, 26 December 2014 (UTC)
Yes, that's a thing I found out after asking this question. Mvg, Basvb (talk) 23:35, 26 December 2014 (UTC)
I fixed a lot of the templates, including all of the creators and institutions. The most important templates that cause a lot of errors (because they are used a lot) are described on Category talk:Pages using duplicate arguments in template calls; these are mainly pretty complicated templates. Mvg, Basvb (talk) 15:38, 27 December 2014 (UTC)