User talk:Fæ

Thank you!

Dear Fæ,

I just wanted to say ‘Thank you!’ for your incredible, tireless work, which saves millions of images, many of them very valuable, for the benefit of all of us and maybe even for the benefit of future generations.

Thank you very much! --Aristeas (talk) 14:58, 23 February 2018 (UTC)

Thanks for the feedback! -- (talk) 09:04, 20 March 2018 (UTC)

Category:Faebot analysed duplicates ready for review

Just a heads-up that the category has stopped repopulating for some reason. Kelly (talk) 19:34, 1 February 2018 (UTC)

@Kelly: Thanks, rerunning. It was more of a test routine than established maintenance, so it still has to be started manually. I'm unsure how useful or interesting others find it. -- (talk) 13:13, 27 February 2018 (UTC)
I was visiting it periodically - I thought it was quite useful. Kelly (talk) 13:16, 27 February 2018 (UTC)
@Kelly: I have turned this into a crontab task for my desktop, rather than running it manually. I'm not putting this on labs, due to the very narrow scope it currently has. It is currently set to kick off just once a day, then run for 8 hours depending on how long it takes to work through the backlog, which can take a while as it has no memory of the last run. If it appears to halt for 24 hours, or anything else unexpected happens, drop me a note. Thanks -- (talk) 09:12, 8 March 2018 (UTC)
Will do, thank you very much! Kelly (talk) 10:17, 8 March 2018 (UTC)

Files without file pages

Many of the files listed here were uploaded by you: Do you have an explanation why the description is missing? -- 17:53, 23 February 2018 (UTC)

It is because User:Artix Kreiger 2 uploaded a bunch of pages with 2 million+ bytes of text (example), which overloaded the servers. Unfortunately, I had to systematically mark all these images as nld without notifying the uploader, including Fae. Magog the Ogre (talk) (contribs) 04:02, 24 February 2018 (UTC)

Category:Images uploaded by Fæ (metadata needed) -- (talk) 12:24, 25 February 2018 (UTC)

Deleted / restored examples

User:Fæ/Imagehash and LOC images

Fæ, any chance of running the bot against the LOC images recently uploaded? (Bain, Harris/Ewing, etc.) Just at first glance it looks as if we may have quite a few versions/copies of some of the images that could do with being linked together via "other version" and probably some that can be deleted as well. All the best - Kelly (talk) 22:21, 7 March 2018 (UTC)

It's a non-trivial thing to set up and adds hash info hidden in the page source text. Best done once all the current uploads are complete. I'll look at doing it, but in a month or two.
The current uploads should spot earlier upload matches with the same IDs or SHAs, and add them to the gallery. For example File:Keene Fitzpatrick - Princeton LCCN2014694395.tif was uploaded a few minutes ago, and includes a thumbnail link back to the jpeg version uploaded in 2012. -- (talk) 22:31, 7 March 2018 (UTC)
No rush, obviously. I've been doing some categorization on the newly uploaded photos, and it looks like a lot of matches are being spotted during the upload process. A few are not - for instance, File:Anita Stewart, Lallie Charles - Lallie Charles LCCN2014681265.jpg and File:Anitastewart 4.jpg. There are also some circumstances where the same photo has found its way here via a different source, such as the photos in Category:Hirata Tōsuke. Many thanks. Kelly (talk) 22:42, 7 March 2018 (UTC)
I'll examine these as part of looking at the setup. However, the 'sample space' for hashes ought to return no more than one or two hundred thousand files to test. At that size the hash generation will take several days. Because processing time increases as the square of the size of the sample space when trying to pair up files using the hashes, I doubt that whole categories where LOC files might appear can be included.
Even a generalized url search (like would yield too many files (>600,000), so the 'space' will probably have to be limited to finding something like "" plus collection ID like "ggbain" (which at the moment returns 36,000 files). If the keywords are absent from a duplicate, and the duplicate is not identical by SHA1 value, my script will not pair it up as a duplicate.
Let's park for the moment. I may have better solutions to try when I get to it, and it might be worth writing up as another experiment. -- (talk) 09:39, 8 March 2018 (UTC)
Sounds good to me. Kelly (talk) 00:54, 13 March 2018 (UTC)
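
As a rough illustration of the scaling concern in this thread, the sketch below counts the unordered pairs that a naive all-against-all hash comparison would need. The all-pairs approach is an assumption made for illustration, not a description of how the actual script works; the file counts are the ones quoted above, with 600,000 used as a lower bound for the general url search.

```python
def pair_count(n: int) -> int:
    """Number of unordered file pairs among n files: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# File counts quoted in the discussion; labels are only descriptive.
for label, n in [("ggbain keyword search", 36_000),
                 ("general url search (lower bound)", 600_000)]:
    print(f"{label}: {n:,} files -> {pair_count(n):,} pairs")
```

Doubling the number of files roughly quadruples the number of pairs, which is why narrowing the search space matters more than speeding up any single comparison.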


Hash matching

Apart from drilling into categories, in this case added two weeks after the LOC upload, there are no identification codes that could automatically connect the alternates. Even the given names do not match sufficiently for automation. As the scan has defects and the LOC image has writing on the negative border, image hashes may not be identical. This case may not be picked up by an image hash project that is limited to finding equal thumbnail hash values.
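
To illustrate why requiring equal hash values can miss a case like this, here is a minimal sketch that compares two hashes by Hamming distance instead of strict equality. It is not how the Imagehash project works; the function names, the hexadecimal hash format and the threshold of 6 bits are all assumptions chosen for the example.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two equal-length hexadecimal hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_match(hash_a: str, hash_b: str, max_bits: int = 6) -> bool:
    """Treat two hashes as a likely match if few bits differ (threshold is hypothetical)."""
    return hamming_distance(hash_a, hash_b) <= max_bits


# Made-up hash values: nearly identical, but not equal.
scan = "ff00ff00ff00ff00"
loc_copy = "ff00ff00ff00ff03"
print(scan == loc_copy, is_near_match(scan, loc_copy))
```

An equality test rejects the made-up pair above, while the distance test accepts it. The cost is that a tolerance also admits false positives, so any such matches would still need human review.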

LOC code matching

Examining the history of the above example duplicate, the 2009 jpeg has the same resolution as the full size TIFF uploaded this year. The LOC does not make available a jpeg version of the full size TIFF, but instead supplies two lower resolution jpeg versions. However, the 2009 upload clearly shows in the EXIF data that the jpeg was "officially" created by the LOC, so presumably it was originally made available by LOC.

LOC uses the LCCN inconsistently, apparently for historical reasons. This means that not all batch uploads use LCCNs; some instead fall back to the collection digital ID, like ggbain.01270.

Based on LOC's summary, the LCCN is (almost always) formed of <year, 4 digits> + <serial number, 6 digits with leading zeros>. The LCCN uses the year the central record was made.
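
A minimal sketch of splitting an LCCN into the two parts described above. It assumes the "LCCN" prefix followed by ten digits, as seen in the file names on this page; the function name and pattern are illustrative, and LCCNs that do not follow the usual form simply return None.

```python
import re
from typing import Optional, Tuple

# "LCCN" + 4-digit year + 6-digit serial, not followed by further digits.
LCCN_PATTERN = re.compile(r"LCCN(\d{4})(\d{6})(?!\d)")


def parse_lccn(title: str) -> Optional[Tuple[int, int]]:
    """Return (year, serial) from a file title, or None if no LCCN is found."""
    match = LCCN_PATTERN.search(title)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_lccn("File:Keene Fitzpatrick - Princeton LCCN2014694395.tif"))
print(parse_lccn("File:Anitastewart 4.jpg"))
```

Converting the serial to an integer drops the leading zeros, so keep the matched string instead if the value is going to be used in a search.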

At the time of writing, projects in User:Fæ/LOC use the LCCN to search for existing files unless the MARC record is being used (i.e. no LCCN is available); only then is a "pnp" value, created from the MARC record, used to search for existing matches on Commons. The pnp is deduced by searching the MARC record for a url matching "loc.pnp.*\d{4}.*"; this is probably always in field 856 (access method 4 - http). Searching this way avoids matching urls for the collection rather than the item. A technical explanation of MARC 856 is here.
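
The pattern quoted above can be tried out with a short sketch. The regular expression is taken from the text as written; everything else is an assumption for illustration: the function name, the idea that the candidate urls have already been pulled out of the MARC record, the example urls, and the choice to take the last path segment as the pnp value.

```python
import re
from typing import Iterable, Optional

# Pattern as quoted in the text: an item url contains a run of four digits.
ITEM_URL = re.compile(r"loc.pnp.*\d{4}.*")


def find_pnp(urls: Iterable[str]) -> Optional[str]:
    """Return the last path segment of the first url that looks like an item url."""
    for url in urls:
        if ITEM_URL.search(url):
            return url.rsplit("/", 1)[-1]
    return None


# Illustrative urls: the first points at a collection, the second at an item.
candidates = [
    "http://hdl.loc.gov/loc.pnp/pp.ggbain",
    "http://hdl.loc.gov/loc.pnp/ggbain.01270",
]
print(find_pnp(candidates))
```

The collection url is skipped because it contains no four-digit run, which is the behaviour the paragraph above describes.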

However, the pnp value is always created, as it is used for both types of upload to display and link in the {{LOC-image}} credit template. The credit template is used this way for historical reasons, not because the LCCN is less good where it exists.

Housekeeping task

Rather than retrospectively changing the batch upload process to do a cross-image search with pnp, it is more convenient to create a specific housekeeping task that adds to or creates image page galleries, cross-linking both old jpg files and newer uploads. This would not compare the images directly, only find matches by identification codes. The George Grantham Bain Collection is 55% uploaded, but as we have one example of a missed match, and the collection contains over 20,000 items/jpegs, it makes for a good test category.

The proposed housekeeping task does the following:

  1. Pull a list of all jpegs in a LOC collection category
  2. For each image, extract the pnp value from the image page
  3. Do a general Commons search for pnp value matches, like insource:/ggbain.01270/ insource:LOC
  4. Check whether the image pages cross-link to each other, and add any missing links
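
Steps 3 and 4 can be sketched without touching the wiki at all. The code below only builds the search string in the form shown in step 3 and works out which matched titles are not yet mentioned in a page's text. It is not the actual script: the function names are hypothetical, the plain substring check is a simplification, and the limit of 7 matches is the one mentioned elsewhere in this section, applied here to matches other than the page itself as an assumption.

```python
from typing import List

MAX_MATCHES = 7  # more matches than this is treated as a search problem


def build_search_query(pnp: str) -> str:
    """Build a Commons search string in the form shown in step 3."""
    return f"insource:/{pnp}/ insource:LOC"


def links_to_add(page_text: str, own_title: str, matches: List[str]) -> List[str]:
    """Return matched titles that the page does not mention yet."""
    others = [title for title in matches if title != own_title]
    if len(others) > MAX_MATCHES:
        return []  # skip the page rather than risk linking unrelated files
    return [title for title in others if title not in page_text]


print(build_search_query("ggbain.01270"))
print(links_to_add("other_versions = File:B.jpg", "File:A.tif",
                   ["File:A.tif", "File:B.jpg", "File:C.jpg"]))
```

The substring check does not handle titles written with underscores or encoded characters, so a real run would either normalise titles first or accept occasional duplicate gallery entries.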

Some initial passive results, returning pnp matches to jpegs in PD-Bain where images are not already cross-linked and the 'seed' image may not be my upload. The first image of each set is the source, and there may be more than one match, which need not be in PD-Bain:

Sample highlights:

Now running. Initial changes:

There will be a few exceptions, such as Flanders-F4-1912.jpg where there is no information template or no 'other_versions' included. To automate this would be a pain. However, these will be linked to in the galleries of the associated files, in this example "Flanders" monoplane LCCN2014697249.tif. A marginal case is Hattie the elephant, where there are 5 jpegs along with the TIFF; these variations are due to crops. Should any image be matched more than 7 times, this will be interpreted as a search problem and skipped.

A known "bug" is where a gallery exists and the linked file is presented using odd characters. These may be included twice in the gallery, because matching them reliably would make the search too complex. Example diff and Example from the Hattie the elephant variations with bad characters. As playing about with possible html encoding variations is ridiculously complex, these are being ignored as fringe cases.
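
For anyone curious what a partial workaround might look like, here is a small sketch that folds a few common encoding variants of a title into one form before comparing. It is not part of the task described here and would not cover every variant; the function name is hypothetical, and a title that legitimately contains a percent sign or an ampersand sequence could be altered by it.

```python
import html
from urllib.parse import unquote


def normalise_title(title: str) -> str:
    """Decode percent-escapes and html entities, and treat underscores as spaces."""
    decoded = html.unescape(unquote(title))
    return " ".join(decoded.replace("_", " ").split())


variants = [
    'File:"Flanders" monoplane LCCN2014697249.tif',
    "File:%22Flanders%22_monoplane_LCCN2014697249.tif",
    "File:&quot;Flanders&quot; monoplane LCCN2014697249.tif",
]
print({normalise_title(v) for v in variants})
```

All three spellings collapse to the same string, so a gallery check based on the normalised form would treat them as one file.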

Lastly, this runs slowly. I am not using multithreading for this task and it probably does not need it to be effective. I'll add a note about it on the LOC project page. As the Bain collection is still being uploaded, the housekeeping is officially starting with Popular Graphic Arts, Recent Changes list.

✓ Done -- (talk) 14:29, 16 March 2018 (UTC)

Congratulations 🎉, again

I just saw that you had passed the six million edit mark. I think that probably almost half of the free files on Wikimedia Commons were uploaded by you. --Donald Trung 『徵國單』 (Talk 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:55, 8 March 2018 (UTC)

File:Medieval body sherd of an unglazed Pennine (or Northern) gritty ware (FindID 564839).jpg

Hello, in the above file there is a discrepancy between the CC license and the statement that the image is "All rights reserved, Alex Whitlock". Would you be able to resolve this? Richard Nevell (talk) 23:45, 12 March 2018 (UTC)

Hello, just wondering whether you've had a chance to take a look at this. Richard Nevell (talk) 21:49, 18 March 2018 (UTC)

Missed duplicate

Just FYI - here's a set of duplicates that Faebot missed. File:U.S. Navy Chief Aviation Warfare Systems Operator Steve Smith, right, a recruit division commander at Navy Officer Candidate School, calls out cadence to students during a physical fitness session 101023-N-IK959-457.jpg and File:Flickr - Official U.S. Navy Imagery - Navy officer candidates train during fitness session..jpg. Kelly (talk) 00:52, 13 March 2018 (UTC)

The duplicates in the original experiment were discovered by using the (check needed) subcategory. As this created a large backlog of duplicates which have yet to be sorted out, doing the rest is not going to be a priority. -- (talk) 10:09, 13 March 2018 (UTC)

File:The coastal patrol craft USS Firebolt (PC 10), foreground, and the Royal Navy destroyer HMS Dragon (D35) are underway in the Persian Gulf May 21, 2013, during International Mine Countermeasures Exercise (IMCMEX) 130521-O-ZZ999-101.jpg

File:The coastal patrol craft USS Firebolt (PC 10), foreground, and the Royal Navy destroyer HMS Dragon (D35) are underway in the Persian Gulf May 21, 2013, during International Mine Countermeasures Exercise (IMCMEX) 130521-O-ZZ999-101.jpg
Commons:Deletion requests/File:The coastal patrol craft USS Firebolt (PC 10), foreground, and the Royal Navy destroyer HMS Dragon (D35) are underway in the Persian Gulf May 21, 2013, during International Mine Countermeasures Exercise (IMCMEX) 130521-O-ZZ999-101. Kelly (talk) 01:22, 13 March 2018 (UTC)

File:Married Gary McCoy, 1978. (3348642381).jpg

File:Married Gary McCoy, 1978. (3348642381).jpg
Commons:Deletion requests/File:Married Gary McCoy, 1978. (3348642381).jpg Cristiano Tomás (talk) 13:43, 13 March 2018 (UTC)

File:Clementine Hunter (13269966095).jpg

File:Clementine Hunter (13269966095).jpg
Commons:Deletion requests/File:Clementine Hunter (13269966095).jpg World's Lamest Critic (talk) 17:20, 13 March 2018 (UTC)

File:On and around Bolivias' Salar de Uyuni - the motely crew - Keith, Andres, Tracey,Krystle, Phil & Murray - (24746112291).jpg

File:On and around Bolivias' Salar de Uyuni - the motely crew - Keith, Andres, Tracey,Krystle, Phil & Murray - (24746112291).jpg
Commons:Deletion requests/File:On and around Bolivias' Salar de Uyuni - the motely crew - Keith, Andres, Tracey,Krystle, Phil & Murray - (24746112291).jpg Kulmalukko (talk) 22:13, 13 March 2018 (UTC)

Notification about possible deletion

Bundle DR:
Commons:Deletion requests/Files in Category:Media needing categories as of 7 December 2016


And also:

-- Kulmalukko (talk) 22:36, 13 March 2018 (UTC)


Hello. Please help me upload this photo to Wikipedia. Thank you very much. 01:48, 14 March 2018 (UTC)

May you help. 01:51, 14 March 2018 (UTC)

File:USA - San Diego (1276993931).jpg

File:USA - San Diego (1276993931).jpg
Commons:Deletion requests/File:USA - San Diego (1276993931).jpg Bri (talk) 04:57, 14 March 2018 (UTC)

File:A man riding a horse in a graveyard, surrounded and attacked Wellcome V0025905.jpg

File:A man riding a horse in a graveyard, surrounded and attacked Wellcome V0025905.jpg
Commons:Deletion requests/File:A man riding a horse in a graveyard, surrounded and attacked Wellcome V0025905.jpg Yann (talk) 18:19, 15 March 2018 (UTC)

File:Hippopotamus feeding, Central Park, N.Y (NYPL b11708080-G91F230 044ZB).tiff

File:Hippopotamus feeding, Central Park, N.Y (NYPL b11708080-G91F230 044ZB).tiff
Commons:Deletion requests/File:Hippopotamus feeding, Central Park, N.Y (NYPL b11708080-G91F230 044ZB).tiff Arnaud Palastowicz (talk) 19:20, 15 March 2018 (UTC)

File:Feeding the swan in the park (NYPL b11708080-G91F230 001B).tiff

File:Feeding the swan in the park (NYPL b11708080-G91F230 001B).tiff
Commons:Deletion requests/File:Feeding the swan in the park (NYPL b11708080-G91F230 001B).tiff Arnaud Palastowicz (talk) 21:12, 15 March 2018 (UTC)

File:WAINWRIGHT, U.S. Navy LCCN2014701319.jpg

File:WAINWRIGHT, U.S. Navy LCCN2014701319.jpg
Commons:Deletion requests/File:WAINWRIGHT, U.S. Navy LCCN2014701319.jpg De728631 (talk) 00:21, 16 March 2018 (UTC)


Rachelle in The Early Evening Sunshine (14880637121)

Hi. I'm forwarding this question to you because I appreciate your experience: What do you think, is this "a high-quality photo (of at least semi-professional quality and high resolution)"? (Quote from DR discussion.) Cheers. --E4024 (talk) 08:04, 16 March 2018 (UTC)

The photograph is high resolution and has a sharp focus, sharp enough to show individual pores in her skin. It cannot be considered an amateur shot and there are no issues of privacy considering the deliberate pose for the camera. There is sufficient educational value to be in scope, such as being an example of how "lumberjack" fashion style may be used by women, glamour photography with women subjects, how emphasis on cheekbones using blusher is a popular make-up technique, and how a woman with red hair, pink lipstick and a cleft chin may be considered fashionable.
For these reasons the DR nomination was unnecessary. -- (talk) 13:05, 16 March 2018 (UTC)

Notification about possible deletion

Bundle DR:
Commons:Deletion requests/Files in Category:Lina von Schauroth


Yours sincerely, .     Jim . . . (Jameslwoodward) (talk to me) 19:27, 16 March 2018 (UTC)

File:Goslings blackseal2 (26376289302).jpg

File:Goslings blackseal2 (26376289302).jpg
Commons:Deletion requests/File:Goslings blackseal2 (26376289302).jpg --Jonatan Svensson Glad (talk) 22:40, 16 March 2018 (UTC)

Notification about possible deletion

Bundle DR:
Commons:Deletion requests/Files from Flickr sets "【日本交通】中部機場到白川鄉交通方式"
Yasu (talk) 15:04, 17 March 2018 (UTC)

About account

Hello Fæ, how is your account contributing 24 hours a day? Is your account automated, or is it a bot? Or is this due to your Python script?--√Jæ√ 13:08, 18 March 2018 (UTC)

Long term maintenance tasks, such as the GLAM dashboard, are passed to Faebot if they are very stable. Normal editing, batch uploads, corrections or short term tasks I run from my main account. The point of the bot flag is to hide maintenance stuff from being constantly flagged in recent changes and annoying other contributors. In particular it is better for me to run my batch uploads under my main account so I can see problems as they arise and handle them before I start getting complaints. For example, this morning I have been correcting my LOC cross-linking script after seeing problems with handling html encoding, and have been testing it out on my main account. Though I have done some of the changes using Faebot, I may choose to do the rest on my main account because it needs more watching than I expected.
The scope of Faebot is on his user page, though the description is not down to each job level as my projects have been so stable for the last couple of years. Some of Faebot's tasks have been running without any problems for five years now. -- (talk) 13:28, 18 March 2018 (UTC)


Hello, could you please focus your project on this user's uploads, as this user has many useful images.--√Jæ√ 13:50, 19 March 2018 (UTC)

Notification about possible deletion

File:Annals_of_the_South_African_Museum_=_Annale_van_die_Suid-Afrikaanse_Museum_(1990)_(17800194694).jpg
Commons:Deletion requests/File:Annals_of_the_South_African_Museum_=_Annale_van_die_Suid-Afrikaanse_Museum_(1990)_(17800194694).jpg IJReid (talk) 19:24, 19 March 2018 (UTC)

@IJReid: The notice you have put here includes no links and so means nothing. Could you fix this please? Thanks -- (talk) 13:27, 20 March 2018 (UTC)
Honestly I'm not too sure, the bot normally auto-fills these. Request is here: Commons:Deletion_requests/File:Annals_of_the_South_African_Museum_=_Annale_van_die_Suid-Afrikaanse_Museum_(1990)_(17800194694).jpg IJReid (talk) 15:02, 20 March 2018 (UTC)

File:Mack Sennett, 1880-1960, three-quarters length, seated in chair, facing slightly right LCCN2005679808.jpg

File:Mack Sennett, 1880-1960, three-quarters length, seated in chair, facing slightly right LCCN2005679808.jpg
Commons:Deletion requests/File:Mack Sennett, 1880-1960, three-quarters length, seated in chair, facing slightly right LCCN2005679808.jpg De728631 (talk) 11:19, 20 March 2018 (UTC)

File:Rockstar mirror selfie (15752131566).jpg

File:Rockstar mirror selfie (15752131566).jpg
Commons:Deletion requests/File:Rockstar mirror selfie (15752131566).jpg Gbawden (talk) 11:47, 20 March 2018 (UTC)

strange IP edits

Hi Fae, you may have already seen the IP edits to File:Drawings (airplanes), by "Richard", 1941 Wellcome L0017926.jpg and File:Drawing of airplane, by "Richard", 1941 Wellcome L0017924.jpg. What is strange is that in both cases the original source link leads nowhere. --Túrelio (talk) 11:49, 20 March 2018 (UTC)

The original source was fine, normal link rot and Wellcome website redesign. The EXIF data looks good as evidence, along with the old blog post by the Wellcome. They could be converted to a DR, but it's an old discussion that should always result in a keep unless there is strong evidence that the original release by the Wellcome was incorrect. -- (talk) 12:08, 20 March 2018 (UTC)
Ok. I've reverted the IP edits. --Túrelio (talk) 13:24, 20 March 2018 (UTC)

good english

Hi Fae, would "People smelling flowers" be a reasonable category-name for images such as File:Brisa Rios.jpg? --Túrelio (talk) 13:48, 20 March 2018 (UTC)

Yes. You could use "people sniffing flowers", but I think your suggestion is better. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:05, 20 March 2018 (UTC)

UK Parliamentary portraits

You'll no doubt recall importing MPs' official portraits last summer. The Lords' pics are now available:

(ignore date in URL; post has been updated today). Please can you do the same again? If so, please note the naming structure used in Category:Official United Kingdom Parliamentary photographs 2017. After Friday, I'll have some free time to add them to Wikipedia articles/ Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:53, 20 March 2018 (UTC)

I may get distracted, but hopefully the last script should be reusable, so it should be done in a day or two. I might even post it on github if I find the time to trim out dead code. -- (talk) 20:08, 20 March 2018 (UTC)

File:Alan Labisch 2016 (Unsplash).jpg

File:Alan Labisch 2016 (Unsplash).jpg has been listed at Commons:Deletion requests so that the community can discuss whether it should be kept or not. We would appreciate it if you could go to voice your opinion about this at its entry.

If you created this file, please note that the fact that it has been proposed for deletion does not necessarily mean that we do not value your kind contribution. It simply means that one person believes that there is some specific problem with it, such as a copyright issue.
Please remember to respond to and – if appropriate – contradict the arguments supporting deletion. Arguments which focus on the nominator will not affect the result of the nomination. Thank you!

Kalbbes (talk) 23:34, 20 March 2018 (UTC)