Commons:Village pump

From Wikimedia Commons, the free media repository

Shortcut: COM:VP

Welcome to the Village pump

This page is used for discussions of the operations, technical issues, and policies of Wikimedia Commons. Recent sections with no replies for 7 days and sections tagged with {{section resolved|1=--~~~~}} may be archived; for old discussions, see the archives.

Please note

  1. If you want to ask why unfree/non-commercial material is not allowed at Wikimedia Commons or if you want to suggest that allowing it would be a good thing, please do not comment here. It is probably pointless. One of Wikimedia Commons’ core principles is: "Only free content is allowed." This is a basic rule of the place, as inherent as the NPOV requirement on all Wikipedias.
  2. Have you read our FAQ?
  3. For changing the name of a file, see Commons:File renaming.
  4. Any answers you receive here are not legal advice and the responder cannot be held liable for them. If you have legal questions, we can try to help but our answers cannot replace those of a qualified professional (i.e. a lawyer).
  5. Your question will be answered here; please check back regularly. Please do not leave your email address or other contact information, as this page is widely visible across the internet and you are liable to receive spam.

Purposes which do not meet the scope of this page



People of Ngadisan (Java, Indonesia) are filling their cans at the village pump. The old well is defunct and has been replaced by a water tap.
Centralized discussion
See also: Village pump/Proposals • Archive


April 23

Wiki Loves Food

Curd Rice

Hello! After the successful pilot program by Wikimedia India in 2015, Wiki Loves Food (WLF) is happening again in 2018, and this year we are going international. To make this event a grand success, your direction is key. Please sign up as a volunteer, or sign up on behalf of your affiliate, here.--Abhinav619 (talk) 08:46, 27 A

New QIC user script

This user script considerably reduces the complexity of the review process and prevents edit conflicts, and yes, reviewing is now much more fun! How do you use it? Follow these steps:

  1. Edit your common.js
  2. Add a line with importScript("User:The Photographer/QICvote.js");
  3. Ready! You will now see a combobox below each thumbnail on QIC.
  4. To vote, simply select your votes/reviews/comments using the combobox
    Steap 1.png
  5. Add a review message
    Steap 2.png
  6. When you have finished all your reviews, click the "Confirm reviews" button.
    Steap 3.png

Is it possible to convert it into a gadget? Thanks --The Photographer 00:50, 10 May 2018 (UTC)

I’ve wrapped the importScript line in <code></code> so people know exactly what to copy and paste. I'll leave gadgetifying it to other tech-savvy admins. — regards, Revi 14:22, 10 May 2018 (UTC)
Great, thanks --The Photographer 04:38, 12 May 2018 (UTC)
Seems to work quite well – that might indeed help motivate me to do some reviews there every once in a while. Thanks! --El Grafo (talk) 08:03, 14 May 2018 (UTC)
✓  Done Gadgetification --Zhuyifei1999 (talk) 00:42, 18 May 2018 (UTC)


User:CommonsDelinker hasn't made any file replacements since 5 May. Why? Can somebody give me a clue? --2A02:2168:1237:FF00:5940:3173:A590:1563 06:39, 10 May 2018 (UTC)

It has not processed User:CommonsDelinker/commands since this edit at 09:21, 5 May 2018 (UTC). I reported that here.   — Jeff G. ツ please ping or talk to me 12:39, 10 May 2018 (UTC)
I think that Steinsplitter should be pinged on this one, but it seems that it has started to work again. --Ruthven (msg) 17:39, 12 May 2018 (UTC)
Steinsplitter, COM:CDC is not updating. — regards, Revi 17:54, 16 May 2018 (UTC)
Done. --Steinsplitter (talk) 18:15, 16 May 2018 (UTC)

May 11


I want the unofficial recreation of the Asia-Pacific Broadcasting Union's logo to be replaced with this official, high-resolution image file. JSH-alive (talk) 10:42, 13 May 2018 (UTC)

@JSH-alive: That's not directly possible, since the replacement is in JPEG format and the existing file is in SVG, so they have to have different names. In any case, they're substantially different colours. I think you should (copyright permitting) upload the official version under a different name and then edit any pages that you want to use the new file. --bjh21 (talk) 18:02, 14 May 2018 (UTC)
@Bjh21: Okay. I uploaded the file at the English Wikipedia (Wikipedia:en:File:Logo of the Asia-Pacific Broadcasting Union.jpg). How do I make a formal request to transfer the file to the Commons? JSH-alive (talk) 16:19, 17 May 2018 (UTC)

Category 'on rail tracks'

Gleidingen electricity installation.jpg
This electrical unit in an electric substation can only be (re)moved on rails. Can this still be classified as rail transport? It is on rails, but the 'transport' element is very theoretical. But there are cranes moving on rails and other similar industrial objects moving on rails. I am not an expert on electrical technology. Can someone classify this type of unit? Smiley.toerist (talk) 11:42, 13 May 2018 (UTC)
@Smiley.toerist: I would certainly see this as an example of rail transport, even if it's not likely to be so transported often. In a similar way, shipping containers are often used for stationary storage (or buildings), but I think categorising them under transport is appropriate. Incidentally, it's a single-phase transformer, used to step down an extra-high voltage supply (110,000 volts according to OpenStreetMap) to the 15,000 volts needed by the railway. Of course, the fact that it's part of a railway power supply is another reason to file it under rail transport. --bjh21 (talk) 18:08, 14 May 2018 (UTC)

Trafostation Alter Hellweg IMGP4722.jpg
This one has even more rail infrastructure. Smiley.toerist (talk) 23:35, 15 May 2018 (UTC)
I created a new category, Moveable objects on rails. Smiley.toerist (talk) 11:11, 16 May 2018 (UTC)

License mysteries

Belgium 1893 PostCard.jpg

I started a license check on old postcards and found this one. Louis-Eugène Mouchon is listed as having died in 1915. Very well, but what was he the creator of? The postage stamp? The handwritten text? (Writing down an address is not creative work.) Or does it pertain to the front of the postcard? (Not relevant in this case.) Smiley.toerist (talk) 23:22, 13 May 2018 (UTC)

Did you try reading the Wikipedia entry on Louis-Eugène Mouchon? Perhaps it will give you a hint. World's Lamest Critic (talk) 02:41, 14 May 2018 (UTC)
Thanks, I added the missing links and categories. Template 'Century decades navbox' does not seem to work for centuries before 2000 (see Category:Liège-Guillemins train station in the 1890s). Smiley.toerist (talk) 10:27, 14 May 2018 (UTC)

May 14

Old postcards check

I regularly come across pictures of old postcards uploaded as own work. I regularise these cases as far as I can with a more correct license. There must be many more of these cases. Is it possible to run a check script that selects files with an own-work license that are in postcard categories or are otherwise identified as postcards? Most cases can be relicensed correctly. The most important criterion is that the postcards are old enough and anonymous. Often the source is filled in as 'personal collection' or similar terms. I add 'postcard' to the source field, if possible with the name of the postcard publishing company. 'Personal collection' is irrelevant for licensing purposes, but if the uploader wants to mention it, we should respect this and keep the mention. Smiley.toerist (talk) 10:56, 14 May 2018 (UTC)
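A check script of the kind requested could start from a simple filter. This is only an illustrative sketch: the record shape ({ title, source, categories }) and the function names are hypothetical, and a real script would first have to load that metadata from a dump or the API. It flags files whose source says "own work" and that sit in any category whose name contains "postcard"; each candidate would still need a human check of age and authorship.

```javascript
// Sketch: flag files that were uploaded as "own work" but are categorised as
// postcards. The record shape { title, source, categories } is hypothetical;
// a real script would first load this metadata from a dump or the API.
function isSuspectPostcard(file) {
  const ownWork = /\bown work\b/i.test(file.source || "");
  const inPostcardCategory = (file.categories || []).some((cat) =>
    /postcards?/i.test(cat)
  );
  return ownWork && inPostcardCategory;
}

// Returns the titles of all candidate files for a manual licence review.
function findSuspectPostcards(files) {
  return files.filter(isSuspectPostcard).map((file) => file.title);
}
```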

Category first

I have a technical suggestion: when I search for "Abel Tasman National Park", for example, is it possible for the category to be displayed at the top of the search results? Currently this is a bit inconvenient: I enter the phrase "Abel Tasman National Park" and I see only the photos, but not the category... Tournasol7 (talk) 17:24, 14 May 2018 (UTC)

@Tournasol7: You can use the Advanced Search to just search for categories or prefix your search with "category:" without quotes.   — Jeff G. ツ please ping or talk to me 17:57, 14 May 2018 (UTC)
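For scripted use, the same restriction can be expressed through the MediaWiki Action API, where list=search with srnamespace=14 limits results to the Category namespace. A minimal sketch that only builds the request URL; the function name is illustrative:

```javascript
// Sketch: build an Action API search request restricted to the Category
// namespace (14) on Wikimedia Commons. Only the URL is built here.
const COMMONS_API = "https://commons.wikimedia.org/w/api.php";
const CATEGORY_NAMESPACE = 14;

function buildCategorySearchUrl(term) {
  const params = new URLSearchParams({
    action: "query",
    list: "search",
    srsearch: term,
    srnamespace: String(CATEGORY_NAMESPACE),
    format: "json",
  });
  return `${COMMONS_API}?${params}`;
}

// Example: buildCategorySearchUrl("Abel Tasman National Park")
```

Opening the resulting URL returns the matching category pages as JSON.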

Tech News: 2018-20

22:22, 14 May 2018 (UTC)

Interaction tool

The interaction tool is designed to inform users when two or more contributors have edited the same file, article or other page, focusing on cases where only minutes passed between the different contributors' edits. It defaults to "enwiki", but I assumed it would work on the Commons database as well. However, it barfed when I replaced "enwiki" with "commons". I didn't see any documentation, and Google didn't help me find any either.

So, does this tool work on the commons database? If so, how does a user tell it to use the commons database instead of enwiki?

Thanks! Geo Swan (talk) 23:10, 14 May 2018 (UTC)

The name to use for Commons is "commonswiki". --ghouston (talk) 23:30, 14 May 2018 (UTC)
  • Thanks! Geo Swan (talk) 13:43, 15 May 2018 (UTC)
  • Hmmm. Doesn't barf. Doesn't generate any actual output either. Geo Swan (talk) 13:46, 15 May 2018 (UTC)
Perhaps this will help? World's Lamest Critic (talk) 14:34, 15 May 2018 (UTC)

May 15

Where to place yard category for ships

We have categories for ships built at a specific shipyard, e.g. Ships built at Bergen Mekaniske Verksted, Bergen. There seems to be no consistency about including either the IMO categories in these categories or the subcategories with the ship name(s). One ship can have pictures stored in 5 different subcategories as it has changed name, but the IMO category will be unique and never change, as the IMO number follows the hull. We should agree on a standard here and then try to get a bot to implement this standard for all existing categories. --Cavernia (talk) 20:20, 15 May 2018 (UTC)

I partly agree. For the categories for ships built at a specific shipyard, and possibly some other things (?), it makes more sense logically to include the IMO categories in them than the categories for the names of the ships, for the reason you mention. I am still not absolutely keen on it, though, because even though I know how the IMO number works, it feels more "abstract" (for lack of a better word) to me to find the number IMO 9377016 than the name Fugro Saltire (ship, 2008) in Ships built at Bergen Mekaniske Verksted, Bergen. I don't know if that makes sense to anyone other than me... It is true that we lack consistency about this, and I probably won't protest too much if most others prefer to categorize the IMO numbers rather than the ship names in those cases. Blue Elf (talk) 21:14, 15 May 2018 (UTC)
I would also opt for placing the IMO numbers into a yard category for the reasons presented by Cavernia. The IMO categories are already used as a container for the various name categories of a ship, so it wouldn't make sense to place each name category into the same category. De728631 (talk) 21:02, 16 May 2018 (UTC)
Remember that the categories will always then be a mix of numbers and names as IMO numbers have only recently been adopted and are also not required for all vessels. Rmhermen (talk) 04:27, 18 May 2018 (UTC)
I understand the arguments from both sides, and I'm not sure what the best solution is, but it's obvious that the current category structure is a confusing mix. --Cavernia (talk) 09:48, 19 May 2018 (UTC)

May 17

Retrieve and manipulate wikitext DOM

Hi folks, is there a tool with which I can retrieve or generate a structured representation of the wikitext, something like the DOM for HTML? That way I could easily access the templates used, their parameters, categories etc. Ideally I would also be able to manipulate that wikitext DOM and write it back as plain wikitext. With such an abstraction, the work of bots would be much easier and safer. Thanks in advance for any help in this direction, --Arnd (talk) 12:39, 17 May 2018 (UTC)

See mw:Parsoid. Ruslik (talk) 20:21, 17 May 2018 (UTC)
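Parsoid exposes its structured representation as HTML through the Wikimedia REST API. The sketch below assumes the endpoints /page/html/{title} (wikitext to HTML) and /transform/html/to/wikitext (HTML back to wikitext); check the current REST API documentation before relying on them. Parsoid marks each template call with typeof="mw:Transclusion" and stores the template name and parameters as JSON in the element's data-mw attribute, so once a bot has that attribute value, reading the templates is plain JSON access. The helper names are illustrative.

```javascript
// Sketch: work with Parsoid's structured output. ASSUMPTION: the REST
// endpoints below are current; verify against the REST API documentation.
const REST_BASE = "https://commons.wikimedia.org/api/rest_v1";

// URL returning Parsoid HTML for a page (wikitext -> structured HTML).
function parsoidHtmlUrl(title) {
  // Page titles use underscores in URLs; encode the rest (":" -> %3A).
  return `${REST_BASE}/page/html/${encodeURIComponent(title.replace(/ /g, "_"))}`;
}

// URL to POST edited HTML to, to get wikitext back.
function htmlToWikitextUrl() {
  return `${REST_BASE}/transform/html/to/wikitext`;
}

// Given the JSON string from a data-mw attribute on an element with
// typeof="mw:Transclusion", list the template calls and their parameters.
function templatesFromDataMw(dataMwJson) {
  const dataMw = JSON.parse(dataMwJson);
  return (dataMw.parts || [])
    .filter((part) => typeof part === "object" && part.template) // skip plain wikitext parts
    .map((part) => ({
      name: part.template.target.wt.trim(),
      params: Object.fromEntries(
        Object.entries(part.template.params || {}).map(([key, value]) => [key, value.wt])
      ),
    }));
}
```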

Mass delete

I need to initiate the (speedy) deletion of a system of some hundred unused templates; whom should I ask how to prepare the request? Will it be a bot task? -- sarang사랑 16:27, 17 May 2018 (UTC)

  • Could you be more specific about what you want removed? This is probably a perfectly good place to ask. - Jmabel ! talk 19:01, 17 May 2018 (UTC)

May 18

Expiration of upload related rights

I am considering writing a proposal for setting the expiration criteria for the following groups:

If anyone knows of a previous agreement on expiration criteria, a link would be appreciated.

For these groups, as they are connected to the user's uploads or experience with uploads, it seems sensible to require the user to re-apply for the group if they have made no uploads for an extended period. For example, a user who has made no uploads to Wikimedia Commons for six months would seem to have no realistic need to keep their account in the Extended uploader's group. As access to the group can be regained very quickly with a reasonable request, provided the user remains in good standing, having the group predictably expire would cause nobody any special inconvenience and would reduce the risks, or perceived risks, associated with leaving these special rights on lots of unused accounts.

For GWT there is more information at Commons:Bureaucrats' noticeboard#Expiration_of_GWT_group_memberships.

Thanks -- (talk) 08:47, 18 May 2018 (UTC)
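The proposed rule reduces to a date comparison. A minimal sketch, assuming "six months" is approximated as 180 days and that the date of the account's most recent upload is already known; the threshold and function name are hypothetical, not part of any adopted policy:

```javascript
// Sketch of the proposed expiry check. ASSUMPTION: "six months" is
// approximated as 180 days; the threshold is hypothetical, not policy.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const INACTIVITY_DAYS = 180;

// lastUpload and now are ISO 8601 date strings, e.g. "2018-05-18T00:00:00Z".
function membershipShouldExpire(lastUpload, now, inactivityDays = INACTIVITY_DAYS) {
  const idleDays = (Date.parse(now) - Date.parse(lastUpload)) / MS_PER_DAY;
  return idleDays > inactivityDays;
}
```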

  • What exactly is the downside of them continuing to have the privilege? - Jmabel ! talk 15:48, 18 May 2018 (UTC)
@Jmabel: account credentials leak, account gets abused. - Alexis Jazz ping plz 16:54, 18 May 2018 (UTC)
Same reasons that Commons:Administrators/De-adminship#Activity exists. -- (talk) 16:59, 18 May 2018 (UTC)

Rate limit is 90 edits per minute now, also for tools and gadgets

As I guess not everyone follows the proposals page, you may want to take a look at this as this change wasn't communicated in any way: Commons:Village pump/Proposals#Rate limit is at 90 edits per minute. Don't comment here, comment over there instead. - Alexis Jazz ping plz 11:37, 18 May 2018 (UTC)

What can the Wikidata community do to make it easier for Wikimedia contributors to understand Wikidata?

Noun Project author icon 1642368 cc.svg

Dear all

Over the past year or so I've been working quite a lot on Wikidata documentation and have been thinking more about the needs of different kinds of user. I feel that Wikidata can currently be difficult to understand (what it does, how to contribute, what issues there are and what is being done to address them, etc.), even for experienced Wikimedia project contributors. To help address this, I've started an RFC to try and collate this information. It would be really helpful if you could share your thoughts, especially if you find Wikidata hard to understand or confusing. You can just share your thoughts on the talk page and we will synthesize them into the main document.

Requests for comment/Improving Wikidata documentation for different types of user

Thanks very much

John Cummings (talk) 12:54, 18 May 2018 (UTC)

@John Cummings: "What can the Wikidata community do to make it easier for Wikimedia contributors to understand Wikidata?"
Explain COM:Structured Data to us. (like we're five) - Alexis Jazz ping plz 13:33, 18 May 2018 (UTC)
Thanks @Alexis Jazz:, this is very helpful. Could you please leave this on the talk page of the RFC? John Cummings (talk) 13:38, 18 May 2018 (UTC)
Will do. (also: fixed your link) - Alexis Jazz ping plz 13:50, 18 May 2018 (UTC)

Translating {{Created with MetaPost}}

Hello everyone !

I recently tried to translate {{Created with MetaPost}} into French. I created {{Created with MetaPost/fr}}, but I don't know how to translate "was created with". Can anyone help me, please?

Cordially. --Niridya (talk) 16:25, 18 May 2018 (UTC)

May 19

Michigan Clark 55B?

I have two pictures of a Michigan Clark 55B loader (File:Villarejo de Fuentes 26.jpg and File:Villarejo de Fuentes 27.jpg). I'd like to create a category for the model, but I don't know whether the maker's name is Michigan and the model Clark 55B, or the maker is Michigan-Clark and the model is 55B, or Michigan-Clark 55B is the model name and the manufacturer is called something else. If any of you could clear up my doubts, that would be great. B25es (talk) 15:25, 19 May 2018 (UTC)

I've put them into Category:Clark vehicles for the time being but I don't know either where the Michigan brand comes into play. Google yields multiple results of Clark Michigan with or without dashes. De728631 (talk) 15:32, 19 May 2018 (UTC)
Many thanks! B25es (talk) 15:50, 19 May 2018 (UTC)

Chatham House category merge

Is there a quick and easy way to merge Category:Files from Chatham House Flickr stream and Category:Photographs by Chatham House with verification that the Source is set to Flickr for files in Photographs by Chatham House? // sikander { talk } 20:24, 19 May 2018 (UTC)

Are the latter also from Flickr? You could technically use cat-a-lot to move all files from one category into another, but I think that if you redirect one category to the other, a bot will do it automatically. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 06:07, 20 May 2018 (UTC)

May 20

City archive of Kiel

I asked about this on the German COM:FORUM#Stadtarchiv Kiel, but nobody answered me. So I am asking here: should we import them? Habitator terrae (talk) 15:08, 20 May 2018 (UTC)

Could I write to them suggesting that in future they upload their pictures directly to Commons? Habitator terrae (talk) 16:51, 20 May 2018 (UTC)

I have requested a batch upload: COM:BATCH#Kieler Stadtarchiv Habitator terrae (talk) 17:23, 20 May 2018 (UTC)

Crop tool down?

Is it just me, or is it giving everybody a 502 Bad Gateway error? -- Tuválkin 15:52, 20 May 2018 (UTC)

I have just tried to reach it several times, unsuccessfully. :-( --GRuban (talk) 16:01, 20 May 2018 (UTC)
@Tuvalkin, GRuban: Restarted.   — Jeff G. ツ please ping or talk to me 00:40, 21 May 2018 (UTC)

Wrong MIME type in audio files

A fellow contributor has recorded two MP3 audio files. When we try to upload them to Commons, they generate "File extension ".mp3" does not match the detected MIME type of the file (video/mp4)." error messages. I can open them in Audacity and VLC.

What causes the problem, and how can it be a) fixed (without re-saving from Audacity) and b) avoided in future? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:58, 20 May 2018 (UTC)
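One plausible cause, not confirmed in this thread, is that the recording app saved the audio in an MP4 container (as used for .m4a/AAC files) while giving it an .mp3 extension; such a container would be detected as video/mp4. The actual container can be checked from the first bytes of the file. This sketch tests only three signatures and is not MediaWiki's own detector: "ftyp" at byte offset 4 (MP4/ISO base media), "ID3" at the start (an MP3 with an ID3v2 tag), and an MPEG Layer III frame-sync header.

```javascript
// Sketch: identify the container from a file's first bytes. Only three
// signatures are checked; this is not MediaWiki's own MIME detector.
function sniffAudioContainer(bytes) {
  const ascii = (start, end) => String.fromCharCode(...bytes.slice(start, end));
  // ISO base media (MP4/M4A): box type "ftyp" at byte offset 4.
  if (bytes.length >= 8 && ascii(4, 8) === "ftyp") return "video/mp4";
  // MP3 starting with an ID3v2 tag.
  if (bytes.length >= 3 && ascii(0, 3) === "ID3") return "audio/mpeg";
  // Bare MPEG audio frame: 11-bit sync word, layer bits "01" (Layer III).
  // The layer test excludes ADTS AAC, whose layer bits are always "00".
  if (bytes.length >= 2 && bytes[0] === 0xff &&
      (bytes[1] & 0xe0) === 0xe0 && (bytes[1] & 0x06) === 0x02) {
    return "audio/mpeg";
  }
  return "unknown";
}
```

In Node.js, calling sniffAudioContainer(require("fs").readFileSync("recording.mp3")) (hypothetical file name) would report "video/mp4" for a mislabelled MP4 file, since a Buffer is a Uint8Array.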

May 21

Categorization lost

With this deletion, a few hundred images lost their link to the subtree Category:Black and white photographs. How is this a good idea? (@DarwIn: ping!) -- Tuválkin 01:20, 21 May 2018 (UTC)

@Tuvalkin: Hey, thanks for bringing this into the VP. I seem to recall from older threads here that only true "Black and white photographs" (intended to be black and white) should go under that tree, otherwise most of the 20th-century photographs (at least until the early 1970s) will fall there, which do not seem to contribute to the usability of those categories. Is that tree supposed to include *all* BW photographs? Even if it is, I can't see any use in including them into "BW photos of Portugal", then "BW photos of Portugal", as it just clutters those cats making them useless to someone trying to find photos intended to be BW (using that technique).-- Darwin Ahoy! 01:35, 21 May 2018 (UTC)
The lead of Category:Black and white photographs says "This category and the subcategories are applied to all black-and-white photographs. This allows to identify easily B&W photos (as a media type)." Rmhermen (talk) 02:40, 21 May 2018 (UTC)
@DarwIn: all black and white photos (intentional or not) should be in a black and white (sub)category. I suppose a separate category for intentional black and white photos makes more sense for what you speak about, as a subcategory of black and white photos. Wouldn't be surprised if it already exists. - Alexis Jazz ping plz 07:15, 21 May 2018 (UTC)
@Tuvalkin: @DarwIn: I put everything back in Category:20th-century black and white photographs of Lisbon. (when time passes this could become more difficult, so I did it now) I considered putting them in Category:20th-century black and white photographs of Portugal instead (which also contains nothing but old photographs that are not black and white on purpose), but failed to see the point. - Alexis Jazz ping plz 07:49, 21 May 2018 (UTC)
The ratelimit (which hasn't been reversed yet) FUCKED me. Please excuse me while I go scream at someone. - Alexis Jazz ping plz 07:54, 21 May 2018 (UTC)
Thank you for clarifying that. I remember discussions from years ago concluding that we should avoid placing "normal" photos under the BW tree, as it would simply duplicate the already existing tree (at least if subcats are used). But if this understanding has changed, that's perfectly OK with me. I'll start including that kind of cat in the (many) BW photos I usually upload.-- Darwin Ahoy! 12:12, 21 May 2018 (UTC)
I still fear this kind of cat will be used as a kind of "visual bag" for easily collecting photos from more general cats, such as "Lisbon in the 20th-century", seriously damaging the proper curadory of those subjects. And once a photo has fallen there, if it has not been categorized before, finding it will literally be like finding a needle in a haystack. I myself find this too high a risk for maintaining that kind of category, especially since to me those BW subcats are basically useless. But if that's just me thinking this way, never mind.-- Darwin Ahoy! 12:18, 21 May 2018 (UTC)
@DarwIn: What is "curadory"? I don't think I completely understand the issue. Per Commons:Categories: "The page (file, category) should be put in the most specific category/categories that fit(s) the page (not directly to its parent categories)". As I said above, a specific category for black and white photos that are black and white on purpose is a good idea if it doesn't exist already. You can also add files to a subcategory of Category:Undercategorised files if that's your concern. If your issue is something else, please elaborate. - Alexis Jazz ping plz 12:45, 21 May 2018 (UTC)
@Alexis Jazz: I mean curation. If you have 200 images to be sorted out in "Lisbon in the 20th-century", and then someone comes along and moves 190 of them to its subcat "20th-century black and white photographs of Lisbon" because they are BW and that is very easy to do visually, that person has seriously damaged the process of proper curation of that content, in exchange for a rather pointless subcat about the photos being black and white, which is something almost nobody would care about when looking for 20th-century material about Lisbon. That's what I mean: "20th-century black and white photographs of Lisbon" is worse than pointless; it's a hindrance.-- Darwin Ahoy! 12:58, 21 May 2018 (UTC)

Can somebody please add this watchlist notice?

Please add this watchlist notice. Thank you. Ping me back. Having fun! Cheers! {{u|Checkingfax}} {Talk} 04:38, 21 May 2018 (UTC)

@Checkingfax: Some questions:
  1. What is this
  2. Why
  3. How
  4. Who are you
  5. What's up with your sig
  6. that's about it.
- Alexis Jazz ping plz 07:10, 21 May 2018 (UTC)


Cat-a-lot is broken until further notice. Do not categorize more than 89 files per minute.

If you try to do any more, cat-a-lot will say it succeeded, but it really didn't: any categorizations beyond 89 in a minute are silently dropped.

If you are not an admin or bureaucrat, everything you categorized over 89 files per minute during the last week has been dropped. Oops. You may want to take another look at everything you thought you had categorized.

I'm sorry I didn't find out sooner. Then again, finding out is in no way my responsibility. VisualFileChange seems to stall and force the user to go get a cup of coffee before it automatically continues. I haven't tested any other tools. - Alexis Jazz ping plz 08:58, 21 May 2018 (UTC)
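Until the limit changes, a batch tool can stay under it by pacing its own edits. A minimal sketch, assuming a limit of 89 edits in any rolling 60-second window (one below the 90 per minute discussed on the proposals page); saveEdit stands for whatever function performs a single edit and is hypothetical. A rolling window never allows more edits in a minute than a fixed per-minute counter would, so it is the more conservative choice.

```javascript
// Sketch: client-side throttling for a batch tool. ASSUMPTIONS: at most 89
// edits per rolling 60 s; saveEdit is a hypothetical per-item edit function.
const EDIT_LIMIT = 89;
const WINDOW_MS = 60 * 1000;

// Milliseconds to wait before the next edit, given timestamps (ms) of recent edits.
function delayBeforeNextEdit(recent, now, limit = EDIT_LIMIT, windowMs = WINDOW_MS) {
  const inWindow = recent.filter((t) => now - t < windowMs).sort((a, b) => a - b);
  if (inWindow.length < limit) return 0;
  // Wait until enough of the oldest edits leave the window to drop below the limit.
  return inWindow[inWindow.length - limit] + windowMs - now;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runThrottled(items, saveEdit) {
  const recent = [];
  for (const item of items) {
    const wait = delayBeforeNextEdit(recent, Date.now());
    if (wait > 0) await sleep(wait);
    await saveEdit(item);
    recent.push(Date.now());
    // Only the last EDIT_LIMIT timestamps can affect the next delay.
    while (recent.length > EDIT_LIMIT) recent.shift();
  }
}
```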

By "silently dropped" you mean it simply removes the category, without moving it anywhere?-- Darwin Ahoy! 13:12, 21 May 2018 (UTC)

@DarwIn: Given that this is because of new limits for how many edits one can make in a given time and removing one category and adding another in Cat-A-Lot is typically one edit per file: No, "silently dropped" means just "nothing happens". --El Grafo (talk) 13:29, 21 May 2018 (UTC)
@El Grafo: Thanks, then it's not as bad as it could be. But still very annoying, indeed.-- Darwin Ahoy! 13:37, 21 May 2018 (UTC)

External link archiving and backups on Wikimedia Commons

There is an ongoing discussion about adding bot archiving of external links to media files 📁 on Wikimedia Commons. As link rot is a serious issue, I would like to invite everyone interested to give their 2¢.

Comments imported from the English Wikipedia.

"@Donald Trung: - my bot WaybackMedic can add archives (example). However, dead links need to be pre-marked, such as with a {{dead link}} template. The bot doesn't have a dead-link checker so it needs to know which link(s) on a page need saving. If Faebot can mark them, my bot can save them. -- GreenC 02:18, 20 May 2018 (UTC)

@GreenC: that sounds good, can I ping and continue this conversation at the village pump of Wikimedia Commons? --Donald Trung (talk) 06:17, 20 May 2018 (UTC)"

@GreenC:, I moved the discussion here so it could be discussed and scrutinised by the community of Wikimedia Commons and helpful suggestions could be given by people who work with external links 🔗 every day. @:, as you're one of the most technical users on this wiki and your bot might have to be used. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 09:24, 21 May 2018 (UTC)

I have tested for dead links; this is how Category:Uploads by Fæ with linkrot got populated (as a one-off), and it's partly how Category:Faebot analysed duplicates ready for review works to recommend which duplicate photos should be kept (it runs every day). However, it is read-intensive, as these either look at each image page's wikitext for URLs or use the pywikibot links query, which probably amounts to the same thing in processor or transaction times.
The obvious way to speed this up and make it apply across the whole of Commons, would be to use a local data dump of all wikitext pages from the files namespace, which then avoids lots of internet connections. A second step would be to "remember" which domains are returning 404 errors consistently, and skip checking these individually. This would save a huge amount of potential wait time, as returning 404 errors, or similar header messages, takes seconds each time.
Personally, I'm unsure about the case for this being a good use of bot-writing time. It also looks like the sort of "virtuous wikifairy behind-the-scenes" work that is worth getting a grant for, if only to help cover some obvious costs and avoid being personally out of pocket. Having a grant to contribute to, say, an additional terabyte drive, or cover a couple of months of higher broadband connection before migrating a working bot to the cloud, is somewhat more meaningful than a barnstar template or being mistaken for a paid WMF dev. -- (talk) 09:39, 21 May 2018 (UTC)
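The second optimisation suggested above, remembering domains that consistently return 404 errors and skipping them, can be sketched as a small cache. The threshold of consecutive 404s and the class name are assumptions for illustration:

```javascript
// Sketch of a "dead domain" cache. ASSUMPTION: a domain counts as dead after
// `threshold` consecutive 404 responses; any other status resets its count.
const DEAD_THRESHOLD = 3; // hypothetical default

class DeadDomainCache {
  constructor(threshold = DEAD_THRESHOLD) {
    this.threshold = threshold;
    this.consecutive404 = new Map(); // hostname -> consecutive 404 count
  }

  static host(url) {
    return new URL(url).hostname;
  }

  // True if the URL's domain has already been judged dead, so the slow
  // HTTP request can be skipped.
  shouldSkip(url) {
    return (this.consecutive404.get(DeadDomainCache.host(url)) || 0) >= this.threshold;
  }

  // Record the HTTP status returned for a URL.
  record(url, status) {
    const host = DeadDomainCache.host(url);
    if (status === 404) {
      this.consecutive404.set(host, (this.consecutive404.get(host) || 0) + 1);
    } else {
      this.consecutive404.delete(host);
    }
  }
}
```

A checker would call shouldSkip before each request and record afterwards. A live domain with several dead pages in a row could be misjudged, so the threshold trades saved requests against false positives.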
@: Just to be perfectly clear: I assume you are aware dumps are publicly available for download? I'm not sure if you are, because if you were it would seem odd to refer to it the way you did. But I'm probably wrong. - Alexis Jazz ping plz 09:51, 21 May 2018 (UTC)
Yes. For a long time dumps were not running, but they seem to be regular at the moment. Fortunately pywikibot is written to handle dumps with reasonable ease, so what works live can be adjusted to work with local dumps. In terms of volunteer time, getting it to work well/smart is more complex than it may first appear, especially if this is going to become a useful housekeeping task that watches for new links being added to the collection and regularly looks back over the entire collection. A ~100MB dump may not seem large, but it is large if someone on a home broadband connection is sucking down fresh daily dumps as soon as they come out. If this is a cloud task, it may be possible to work this entirely differently, but that would need investigation unless there is a best practice established from Wikipedia (I have no idea, as I don't follow those projects; life's too short). -- (talk) 10:01, 21 May 2018 (UTC)
I don't really care about barnstars or other things that "the community" sees as important, and I don't think that GreenC wants to do it to get money 💴. I personally care about content and how this content "ages", and a serious issue is future attribution and the discoverability of new content or context for future 🔮 historians. A major issue (with Flickr files at least) is that when licenses change, other people might mistake free files for copyrighted ones. ArchiveBots preserve the source and remove any copyright ambiguity for future reference. I'm not sure how Wikipedia projects do it, so it might be best to ping an actual Wikimedia-employed developer and ask for their opinion on this; judging by many past posts on this page, I'm well aware that the Wikimedia Commons village pump is on the watchlist of at least a few dozen of them. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:43, 21 May 2018 (UTC)

Well, there are three ways to retrieve wikitext: download the dump as mentioned (released about once monthly); use the API (most common); or connect to the database with SQL queries via a Tools account. The latter is generally the fastest but requires running the program on WMF servers, which is not such a bad thing, as it is then hosted on the same LAN and collocation as the Wikipedia servers. This is how IABot does it. IABot has a dead link checker (PHP) and it's available to download, but I can't say much about it as I don't know PHP. You might ask Cyberpower678 about it if interested. Running a dead link checker on a regular basis is not trivial, which is why I don't do it; there are a lot of issues to deal with (intermittent outages, paywalls, bot blockers, etc.). -- GreenC (talk) 13:16, 21 May 2018 (UTC)
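Of the three routes, the API is the simplest to sketch. The request below uses the Action API's prop=revisions with rvslots=main and formatversion=2; ordinary accounts can request up to 50 titles per call. The helper names are illustrative, and the URL pattern is a rough approximation rather than MediaWiki's own external-link parser.

```javascript
// Sketch: fetch page wikitext via the Action API and pull out external URLs.
const COMMONS_API = "https://commons.wikimedia.org/w/api.php";

// Build a request for the current wikitext of up to 50 titles.
function buildWikitextQueryUrl(titles) {
  const params = new URLSearchParams({
    action: "query",
    prop: "revisions",
    rvprop: "content",
    rvslots: "main",
    titles: titles.join("|"),
    format: "json",
    formatversion: "2",
  });
  return `${COMMONS_API}?${params}`;
}

// Map title -> wikitext from a parsed formatversion=2 response.
function wikitextByTitle(response) {
  const result = new Map();
  for (const page of response.query.pages) {
    if (page.missing || !page.revisions) continue; // skip missing pages
    result.set(page.title, page.revisions[0].slots.main.content);
  }
  return result;
}

// Rough extraction of http(s) URLs; stops at whitespace and wikitext delimiters.
function externalUrls(wikitext) {
  return wikitext.match(/https?:\/\/[^\s\]|<>"{}]+/g) || [];
}
```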

I checked with Cyberpower678, and the dead link checker is standalone PHP, though some of the fail-safe logic is built into IABot, so it's not usable out of the box. CP also said IABot will get to Commons eventually. -- GreenC (talk) 13:25, 21 May 2018 (UTC)