Commons:Bots/Requests/InternetArchiveBot
Operator: Cyberpower678 and Harej
Bot's tasks for which permission is being sought: Tagging of dead links in the File namespace and adding archival URLs to them to mitigate link rot.
Automatic or manually assisted: Automatic
Edit type (e.g. Continuous, daily, one time run): Continuous
Maximum edit rate (e.g. edits per minute): approximately 60 edits per minute
Bot flag requested: (Y/N): Y
Programming language(s): PHP. Source code.
Harej (talk) 00:47, 19 November 2020 (UTC)
- Discussion
- The configuration for the different messages the bot uses is available on Toolforge. Feel free to make adjustments as necessary for use on Commons. Harej (talk) 00:55, 19 November 2020 (UTC)
- This seems to have a very wide scope and may have policy implications, such as in future years it may be normal to delete files which never had links archived at IA, and it ties the management/curation of Commons' collections to whether IA can continue to exist online, though the WMF has never specifically had IA in its corporate sustainability strategy. As this has the potential to result in changes for all Commons images with links (it being a fact that all links must go dead eventually), is there an existing consensus for how this project should work? Writing as someone who has mass added IA links due to link rot, but with narrow scope to certain link types. --Fæ (talk) 12:18, 20 November 2020 (UTC)
- Per Fæ's comment I have the feeling this should be discussed at another venue where more community feedback can be gained. Can anybody arrange that? --Krd 18:41, 22 November 2020 (UTC)
- FYI @Kaldari: --Fæ (talk) 22:32, 22 November 2020 (UTC)
- Fæ, by default, the bot would only replace links that are already dead. Irrespective of the longevity of the Internet Archive, replacing a link that is currently dead with an archive that is currently available seems reasonable to me. Further, the bot is programmed to operate agnostic of a particular archival provider, and indeed has linked to other archives. So even if something happened to the Internet Archive, the bot could simply continue with a replacement archival service. I am not sure I understand the connection to Commons' own curatorial practices. In any case, the bot currently runs without issue on over 40 Wikimedia wikis, and the bot is highly configurable as well. Cyberpower678 and I are happy to work with the community on making sure the bot is a constructive participant. Harej (talk) 23:23, 24 November 2020 (UTC)
- This all seems very good. In addition, it would be useful if a process using these methods were to identify, say, in-use media or media from highly used sources, and ensure those external links are available on backup archives somewhere in case they are needed in the future.
- My comment is not that there should be a proposal, but it would be wise to have an announcement post on the Village Pump to see what questions the Commons community may have. Bots/Requests is a technical area and very, very few contributors follow these request pages. As the scope of this project could include tens of millions of image pages and their associated template use, some additional feedback as early as possible could raise questions that we can't think of on our own. --Fæ (talk) 10:20, 25 November 2020 (UTC)
- I do think there should be a proposal or request for feedback, not just an announcement, for exactly the reasons you outlined. As an example question, if there is a dead link somewhere which is part of the file attribution, should it be replaced or should it remain as initially set by the copyright holder? --Krd 10:28, 25 November 2020 (UTC)
- Thanks for the distinction. I was thinking of a word that was not 'proposal', as the bot is already well defined. Requesting feedback makes more sense, which could be done in a less than strictly RfC way. --Fæ (talk) 12:24, 25 November 2020 (UTC)
- I left a message on the village pump to solicit input on this page. As for preserving attribution, the bot is configured (by default) to append archival links to broken links, rather than replace them outright. So the archival link would be an annotation on top of the original link. Harej (talk) 23:22, 25 November 2020 (UTC)
- @Krd and Harej: Input isn’t needed. This already has consensus per this existing thread.—CYBERPOWER (Chat) 19:23, 6 December 2020 (UTC)
- Support, I think implementation of this bot is an excellent idea to combat linkrot. — Jeff G. ツ please ping or talk to me 02:39, 26 November 2020 (UTC)
- Support; the bot already works well on other projects. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:44, 1 December 2020 (UTC)
- Support, the community had already voted (without ANY opposition) to archive external links and "Commons:Archive external links" was created as a provisional page because I had asked the operator of this bot to run it on Wikimedia Commons but (s)he said that the bot wasn't ready for Wikimedia Commons yet. I would advise the nominator to look at the wonderful page "User:Fæ/Wayback" and use the "{{Wayback}}" template as it's already established on Wikimedia Commons. Note: Just for clarity I'd like the record to show that I am against any deletion purely based on linkrot and support this bot to save free files, not to delete them. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 22:17, 1 December 2020 (UTC)
- Question Could we perform a small test run on Commons? All current edits from the bot are imported edits from other wikis. --Schlurcher (talk) 09:13, 3 December 2020 (UTC)
- That makes much sense, please do a small test run. --Krd 16:12, 12 December 2020 (UTC)
- Comment, this bot could potentially be used to discourage spammers. For example, if we started to supplement all existing external links with archived ones, and if this appeared somewhat similar to the English-language Wikipedia format "External link page name (Archived from the original)", then spammers would be less likely to actually post links. Though now that I think about it, this suggestion would probably be more useful for a Wikipedia than here. But there might be some value in immediately archiving links rather than waiting until they are "dead", especially since the copyright license on a web page can change willy-nilly; this already happens on some websites like Flickr. "Irrevocable" is only a suggestion there. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:48, 18 December 2020 (UTC)
- @Donald Trung: Once all the dead links in filespace are archived, then it can move on to archiving the live links there. — Jeff G. ツ please ping or talk to me 18:33, 15 January 2021 (UTC)
- @Cyberpower678 and Harej: Please make a test run. --Krd 11:06, 20 December 2020 (UTC)
- Krd I am picking this back up following the holidays. I will launch a test run imminently. Harej (talk) 18:50, 8 January 2021 (UTC)
- The bot has been running since yesterday; it has not made edits yet. I will stop the test once around 20 edits are made. Harej (talk) 16:27, 9 January 2021 (UTC)
- The bot was down for a few days for maintenance, but went back up this morning. I am monitoring the bot's edits. Harej (talk) 18:21, 15 January 2021 (UTC)
As far as I understand all issues have been addressed, and the test run looks good, so this should be called approved. --Krd 08:17, 19 January 2021 (UTC)
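
For readers unfamiliar with the behaviour discussed above, where the bot by default appends an archival link to a dead link rather than replacing it outright, the following is a minimal illustrative sketch in Python. It is not InternetArchiveBot's actual code (the bot is written in PHP): the function name `append_archive_links`, the `archives` mapping, and the "archived copy" label are all hypothetical, and the archive URL in the usage note is only a placeholder in the Wayback Machine's usual address form.

```python
import re

# Hypothetical pattern, not taken from the bot: matches either a bracketed
# external link such as [http://example.org label] or a bare URL.
LINK_PATTERN = re.compile(
    r"\[(https?://[^\s\]]+)[^\]]*\]"   # group 1: URL inside a bracketed link
    r"|(https?://[^\s\]<>|}]+)"        # group 2: bare URL
)

def append_archive_links(wikitext, archives):
    """Annotate known-dead links with an archive link, keeping the original.

    `archives` is an assumed mapping from a dead URL to its archive URL.
    Links whose URL is not in the mapping are left untouched.
    """
    def annotate(match):
        url = match.group(1) or match.group(2)
        archive = archives.get(url)
        if archive is None:
            return match.group(0)
        # The original link text is preserved verbatim; the archive link
        # is added after it as an annotation.
        return f"{match.group(0)} ([{archive} archived copy])"

    return LINK_PATTERN.sub(annotate, wikitext)
```

Called with the text `Source: [http://example.org/a photo page]` and a mapping from `http://example.org/a` to `https://web.archive.org/web/20200101000000/http://example.org/a`, the sketch leaves the original link in place and adds the archive link in parentheses after it, so attribution text set by the copyright holder is not altered. It does not check whether a link was already annotated, which a real bot would need to do before running repeatedly.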