User talk:Faebot/GLAM dashboard

From Wikimedia Commons, the free media repository

Ideas for additional features

Hi Fae, thanks a lot for providing this tool!

Here are some ideas for additional features:

  • Not only present statistics of users editing the file description pages on Commons, but also of users including images in Wikipedia articles (which is at least as valuable).
  • When using some of Magnus' tools in the context of the uploads, I often went looking for the pictures that were used least often, to see whether I could insert them in a Wikipedia article. Maybe we could think of something similar here as well: what would be needed is an algorithm that recommends pictures to be checked for inclusion in Wikipedia (e.g. pictures that haven't been viewed before, or pictures that are used in other Wikipedias but not in my language version).

Cheers, Beat Estermann (talk) 10:03, 25 November 2014 (UTC)

1. Users including images in wp articles (all languages)
One way of doing this is to look up globalimagelinks for an image, then sniff out each page in a limited way, say only the last 100 edits, and see which of these diffs appears to be the first insertion of the image. The first step is already done, as the usage report is a "simple" SQL query that returns all wp usage links. The second step can be done fairly effectively using the API to return the page history (this does not exist in the SQL tables); however, it involves some painful parsing to make it work for all languages. I'll think about it, as there may be alternatives such as using logs, though this might end up being slower.
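The second step could be sketched roughly as below. This is only an illustration: it assumes the page history has already been fetched (for instance through the API) as a list of revisions, oldest first, and the names `first_insertion` and `Revision` are made up for the example. Matching on the bare file name avoids having to know each language's namespace prefix, but it will miss images pulled in indirectly by templates.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical shape for one revision: (revision id, user name, wikitext).
Revision = Tuple[int, str, str]


def first_insertion(revisions: Sequence[Revision],
                    file_name: str) -> Optional[Tuple[int, str]]:
    """Find the revision in a limited history window that first shows the file.

    `revisions` must be ordered oldest first and `file_name` is the bare name
    without a namespace prefix. Returns (revision id, user), or None when the
    file never appears or was already present in the oldest revision of the
    window (the insertion then predates the window and cannot be attributed).
    """
    target = file_name.replace("_", " ")
    for index, (revid, user, text) in enumerate(revisions):
        # Underscores and spaces are interchangeable in wiki file names.
        if target in text.replace("_", " "):
            if index == 0:
                return None
            return revid, user
    return None
```

If the window is capped at the last 100 edits, a `None` result for a page that does use the image simply means the insertion is older than the window.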
2. Images to work on list.
A simple but useful report would be images used once only in article main space, picking 24 at random.
Here's an example random list of 10 from the Wellcome Images that fit this profile:
Example list
File Single usage
fr:Victor_Cassien
en:Florence_Nightingale
File:AIDS prevention advertisement from Mexico Wellcome L0054854.jpg hr:Homoseksualnost
hy:Պիրոպլազմներ
en:Frederic_Hervey_Foster_Quin
en:Joseph_Barnard_Davis
ru:Скитала
en:Battle_of_Yangxia
en:James_Manby_Gully
meta:Wikimedia_Blog/Drafts/100,000_Wellcome_Images_of_the_history_of_medicine
If this seems worthwhile, it could be implemented this weekend. -- (talk) 17:55, 27 November 2014 (UTC)
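For illustration, the "used once only" selection could look something like the sketch below. It assumes the usage links have already been fetched and limited to article main space, and are given as (file, page) pairs; the function name and parameters are hypothetical and not taken from the dashboard script.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def single_use_sample(usage_links: Iterable[Tuple[str, str]],
                      sample_size: int = 10,
                      seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Pick a random sample of files that are used on exactly one page.

    `usage_links` holds (file name, "wiki:Page") pairs, assumed to be
    main-space usage only. Returns (file name, page) pairs.
    """
    pages_by_file: Dict[str, Set[str]] = defaultdict(set)
    for file_name, page in usage_links:
        pages_by_file[file_name].add(page)

    # Keep files with a single distinct page; sort so a seed gives a stable result.
    singles = sorted(
        (file_name, next(iter(pages)))
        for file_name, pages in pages_by_file.items()
        if len(pages) == 1
    )
    rng = random.Random(seed)
    return rng.sample(singles, min(sample_size, len(singles)))
```

Passing a `seed` is only useful for testing; a live report would normally leave it unset so that each run offers a fresh selection.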
I just made the test myself for the first 3 images:
Aqueduc de Roquefavour:
  • Added a category
  • file name misleading (it is part of canal de Marseille, and not crossing it)
  • Corrected metadata information
  • Corrected caption in French article using the image
  • I didn't insert the image elsewhere (Victor Cassien has only a WP article in French; there is another, photography-like painting of the aqueduct from the mid-19th century)
Florence Nightingale:
  • Added the picture to 2 further articles (there are many other articles it could be added to)
  • Added a further picture to the French article on Florence Nightingale (there are quite a lot of pictures on Commons, so the illustration of the article in many other language versions could be improved)
AIDS prevention ad:
  • Added a category
  • Added the picture to 2 further articles (there are many other articles it could be added to)
So, I guess the algorithm is good enough to give it a try. Maybe you could also provide another list with images that have no more than one category. -- Cheers, Beat Estermann (talk) 08:31, 28 November 2014 (UTC)
First one (single mainspace use files) now rolled out. I'll look at adding the low category count one, probably by picking out the ten with the lowest category count, as many projects have unpredictable bucket categories added (such as 'categorization needed' or 'check needed' maintenance cats).
✓ Done
See example at Commons:Zentralbibliothek Zürich/reports/improvement. -- (talk) 14:39, 28 November 2014 (UTC)
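A rough sketch of the "ten with the lowest category count" idea follows. It assumes the categories for each file have already been collected into a dictionary; the function name and the optional `ignore` parameter are hypothetical and not part of the actual report code. Ranking by count, rather than testing for "at most one category", keeps the report useful even when every file carries a bucket category or two.

```python
import heapq
from typing import Dict, FrozenSet, Iterable, List


def lowest_category_files(categories_by_file: Dict[str, Iterable[str]],
                          how_many: int = 10,
                          ignore: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the `how_many` files with the fewest categories.

    Categories listed in `ignore` (for example known maintenance
    categories) are not counted. Ties are broken by file name.
    """
    def count(file_name: str) -> int:
        return sum(1 for cat in categories_by_file[file_name] if cat not in ignore)

    return heapq.nsmallest(
        how_many,
        categories_by_file,
        key=lambda file_name: (count(file_name), file_name),
    )
```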

Rearranging deckchairs...

As can be seen from Category:Collection of West Midlands Police Museum/Reports/largest the bot is making regular updates, even though no files have been changed. Could some sort of check be put in place, to prevent this, please? Andy Mabbett (talk) 19:56, 10 December 2014 (UTC)

I'm on a wikibreak due to real life stuff. I may be able to add this in a couple of weeks (a simple name check). I have observed this: the images get swapped about depending on the order they come back in from the underpinning tables. Lokal is tweaking the code as well, so he might get to this and we can merge changes later. Anyone interested can clone and fiddle with the source at http://github.com/faebug/batchuploads/blob/master/reportGLAMdashboard.py. -- (talk) 20:51, 10 December 2014 (UTC)
...
While waiting on a phonecall, I added the following to Faebot's version (not on github yet, this does not work from my old macmini):
import re
# Non-capturing group (?:...) so that findall returns whole file names, not just the extension
pattern = r"File:[^\|\]]*\.(?:jpe?g|JPE?G|og[gv]|OG[GV]|svg|SVG|tiff?|TIFF?|gif|GIF)"
imgs  = set(re.findall(pattern, html))
rimgs = set(re.findall(pattern, report))
if imgs == rimgs: return  # same set of files on-wiki and in the new report
This checks whether all images in the new report (report) are the same as those currently on-wiki (html), and skips writing the report if they are. The check is run on every report apart from categories, volunteers and the index report. I doubt it's perfect; consider this a soak test until I have more time. I rather like the ability of Python to handle sets: it can solve some problems rather neatly and takes care of optimization for you. -- (talk) 15:22, 12 December 2014 (UTC)
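As a self-contained illustration of the same check, here is a minimal sketch. The helper names `file_names` and `report_unchanged` are invented for the example and are not taken from the dashboard script. One thing to watch with `re.findall`: when the pattern contains a capturing group it returns only the group's contents (here that would be the bare extensions), so the sketch uses a non-capturing group to get whole file names.

```python
import re

# Non-capturing group so findall yields full names such as "File:Foo.jpg".
# The character class stops at "|", "]" or a newline so one match cannot
# run across several links or gallery lines.
FILE_RE = re.compile(
    r"File:[^|\]\n]*\.(?:jpe?g|og[gv]|svg|tiff?|gif)",
    re.IGNORECASE,
)


def file_names(text: str) -> set:
    """Return the set of file names mentioned in a piece of wikitext."""
    return set(FILE_RE.findall(text))


def report_unchanged(current_wikitext: str, new_report: str) -> bool:
    """True when both texts mention exactly the same files, in any order."""
    return file_names(current_wikitext) == file_names(new_report)
```

Comparing sets ignores ordering, which is the point here, but it also ignores everything else on the page: a report whose figures change while its list of files stays the same would be skipped as well.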
I have just noticed that this amendment seems to have stopped updating some pages, such as the usage page for the Wellcome. I have tabbed it out until I have more time to play around with the code. -- (talk) 15:16, 16 January 2015 (UTC)
Any chance you could revisit this, please? Andy Mabbett (talk) 22:52, 14 November 2015 (UTC)

Repository

The project namespace page redirects here for technical details. I don't see a public repository? If there was one I'd gladly try and send some patches, to feel less guilty for my proposals. :) --Federico Leva (BEIC) (talk) 11:15, 26 January 2015 (UTC)

Subscribe to https://github.com/faebug/batchuploads/blob/master/reportGLAMdashboard.py :-)
If you create some patches suitable to merge, you'll need to prompt me on how to go about it. I'm a newbie at cooperating using git. -- (talk) 12:13, 26 January 2015 (UTC)

Audio file names

Please add file names (or use a template which does so) for audio files. At present pages like Commons:British Library/Reports/Wildlife sounds/improvement, which lack them, are not as helpful as they could be. Andy Mabbett (talk) 22:52, 14 November 2015 (UTC)

SQL time-outs when using revision table

For a couple of weeks, the dashboard reports have been knocked out of action because the most_edits report loses its SQL connection, which then breaks Faebot's run. For an example of an actual report see quarry:query/15561. Suggestions welcome! Debugging is a pain: the SQL takes 2 hours to time out each time I reproduce the error, and I have to use the Labs version rather than a local version to be able to connect to the database.

In the meantime, I may need to stub out this report from the run in order to get things restarted. -- (talk) 17:39, 17 January 2017 (UTC)

The problem has not been fixed, but time-outs for the most_edits report will no longer cause the run to halt; only the affected report will fail to be created. There are several potential causes, but as this is apparently a rare event and may be down to categorization oddities or exceptional use of institution space, I don't plan to investigate it further at the moment. -- (talk) 20:52, 17 January 2017 (UTC)
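The general pattern of letting one report fail without halting the rest of the run could be sketched as follows. This is an assumption about the shape of such a loop, not the bot's actual code; `run_reports` and the report names in the usage note are hypothetical.

```python
import logging
from typing import Callable, Dict, List, Tuple


def run_reports(reports: Dict[str, Callable[[], str]]) -> Tuple[Dict[str, str], List[str]]:
    """Build each report in turn, isolating failures.

    `reports` maps a report name to a function that returns its text.
    Returns the finished reports and the names of those that failed.
    """
    finished: Dict[str, str] = {}
    failed: List[str] = []
    for name, build in reports.items():
        try:
            finished[name] = build()
        except Exception:
            # A lost connection or time-out only costs this one report.
            logging.exception("Report %r failed; continuing with the rest", name)
            failed.append(name)
    return finished, failed
```

Called with something like `run_reports({"largest": build_largest, "most_edits": build_most_edits})`, a time-out inside `build_most_edits` would be logged and listed in `failed`, while the other reports are still produced.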

Dashboard reporting not running

Hi @: hope you're doing well. I'm collaborating with the Cleveland Museum of Art and we're hoping to get GLAM dashboard reports up and running, but it looks like Faebot hasn't updated any dashboard pages since Sept 2018. Wanted to see if you're aware of this and if dashboard reporting will still be available. Thanks, ~Kevin Payravi (talk) 22:44, 13 July 2019 (UTC)

@Kevin Payravi: Hi. I was vaguely aware of it, along with a more popular report that stopped at the same time. Trying to check the script on labs last week, I discovered that it was prompting me for an admin password (after logging in). Not sure what that's about, so I may need to raise a Phabricator ticket to address it. The WMF did change a few things about the database table structure; if that is the cause, it should be fixable with a tweak to the SQL. However, if the issue is significant time-outs, that may be more tricky.
I'm about to go on holiday, so will probably not look at this for a few weeks. I suggest not relying on Faebot getting this working soon, so you may want to reach out to other GLAM projects to see what the best suggestions for dashboards currently are. -- (talk) 19:17, 17 July 2019 (UTC)
@: Sounds good, thanks for the info. Have a great holiday. ~Kevin Payravi (talk) 05:57, 19 July 2019 (UTC)