Note: This script is no longer maintained. A much more powerful, rewritten web-based solution created by User:Duesentrieb is available at http://tools.wikimedia.de/~daniel/WikiSense/CheckUsage.php. Treat the script here as a quick-and-dirty example of how the basic principle of the old "check usage" idea works (this includes Avatar's script but not Duesentrieb's), and feel free to improve and adapt it if you like it.
Here is a small, quick-and-dirty Bash shell script that I wrote. It looks up where a given image from the Wikimedia Commons is used across the different Wikimedia projects and their language sections.
To run it you need a UNIX-compatible system (e.g. Linux), a Bash-compatible shell, wget, sed and grep (and, of course, a working internet connection :p).
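As a quick way to confirm that the required tools are installed before running the script, something like the following could be used. It is only a sketch: the helper name `missing_tools` is made up for this example and is not part of check-usage.sh.

```shell
#!/bin/bash
# missing_tools is a hypothetical helper, not part of check-usage.sh.
# It prints the name of every given command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The three external tools the script relies on.
missing=$(missing_tools wget sed grep)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

The check only looks for the commands themselves; it does not test the internet connection.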
The usage is quite simple:
- Copy "check-usage.sh" and "projects.txt" from the source code of this page (please click "edit" and copy the script from the page source, as the MediaWiki software might not display the code correctly).
- Put them both into the same directory.
- Make "check-usage.sh" executable.
- Open a terminal and change into the directory where the script is stored.
- To look up where a file from the Commons is used in the various projects, enter ./check-usage.sh -u "$IMAGE" (replace $IMAGE with the name of the image you want to look for; note: do not enter the namespace prefix "Image:").
- To find duplicates that are not marked with the NowCommons tag, enter ./check-usage.sh -d "$IMAGE".
- If you copy "projects.txt" from here or want to extend or edit it, please make sure that it does not contain any whitespace and that the last line is a blank line that only contains a newline (return).
- The -u option does not exclude NowCommons-tagged local images.
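The requirements on "projects.txt" can also be checked mechanically. The sketch below assumes that "the last line is a blank line" simply means the file ends with a newline character; `check_projects_file` is a hypothetical helper name and not part of check-usage.sh.

```shell
#!/bin/bash
# check_projects_file is a hypothetical helper, not part of check-usage.sh.
# It prints "ok" and returns 0 if the list file passes all three checks.
check_projects_file() {
  local list=$1
  if [ ! -s "$list" ]; then
    echo "empty or missing"
    return 1
  fi
  # Spaces, tabs or carriage returns inside an entry would break the URLs.
  if grep -q '[[:space:]]' "$list"; then
    echo "contains whitespace"
    return 1
  fi
  # Command substitution strips a trailing newline, so a non-empty result
  # means the last byte of the file is not a newline.
  if [ -n "$(tail -c 1 "$list")" ]; then
    echo "no trailing newline"
    return 1
  fi
  echo "ok"
}

# Demo on a temporary file so that a real projects.txt is not touched.
demo=$(mktemp)
printf 'en.wikipedia\nde.wikipedia\n' > "$demo"
check_projects_file "$demo"
rm -f "$demo"
```

To check a real list, call `check_projects_file projects.txt` from the directory that holds the script.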
Code of "check-usage.sh"
#!/bin/bash
# Created by Daniel Arnold, released within the public domain.
# I know this script is very very ugly but it works for me. ;-)

# The command line option
option=$1
# Our picture we want to look at (needs to be given as "imagename" on command
# line).
file=$2

# Help in case someone didn't enter the right syntax.
# (The option must be either -d or -u and the file name must be given so that
# our program works.)
if [ "$option" != "-d" -a "$option" != "-u" -o -z "$file" ]; then
  echo 'This is a small shell script that looks for the usage of pictures from'
  echo 'the Wikimedia Commons in various Wikimedia projects (list defined by'
  echo 'projects.txt).'
  echo ''
  echo 'Usage: ./check-usage.sh [Option] [File]'
  echo '(You need to be in the directory where the script is located; [File] is'
  echo 'the file name without leading "Image:")'
  echo ''
  echo 'Options:'
  echo '-u : Checks usage of [File] in the various projects in the list.'
  echo '-d : Checks duplicates of [File] in the various projects in the list.'
  echo ''
  echo 'Note:'
  echo 'The "projects.txt" file needs to be in the same directory as the script'
  echo 'and must not contain any whitespace and its last line needs to be a'
  echo 'blank line that only contains a newline (return).'
  exit 1
fi

# Check if the file exists; the "Image history" heading only exists if an
# image exists with that name.
# Variables passed to echo are left unquoted on purpose throughout: echo then
# joins the whole page into a single line, which the sed patterns rely on.
check=`wget --quiet --output-document=- \
  "http://commons.wikimedia.org/wiki/Image:$file"`
check=`echo $check | grep --count "<h2>Image history</h2>"`
if [ $check -eq 0 ]; then
  echo "The file \"$file\" does not exist at the Wikimedia Commons."
  echo "Please enter an existing file name."
  exit 1
fi

# The main loop that reads all Wikimedia projects out of projects.txt and looks
# in all of them for usage of our picture.
while read project; do
  # Grab the page.
  page=`wget --quiet --output-document=- "http://$project.org/wiki/Image:$file"`
  # Remove everything outside <!-- start content --> and <!-- end content -->.
  content=`echo $page | sed 's/^.*<\!--\ start\ content\ -->//g' \
    | sed 's/<\!--\ end\ content\ -->.*$//g'`
  # Look if the image is really from the Wikimedia Commons and not a local one
  # with the same name (but it doesn't exclude NowCommons-tagged local images).
  commons=`echo $content \
    | grep --count "http://upload.wikimedia.org/wikipedia/commons"`
  # Only analyse if it is no local file and we set -u option.
  if [ $commons -ne 0 -a "$option" == "-u" ]; then
    # Look for the articles that use that image.
    # As the usage list starts at the last string with '</p> <ul><li><a href="'
    # it is essential that this string only occurs once, otherwise the
    # program fails.
    counter=`echo $content | grep --count "</p> <ul><li><a href=\""`
    # Exclude pages with no usage reference.
    if [ $counter -ne 0 ]; then
      # At first truncate everything before our list...
      content=`echo $content \
        | sed 's/^.*<\/p>\ <ul><li><a\ href\=\"/<li><a\ href\=\"/g'`
      # ...then truncate everything after our list.
      content=`echo $content \
        | sed 's/\ <\/ul>\ <div\ class\=\"printfooter\">.*$//g'`
      # Finally print the list as URLs on screen.
      echo $content | sed 's/<\/li>/\n/g' | sed 's/^.*href=\"//g' \
        | sed 's/\"\ title\=\".*$//g' | sed '/^$/d' \
        | sed "s/^/http\:\/\/$project\.org/g"
    fi
  fi
  # Look for the duplicates (the ones marked with NowCommons get ignored).
  if [ $commons -eq 0 -a "$option" == "-d" ]; then
    # Print the URL of the local duplicate.
    echo "http://$project.org/wiki/Image:$file"
  fi
# nice trick I found in the web :-)
done < projects.txt
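The central trick of the script is to flatten a page into one line and then cut away everything outside the two content markers with sed. The sketch below shows that step in isolation on a made-up sample page. It swaps the unquoted echo for `tr -s`, which also collapses whitespace but does not expand shell wildcards; `extract_content` is a hypothetical helper name, not something the script defines.

```shell
#!/bin/bash
# extract_content is a hypothetical helper name; the two markers are the
# ones the script looks for.
extract_content() {
  # Squeeze every run of whitespace (including newlines) into one space so
  # that sed sees the whole page as a single line.
  printf '%s\n' "$1" | tr -s '[:space:]' ' ' \
    | sed 's/^.*<!-- start content -->//' \
    | sed 's/<!-- end content -->.*$//'
}

# Made-up sample page; the real script fetches pages with wget.
page='<html>
<!-- start content -->
<p>Hello</p>
<!-- end content -->
</html>'

extract_content "$page"
echo
```

Because the first pattern is greedy, it cuts up to the last start marker on the page. The same caveat applies to the usage-list pattern in the script, which must occur only once.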
This list (the contents of "projects.txt") includes all Wikipedia language groups with more than 10,000 articles, the Wikinews, Wikibooks and Wikiquote language groups with more than 1,000 articles, and the three multilanguage projects. For performance reasons not all language groups are included.
en.wikipedia
de.wikipedia
ja.wikipedia
fr.wikipedia
sv.wikipedia
pl.wikipedia
nl.wikipedia
es.wikipedia
it.wikipedia
pt.wikipedia
he.wikipedia
zh.wikipedia
bg.wikipedia
ru.wikipedia
uk.wikipedia
ca.wikipedia
da.wikipedia
eo.wikipedia
no.wikipedia
ro.wikipedia
sr.wikipedia
sl.wikipedia
fi.wikipedia
en.wikinews
de.wikinews
en.wikibooks
de.wikibooks
en.wikiquote
de.wikiquote
species.wikipedia
meta.wikimedia
wikisource
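The rule that the list must end with a newline follows from how the shell's `read` works: it reports failure when it reaches the end of the file without seeing a newline, so a plain `while read` loop silently skips an unterminated last entry. A more tolerant loop could look like the sketch below; `print_project_urls` is a hypothetical helper name, and the URL it prints is only for illustration.

```shell
#!/bin/bash
# print_project_urls is a hypothetical helper, not part of check-usage.sh.
print_project_urls() {
  local project
  # The extra test keeps a final entry that has no newline after it.
  while read -r project || [ -n "$project" ]; do
    # Skip blank lines instead of building a broken URL from them.
    if [ -n "$project" ]; then
      echo "http://$project.org/wiki/"
    fi
  done < "$1"
}

# Demo list whose last entry is not followed by a newline.
demo=$(mktemp)
printf 'en.wikipedia\nwikisource' > "$demo"
print_project_urls "$demo"
rm -f "$demo"
```

With this loop both entries of the demo list are processed, whereas a plain `while read project` would stop after the first.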