User:Arnomane/Image usage

From Wikimedia Commons, the free media repository

Note: This script is now unmaintained. A much more powerful, rewritten web-based solution created by User:Duesentrieb is available at http://tools.wikimedia.de/~daniel/WikiSense/CheckUsage.php. Take this script as a quick and dirty example of how the basic principle of the old "check-usage idea" works (this includes Avatar's script, but not Duesentrieb's), and feel free to improve and adapt it if you like it.

Here is a small, quick and dirty Bash shell script that I wrote. It looks for the usage of a given Wikimedia Commons image within the different Wikimedia projects and their language editions.

You need a UNIX compatible system (e.g. Linux), a Bash compatible shell, wget, sed and grep (of course also a working internet connection :p) for running it.
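If you are not sure whether everything is installed, a minimal sketch like the following can check it. The helper name missing_tools is hypothetical and not part of check-usage.sh; it only assumes the standard "command -v" lookup.

```shell
# Hypothetical helper (not part of check-usage.sh): print every required
# tool that cannot be found on the PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The tools named in the requirements above.
missing=$(missing_tools wget sed grep)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names end up on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools are available."
fi
```

This does not test the internet connection; that one you still have to check yourself.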

The usage is quite simple:

  • Copy "check-usage.sh" and "projects.txt" out of the source code of this page (please click "edit" and copy the script from the page source, as the MediaWiki software might not display the code correctly).
  • Put them both into the same directory.
  • Make "check-usage.sh" executable.
  • Open a terminal and change into the directory where the script is stored.
    • If you want to look for the usage of a Commons file in the various projects, enter ./check-usage.sh -u "$IMAGE" (replace $IMAGE with the name of the image you want to look for; note: don't enter the namespace prefix "Image:").
    • If you want to find duplicates that are not marked with the NowCommons-tag enter ./check-usage.sh -d "$IMAGE".
  • If you copy the "projects.txt" from here or want to enhance/edit it please make sure that it does not contain any whitespace and that the last line is a blank line that only contains a newline (return).
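The requirements for "projects.txt" can be verified with a small sketch like this one. The function name check_projects_file is hypothetical, and the sketch reads the "last line" rule as "the file must end with a newline", which is an assumption about what the script needs.

```shell
# Hypothetical helper: succeed only if the project list is non-empty,
# contains no whitespace inside its lines and ends with a newline.
check_projects_file() {
  local list="$1"
  # The file must exist and must not be empty.
  [ -s "$list" ] || return 1
  # Reject spaces, tabs and carriage returns anywhere in a line.
  if grep -q '[[:space:]]' "$list"; then
    return 1
  fi
  # Command substitution strips a trailing newline, so an empty result
  # means that the last byte of the file is a newline.
  [ -z "$(tail -c 1 "$list")" ]
}

# Usage example with a temporary two-line list.
demo=$(mktemp)
printf 'en.wikipedia\nde.wikipedia\n' > "$demo"
if check_projects_file "$demo"; then
  echo "projects list looks fine"
else
  echo "projects list needs fixing"
fi
rm -f "$demo"
```

Carriage returns are rejected as well, because a list saved with Windows line endings would otherwise produce broken host names.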

Known Bugs:

  • NowCommons-tagged local images don't get excluded at -u option.

Code of "check-usage.sh"

#!/bin/bash
# Created by Daniel Arnold, released into the public domain.
# I know this script is very very ugly but it works for me. ;-)

# The command line option
option=$1

# Our picture we want to look at (needs to be given as "imagename" on command
# line).
file=$2

# Show the help text if the syntax is wrong: the option must be either -d or
# -u, and a file name must be given.
if { [ "$option" != "-d" ] && [ "$option" != "-u" ]; } || [ -z "$file" ]; then

  echo 'This is a small shell script that looks for the usage of pictures from'
  echo 'the Wikimedia Commons in various Wikimedia projects (list defined by'
  echo 'projects.txt).'
  echo ''
  echo 'Usage: ./check-usage.sh [Option] [File]'
  echo '(You need to be in the directory where the script is located; [File] is'
  echo 'the file name without leading "Image:")'
  echo ''
  echo 'Options:'
  echo '-u : Checks usage of [File] in the various projects in the list.'
  echo '-d : Checks duplicates of [File] in the various projects in the list.'
  echo ''
  echo 'Note:'
  echo 'The "projects.txt" file needs to be in the same directory as the script'
  echo 'and must not contain any whitespace and its last line needs to be a'
  echo 'blank line that only contains a newline (return).'
  exit 1
fi

# Check that the file exists; the "Image history" heading only appears if an
# image with that name exists.
check=`wget --quiet --output-document=- \
"http://commons.wikimedia.org/wiki/Image:$file"`
check=`echo "$check" | grep --count "<h2>Image history</h2>"`

if [ "$check" -eq 0 ]; then
  echo "The file \"$file\" does not exist at the Wikimedia Commons."
  echo "Please enter an existing file name."
  exit 1
fi

# The main loop that reads all Wikimedia projects out of projects.txt and looks
# in all of them for usage of our picture.
while read project; do

  # Grab the page.
  page=`wget --quiet --output-document=- "http://$project.org/wiki/Image:$file"`

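  # Remove everything outside <!-- start content --> and <!-- end content -->.
  # $page is left unquoted on purpose: echo then puts the whole page on a
  # single line, which the sed patterns below rely on.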
  content=`echo $page | sed 's/^.*<\!--\ start\ content\ -->//g' \
| sed 's/<\!--\ end\ content\ -->.*$//g'`

  # Look if the image is really from the Wikimedia Commons and not a local one
  # with the same name (but it doesn't exclude NowCommons-tagged local images).
  commons=`echo $content \
| grep --count "http://upload.wikimedia.org/wikipedia/commons"`

  # Only analyse if it is no local file and we set -u option.
  if [ "$commons" -ne 0 ] && [ "$option" = "-u" ]; then

    # Look for the articles that use that image.
    # As the usage list starts at the last string '</p> <ul><li><a href="',
    # it is essential that this string only occurs once; otherwise the
    # program fails.
    counter=`echo $content | grep --count "</p> <ul><li><a href=\""`

    # exclude pages with no usage reference.
    if [ $counter -ne 0 ]; then

      # At first truncate everything before our list...
      content=`echo $content \
| sed 's/^.*<\/p>\ <ul><li><a\ href\=\"/<li><a\ href\=\"/g'`

      # ...then truncate everything after our list.
      content=`echo $content \
| sed 's/\ <\/ul>\ <div\ class\=\"printfooter\">.*$//g'`

      # Finally print the list as URLs on screen.
      echo $content | sed 's/<\/li>/\n/g' | sed 's/^.*href=\"//g' \
| sed 's/\"\ title\=\".*$//g' | sed '/^$/d' \
| sed "s/^/http\:\/\/$project\.org/g"
    fi
  fi

  # Look for the duplicates (the ones marked with NowCommons get ignored)
  if [ "$commons" -eq 0 ] && [ "$option" = "-d" ]; then

    # Print the URL of the local duplicate
    echo "http://$project.org/wiki/Image:$file"
  fi

# Feed projects.txt into the loop (a nice trick I found on the web :-)
done < projects.txt
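The last step of the -u branch (turning the usage list into URLs) can be tried offline with a sketch like the following. It is not the script's own method: it uses "grep -o" instead of the chain of sed calls, and the function name list_usage_urls as well as the sample HTML are made up for illustration.

```shell
# Hypothetical helper: print one absolute URL per link in a usage list.
#   $1: project name as in projects.txt, e.g. "en.wikipedia"
#   $2: HTML of the usage list, e.g. '<li><a href="/wiki/Foo" ...'
list_usage_urls() {
  printf '%s\n' "$2" \
    | grep -o 'href="[^"]*"' \
    | sed -e 's/^href="//' -e 's/"$//' -e "s|^|http://$1.org|"
}

# Made-up sample of what a usage list might look like.
sample='<ul><li><a href="/wiki/Foo" title="Foo">Foo</a></li> <li><a href="/wiki/Bar" title="Bar">Bar</a></li> </ul>'
list_usage_urls "en.wikipedia" "$sample"
```

For the sample above this prints http://en.wikipedia.org/wiki/Foo and http://en.wikipedia.org/wiki/Bar, one per line. Unlike the script, it does not depend on the exact markup before and after the list, but it would also pick up any other link in the HTML it is given.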

projects.txt

This list includes all Wikipedia language groups with more than 10,000 articles, the Wikinews, Wikibooks and Wikiquote language groups with more than 1,000 articles, and the three multilingual projects. For performance reasons, not all language groups are included.

en.wikipedia
de.wikipedia
ja.wikipedia
fr.wikipedia
sv.wikipedia
pl.wikipedia
nl.wikipedia
es.wikipedia
it.wikipedia
pt.wikipedia
he.wikipedia
zh.wikipedia
bg.wikipedia
ru.wikipedia
uk.wikipedia
ca.wikipedia
da.wikipedia
eo.wikipedia
no.wikipedia
ro.wikipedia
sr.wikipedia
sl.wikipedia
fi.wikipedia
en.wikinews
de.wikinews
en.wikibooks
de.wikibooks
en.wikiquote
de.wikiquote
species.wikipedia
meta.wikimedia
wikisource
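Each entry is simply the host name without the ".org" suffix; the script builds the image page URL from it. A minimal sketch of that expansion follows. The function name image_page_urls is hypothetical, and unlike the script's loop it skips blank lines and tolerates a missing final newline.

```shell
# Hypothetical helper: print the image page URL for every project in a list.
#   $1: file name without the "Image:" prefix
#   $2: path to the project list (defaults to projects.txt)
image_page_urls() {
  local file="$1"
  local list="${2:-projects.txt}"
  local project
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while read -r project || [ -n "$project" ]; do
    # Skip blank lines so they do not turn into "http://.org/...".
    [ -n "$project" ] || continue
    echo "http://$project.org/wiki/Image:$file"
  done < "$list"
}

# Usage example with a temporary list; "Example.jpg" is just a placeholder.
demo=$(mktemp)
printf 'en.wikipedia\n\nwikisource' > "$demo"
image_page_urls "Example.jpg" "$demo"
rm -f "$demo"
```

The example prints http://en.wikipedia.org/wiki/Image:Example.jpg and http://wikisource.org/wiki/Image:Example.jpg.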