On this page, we give you a few hints on how to get the high-resolution images all the same.
But beware: Only use the techniques presented here if you can demonstrate clearly that the image in question is in the public domain (i.e., that it is not copyrighted), or that it is freely licensed, and if you are sure not to break any local laws by doing so.
- 1 Zoomify
- 1.1 Dezoomify.py
- 1.2 Dezoomify V2
- 1.3 Zoomify.php
- 1.4 dezoomify.rb script
- 1.5 Google Highres images
- 1.6 National Gallery collection
- 1.7 On-demand generation
- 1.8 Monitoring HTTP requests
- 1.9 References list
Zoomify is a Flash program that offers zooming into high-resolution images. To reduce the network traffic and to improve response time, it doesn't download the full high-resolution image. Instead, the image is broken up into tiles: small rectangular areas, each small enough to be quickly loaded. From the zoom level and the visible area, the program calculates which tiles it needs to load to display the visible part of the image in a higher resolution, and then loads only those tiles.
In order to load the tiles, these tiles must be accessible on the Internet somewhere; in fact, they must be on a server in the same domain that sent you the Flash program. It follows that the full high-resolution image is accessible, but unfortunately only as a (possibly large) number of separate image files, one for each tile.
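The tiling idea can be sketched in a few lines of Python. This is only an illustration under assumptions that this page does not state: square tiles with an edge of 256 pixels, and zoom levels that each halve the image dimensions until the whole image fits into a single tile. The function names are made up for this example.

```python
import math

TILE_SIZE = 256  # assumption: a commonly used Zoomify tile edge length


def zoom_levels(width, height, tile_size=TILE_SIZE):
    """Return the tile grid (columns, rows) of each level, smallest level first."""
    levels = []
    w, h = width, height
    while True:
        levels.append((math.ceil(w / tile_size), math.ceil(h / tile_size)))
        if w <= tile_size and h <= tile_size:
            break
        # Assumed: each lower level has half the width and height.
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    levels.reverse()
    return levels


def visible_tiles(left, top, right, bottom, tile_size=TILE_SIZE):
    """Return (column, row) pairs of the tiles overlapping a pixel rectangle."""
    cols = range(left // tile_size, (right - 1) // tile_size + 1)
    rows = range(top // tile_size, (bottom - 1) // tile_size + 1)
    return [(c, r) for r in rows for c in cols]
```

For a 1000×600 pixel image, `zoom_levels(1000, 600)` gives three levels with 1, 4 and 12 tiles respectively. A viewer showing the strip from x=200 to x=600 in the top tile row would only need the three tiles returned by `visible_tiles(200, 0, 600, 256)`.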
The Dezoomify Python script takes the URL of a page containing a Zoomify image, scrapes the necessary information, asynchronously downloads the image tiles at the maximum zoom level and losslessly (that is, without any re-compressing of the JPEG image and the resulting quality loss) stitches them together into a single image. Python 3 must be installed. Other included features:
- batch mode for downloading several images,
- optionally download a non-maximum zoom level of the image,
- manually specify the Zoomify base directory, if necessary.
An older version of the script, which uses Python 2 and the Python Imaging Library instead of jpegtran, can be found here.
Just enter the URL of a page containing a zoomify object (example) into the form.
- Requires a recent web browser (i.e., not Internet Explorer).
- In browsers other than Firefox, it is not possible to save the canvas by right-clicking on it. You will have to take screenshots instead.
- You can't choose the zoom level. The script downloads the images at the maximum zoom level.
- The result is a PNG file. For uploading to Commons, it is usually useful to convert it into a JPEG file.
Here is the source code.
Loading all tiles at once
A tool exists to bypass zoomify and to load all tiles at once. This generates a single web page that shows the whole high-resolution image, although it is still composed of individual tiles. But since they are all properly arranged side by side, you won't see that. This tool resides on our toolserver at the URL http://toolserver.org/~kolossos/image/zoomify.php
It takes two parameters:
- zoom=[1-8] This defines the zoom level for which you want to get the tiles. The higher the number, the larger the final image will be. Try the tool with level 5; if that fails, lower the parameter.
- path=URL This gives the path to the directory on the server where the tiles are stored. To find this path, look in the HTML source of the page for "zoomifyImagePath", or inspect your browser's disk cache (URL "about:cache" in Firefox). See also Monitoring HTTP requests below.
- The script first reads $path/ImageProperties.xml to get the width and height of the image.
- The script then calculates the tile layout for the requested zoom level and sends back a web page containing a table of all these images. The script itself does not fetch the images from the server; it just sends your browser the generated web page, which links to all the tiles on the Zoomify server. Your browser then loads all the tiles from that server.
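The two steps of the script can be approximated as follows. This is a sketch, not the source of zoomify.php. It assumes that ImageProperties.xml carries WIDTH, HEIGHT and TILESIZE attributes, that tiles are named level-column-row.jpg, and that they are stored in folders named TileGroup0, TileGroup1, and so on, with 256 tiles each. None of these details are confirmed on this page apart from the tile name "5-0-0.jpg". The `zoom` argument here is the zero-based level number used in the tile names, which may differ from the tool's own zoom parameter.

```python
import math
import xml.etree.ElementTree as ET


def tile_rows(path, properties_xml, zoom):
    """Return the tile URLs for one zoom level as a list of rows."""
    props = ET.fromstring(properties_xml)
    width, height = int(props.get("WIDTH")), int(props.get("HEIGHT"))
    tile = int(props.get("TILESIZE", "256"))

    # Grid size (columns, rows) of every level, built from the largest down.
    grids = []
    w, h = width, height
    while True:
        grids.append((math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    grids.reverse()  # index 0 is now the single-tile overview

    cols, rows = grids[zoom]
    # Assumed: tiles are numbered consecutively across levels and stored
    # in folders of 256 tiles each (TileGroup0, TileGroup1, ...).
    first = sum(c * r for c, r in grids[:zoom])
    return [
        [f"{path}/TileGroup{(first + y * cols + x) // 256}/{zoom}-{x}-{y}.jpg"
         for x in range(cols)]
        for y in range(rows)
    ]


def as_html(rows):
    """Lay the tiles out side by side in a borderless HTML table."""
    body = "".join(
        "<tr>" + "".join(f'<td><img src="{url}"></td>' for url in row) + "</tr>"
        for row in rows
    )
    return f'<table cellspacing="0" cellpadding="0">{body}</table>'
```

As in the tool described above, nothing is downloaded here: the generated page only links to the tiles, and the browser fetches them.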
A site that uses the Zoomify program is . Click "zoom" on the first image on the right to see it. Note that the title bar of the image display says "http://www.dsloan.com/Auctions/A22/zoomer.php?file=zoomify/kendall-nebel-01". Evidently, the Flash program loads the tiles from http://www.dsloan.com/Auctions/A22/zoomify/kendall-nebel-01 . To view that image as a whole in the highest available resolution, you would thus enter the following URL in your browser:
The result is a page containing all the tiles in the proper layout. That page takes a while to load, so be patient. Note that the title of the page tells you what the maximum zoom level is, so if it indicates that even higher resolutions than zoom level 5 are available, try increasing the zoom parameter.
Now you can view the image as a whole, but it still consists of individual tiles laid out side by side. How can you save this display as a single file? There are two ways:
- If you're using Firefox, you might consider using a plug-in such as pagesaver or some other, similar tool, e.g. FireShot. With that, you can take a snapshot of the whole web page (even including those parts that are not visible in the browser window) at the screen resolution.
- Alternatively, you could save the whole web page locally, which would also save the tiles, and then assemble all the tiles in a graphics program. This is a tedious process to do manually when there are many tiles, but it avoids the detour through a screenshot, which may lose quality. If you're using the GIMP (version 2.4 or higher), there is a Scheme script to automatically assemble all the tiles for you: de-tile.scm. Copy this script into the "script" folder of the GIMP and refresh the scripts (or restart the GIMP). Then open the top-left tile in GIMP (the one named "5-0-0.jpg"). In the "Filters→Combine" sub-menu, you should have an entry "De-Tile". Select that and wait. The script will load all the other tiles and assemble them into one single image. When it's done, save the resulting image.
(Note: do not upload this example image. It already exists at the Commons as Image:Nebel Mexican War 01 Battle of Palo Alto.jpg.)
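Assembling the saved tiles by hand or by script comes down to pasting each tile at an offset derived from its file name. The sketch below computes only those offsets; the pasting itself would be left to a graphics program or an imaging library. It assumes tile names of the form level-column-row.jpg, which fits "5-0-0.jpg" being the top-left tile, and a tile edge of 256 pixels. Neither assumption is confirmed on this page.

```python
import re

# Assumed naming scheme: <level>-<column>-<row>.jpg
TILE_NAME = re.compile(r"^(\d+)-(\d+)-(\d+)\.jpg$")


def tile_offsets(filenames, tile_size=256):
    """Map each tile file name to the (x, y) pixel offset of its top-left corner."""
    offsets = {}
    for name in filenames:
        match = TILE_NAME.match(name)
        if match is None:
            continue  # skip files that are not tiles, e.g. the saved HTML page
        _level, col, row = map(int, match.groups())
        offsets[name] = (col * tile_size, row * tile_size)
    return offsets
```

For example, `tile_offsets(["5-0-0.jpg", "5-1-0.jpg", "5-0-2.jpg"])` places the second tile 256 pixels to the right of the first and the third one 512 pixels below it.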
Ruby + ImageMagick script that grabs and stitches Zoomify images: http://gist.github.com/59636
Google Highres images
The workflow is the same as above.
National Gallery collection
You can browse the collection of the National Gallery online via a panning/scrolling/zooming widget. However, looking at paintings through a tiny porthole (even with the "full screen" view) is limiting.
These tools let you view paintings that are part of British and European cultural heritage on your own terms. Indeed, the aims of the Gallery itself support this view:
The Gallery aims to study and care for the collection, while encouraging the widest possible access to the pictures.
Only Bash and ImageMagick required: zoom.sh. Very simple and almost self-contained.
Some zoom utilities do not use tiling. Instead, they ask the server to generate a crop of the original high-resolution image that matches the resolution, size and coordinates of the area the user is currently viewing. In such cases, it may be a bit more difficult to get at the full high-resolution image, because the image files themselves may be inaccessible from the Internet. Only the server-side program that generates crops from the file can be accessed.
Such server-side programs need to take a few parameters to know what to return. Typically, these parameters are:
- the desired resolution
- the width and the height of the image that should be generated
- the offset (x- and y-coordinates) within the full-resolution image from which the crop should be generated.
If the URL for the server-side program can be determined, and the parameters can be identified, it is then usually possible to manually query the server (using hand-written URLs) to see what its limits are. Some servers allow setting x and y to zero and the width and the height to arbitrarily large values, so that might be a way to get the full high-resolution image. Others place limits upon the maximum width and height; in this case, one needs to get individual tiles and combine them as above.
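When a server limits the width and height of a crop, the full image can be covered with a grid of requests. The following sketch computes the crop rectangles and builds one URL per rectangle. The parameter names `x`, `y`, `width` and `height` and the base URL are hypothetical; a real server will use its own names and may need further parameters such as a scale.

```python
from urllib.parse import urlencode


def crop_windows(full_width, full_height, max_width, max_height):
    """Return (x, y, width, height) rectangles that cover the full image."""
    windows = []
    for y in range(0, full_height, max_height):
        for x in range(0, full_width, max_width):
            w = min(max_width, full_width - x)    # last column may be narrower
            h = min(max_height, full_height - y)  # last row may be shorter
            windows.append((x, y, w, h))
    return windows


def crop_url(base_url, x, y, width, height):
    """Build a request URL; the parameter names are placeholders."""
    query = urlencode({"x": x, "y": y, "width": width, "height": height})
    return f"{base_url}?{query}"
```

A 7462×6000 pixel image on a server limited to 3000×3000 pixel crops would need six requests, the last one starting at x=6000, y=3000 and being only 1462 pixels wide.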
Example 1: CONTENTdm sites
CONTENTdm is digital collection management software that uses on-demand generation of zoomable images. A site using this software is the C. R. Savage Collection at Brigham Young University. Click on the image to zoom in. But where does this image come from?
- Click "View source" in your browser. Examine the HTML source. You'll find a "<form name="mainimage" action="">", containing an "<input type="image"" with the source URL
- Note the parameters for the scale, width, height, x, and y. There is also a "thumb" parameter set to 1.
- The server is http://contentdm.lib.byu.edu/, as seen in the URL of the page.
- Let's try this: enter the URL
- This defines a scale of 100%, width and height of 3000px, x and y as zero (i.e., from the top-left corner) and sets the "thumb" parameter to zero. Let's see what we get.
- We're getting closer. It gives us a 766kB file of 3000×3000 pixels, but it isn't large enough yet to show the full original. Let's step up the dimensions to 8000×6000px. That file will likely be large (if we get anything at all), so be prepared to wait a bit if you try this.
- Voilà, there we are: we've got the full picture as a 7462×6000px image (3.7MB). Save it, then crop it, and you're done. (Crop away at least the blue border. We neither need nor want that.)
- We could also have estimated the size of the full high-resolution version from the start: the original zoom level was shown as 8%, and the thumbnail was 598×507px. At 100%, the image would thus need to be about 7475×6338px.
(Again, please don't upload this example image. It already exists at the Commons as Image:First Presidency and Twelve Apostles 1898.jpg.)
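The size estimate in the last step is simple proportional scaling, which can be written down as a small helper. The function name is made up for this example, and rounding up is a choice made here so that the requested crop is never too small.

```python
import math


def estimate_full_size(thumb_width, thumb_height, zoom_percent):
    """Scale the thumbnail dimensions up to 100%, rounding up."""
    full_width = math.ceil(thumb_width * 100 / zoom_percent)
    full_height = math.ceil(thumb_height * 100 / zoom_percent)
    return full_width, full_height


# The thumbnail from the example: 598x507px shown at 8%.
print(estimate_full_size(598, 507, 8))  # (7475, 6338)
```

The estimate is only approximate because the displayed zoom percentage is itself rounded; the image actually delivered in the example was 7462×6000px including its border.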
Another well-known site using on-demand generation is the David Rumsey Map Collection. For this site, there are two ways to get full high-resolution images. We'll illustrate both techniques using the example image here (an old map of a part of Chile). Please don't upload this image; it already exists at the Commons as Image:Chile.Pissis-A-rioloa.djvu.
The first technique gets you a high-resolution JPEG image, using the techniques shown in Example 1 above. If you examine the URLs for the images loaded by their viewer (see Monitoring HTTP requests below), you'll discover that it uses a URL like this:
Again, note the parameters for x, y, width, height, and the zoom level. When you zoom in, you'll notice that the level parameter decreases, and is zero at the highest level. Some experimentation will quickly yield the following URL, which gives a full high-resolution JPEG image of the whole map:
Right-click and choose "Save image as..." to save the file on your computer, then crop away the borders.
The second technique gives you the original MrSID file of the map. The Rumsey Collection includes in the left sidebar a direct link to this file. Unfortunately, this link doesn't work because it goes to a non-existing URL "http://www.davi". It used to work once, but either there's some error in their server-side software, or they've disabled these links intentionally. If that link works for you, fine. If not: the full link is still in the HTML source of the page. Open the HTML source in your browser (it should have a "View source" menu item somewhere) and scroll down. You'll discover that the actual URL given is "http://www.davi drumsey.com/rumsey/d ownload.pl?image=/D0 052/0734001.sid", i.e., it contains blanks. Copy this malformed URL into your browser's address bar with the extra blanks removed such that it reads http://www.davidrumsey.com/rumsey/download.pl?image=/D0052/0734001.sid and hit return. Your browser should now ask you where to save the file.
Once you've got the MrSID file, you can convert it using e.g. IrfanView (slow) or using the tools provided by LizardTech, the company that developed the MrSID format. You cannot upload MrSID files to the Commons because it's a proprietary format and we allow only free file formats. You will have to convert the image to either JPEG or DjVu.
Note that MrSID files use more advanced compression techniques (wavelet compression) than JPEG files, so converting a MrSID file into a JPEG may yield a huge JPEG file. The Commons has a maximum file size for uploads of 100MB, but JPEGs that large are really unwieldy and may be hard to handle. Try to keep the file size much lower, around a maximum of 10-20MB.
Monitoring HTTP requests
How can you find out through which URL an image is loaded? Unlike statically linked images, images loaded by these dynamic zoom tools have URLs that are hidden from the user. Sometimes the URLs are visible in the HTML source of the web page. If not, another way to determine them is to monitor the network traffic. All these client-side tools need to make HTTP requests to their server to get the images or the tiles.
- Using a local proxy, it is possible to obtain a log of all requests made. Privoxy is a reliable free-software proxy that (amongst a lot of other useful features) also has a request log showing each request URL. Set up your browser to go through that proxy, and examine the request log to find the image URL.
- Alternatively, there are tools to monitor a browser's traffic directly, such as the HttpFox plug-in for Firefox.
- In Firefox, you can also examine the URLs of loaded images through the "Tools→Page Info" menu, "Media" tab. Firefox lists all the images, with their URL and thumbnails.
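For the case where the path is visible in the HTML source, the search can be automated. The sketch below looks for a "zoomifyImagePath" setting and resolves it against the page URL. The embedding markup in the usage example is invented for illustration; real pages may pass the path differently, in which case monitoring the requests as described above remains the fallback.

```python
import re
from urllib.parse import urljoin

# Assumed form: zoomifyImagePath=<path>, ended by "&", a quote or whitespace.
IMAGE_PATH = re.compile(r"zoomifyImagePath=([^&\"'\s]+)")


def find_zoomify_paths(page_url, html):
    """Return absolute tile directory URLs found in the HTML source."""
    return [urljoin(page_url, path) for path in IMAGE_PATH.findall(html)]


# Hypothetical embedding markup, not copied from any real page.
html = '<param name="FlashVars" value="zoomifyImagePath=zoomify/kendall-nebel-01&amp;zoomifyNavigator=1">'
page = "http://www.dsloan.com/Auctions/A22/zoomer.php?file=zoomify/kendall-nebel-01"
print(find_zoomify_paths(page, html))
```

With this made-up markup, the relative path resolves to the tile directory named in the Zoomify example earlier on this page.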