Commons:Extracting images from PDF

From Wikimedia Commons, the free media repository
"COM:PDF" redirects here. For Commons guidelines on the use of PDF format, see Commons:Project scope#PDF and DjVu formats.

This page explains how to extract images from PDF files. Some PDF files store whole pages as images; others contain individual images embedded within the pages.

Extract PDF pages as images

Pages in a PDF file are often stored as images, for example in scanned books.

  • Use the XPdf command-line tools pdfimages, pdftopng, pdftoppm, or pdftops, or XPdf Reader (File -> Save Image).
  • Use the freely available programs PDF-XChange Viewer (File -> Export -> Export to Image) or STDU Viewer (File -> Export -> to image).
  • To SVG: use pdf2svg to convert the PDF to SVG when the entire file should be used as an image, e.g., when it is a diagram generated by some program.
  • PDFCreator can export PDF in several bitmap formats.
  • ImageMagick's convert, which is free, can split a PDF into one image per page.
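
As a rough illustration of the command-line route, the following Python sketch builds and runs a pdftoppm call that renders every page to a PNG file. It assumes the Poppler build of pdftoppm (which accepts the -r and -png options) is installed and on the PATH; the function names and the 300 ppi default are placeholders chosen for this example.

```python
import shutil
import subprocess


def build_render_command(pdf_path, output_prefix, ppi=300):
    """Return the pdftoppm argument list that renders every page as PNG."""
    # -r sets the rendering resolution; -png selects PNG output.
    # Both options are assumed to be available (Poppler build of pdftoppm).
    return ["pdftoppm", "-r", str(ppi), "-png", pdf_path, output_prefix]


def render_pages(pdf_path, output_prefix, ppi=300):
    """Run pdftoppm; raise if the tool is missing or exits with an error."""
    if shutil.which("pdftoppm") is None:
        raise FileNotFoundError("pdftoppm not found on PATH")
    subprocess.run(build_render_command(pdf_path, output_prefix, ppi), check=True)
```

Calling render_pages("scan.pdf", "page") with a hypothetical scan.pdf would write one PNG per page, each named after the prefix followed by the page number.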

Extracting images from PDF

PDF files can contain images that are actually at a higher resolution than the “100%” size of the document. Possible ways to extract images from PDFs include:

  • PDF Aid – online image extractor
  • pdfimages command-line tool in the xpdf package (a version also ships with Poppler).
    • Use the -j option to losslessly extract JPEG-compressed images, or -all (available in the Poppler version) to losslessly extract all images in their original file type. For example: pdfimages -all '/path/to/your.pdf' ./output-filename-prefix
  • Nitro PDF has a function to pull all images out of a PDF file at full resolution, and you can choose the output format (jpg, png, etc). However, it won't work if the PDF is password-protected.
  • Evince, the most common Linux PDF reader, simply lets you right-click on an image and save it.
  • PDF Candy can be used to extract images online.
  • Capture pieces of the page with PrintScreen and stitch them together in Microsoft Paint, GIMP, or a similar program.
  • GIMP can also open pages from a PDF as an image at the resolution you specify. This is not quite the same as extracting the images: GIMP gives no guidance on the ideal resolution for a given image, and it renders the whole page before converting everything to an image. In short, it is equivalent to the screenshot approach, but less work.
  • Inkscape: deselect “Embed (all) images” in the dialog shown when opening the PDF. All images are then extracted automatically (as PNG images) into the folder. You can also right-click a single image and choose “Extract Image…”.
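
The pdfimages approach from the list above can be scripted. The sketch below is a minimal example, assuming pdfimages is installed and on the PATH and, for the -all option, that it is the Poppler version; the function names and the jpeg_only parameter are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess


def build_extract_command(pdf_path, output_prefix, jpeg_only=False):
    """Return the pdfimages argument list for lossless extraction."""
    # -j writes JPEG-compressed images as JPEG files.
    # -all (assumed: Poppler version) keeps every image in its original type.
    flag = "-j" if jpeg_only else "-all"
    return ["pdfimages", flag, pdf_path, output_prefix]


def extract_images(pdf_path, output_prefix, jpeg_only=False):
    """Run pdfimages; raise if the tool is missing or exits with an error."""
    if shutil.which("pdfimages") is None:
        raise FileNotFoundError("pdfimages not found on PATH")
    subprocess.run(
        build_extract_command(pdf_path, output_prefix, jpeg_only), check=True
    )
```

Passing the arguments as a list rather than as a single shell string means that paths containing spaces or quotes need no extra escaping.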

Resolution issues

Some PDF readers can tell you the resolution; for documents created using typical “print quality” settings, 300 ppi is probably the best guess. (Caveat: where the originals are between 300 and 450 ppi, they are often not downsampled to the 300 ppi target; moreover, black-and-white “linework” images, one bit deep, are often kept at 1200 ppi or more.)
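
When a reader does not report the resolution, it can be estimated by hand. The sketch below assumes the default PDF unit of 1 point = 1/72 inch and that you already know the image's pixel width and the width at which it is placed on the page; the function names and the sample numbers are illustrative only.

```python
POINTS_PER_INCH = 72  # default PDF unit: 1 point = 1/72 inch


def effective_ppi(pixel_width, placed_width_pt):
    """Resolution of an embedded image at the size it occupies on the page."""
    return pixel_width * POINTS_PER_INCH / placed_width_pt


def pixels_for_page(width_pt, height_pt, ppi):
    """Pixel dimensions needed to render a page at the given resolution."""
    return (
        round(width_pt / POINTS_PER_INCH * ppi),
        round(height_pt / POINTS_PER_INCH * ppi),
    )


# An image 1800 pixels wide placed 432 pt (6 inches) wide is at 300 ppi.
print(effective_ppi(1800, 432))        # 300.0
# A US Letter page (612 x 792 pt) rendered at 300 ppi.
print(pixels_for_page(612, 792, 300))  # (2550, 3300)
```

The second function gives a reasonable value to enter when a program such as GIMP asks for an import resolution and you want the rendered page to match a known image resolution.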

Getting around password protection

If the PDF is password-protected to prevent modification or extraction of content, you may be able to get around that by opening the page in Inkscape and saving it as an unprotected file. You can then open that file in Adobe Acrobat and pass the image to Photoshop, or open it in Nitro PDF and pass the image to GIMP.