Poppler is a PDF rendering library with several useful tools for manipulating and converting PDFs.

Installation

The package is usually called poppler or poppler-utils, depending on your distribution, and it may already be installed on your system. I will show the basic usage of every tool; for more detail on any utility, run it with the -h option.
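
To see which of the utilities covered in this post are already on your PATH, a small check like the following can help. It is only a sketch: have_tool is a helper name I made up (it is not part of Poppler), and the tool list simply mirrors the sections of this post.

```sh
# have_tool is a hypothetical helper, not part of Poppler.
# It succeeds when the given command is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each utility discussed in this post as found or missing.
for tool in pdfinfo pdftotext pdfseparate pdfunite pdffonts pdfimages pdftoppm pdftohtml; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```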

pdfinfo

As you might expect, this utility prints information about a PDF file, such as its title, author, page count, page size and PDF version.

pdfinfo <PDF file>
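
pdfinfo prints one "Key: value" pair per line, so a single field is easy to pull out in a script. The sketch below reads that output from stdin and prints the page count. pages_from_info is a name I made up, and it assumes the "Pages:" line that pdfinfo normally emits.

```sh
# Hypothetical helper: print the value of the "Pages:" line
# found in pdfinfo output supplied on stdin.
pages_from_info() {
    awk '/^Pages:/ { print $2 }'
}

# Usage (needs pdfinfo and a real file):
#   pdfinfo test2.pdf | pages_from_info
```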

pdftotext

Converts a PDF into a plain text file. If no output filename is given, the text goes to a file named after the PDF, with .txt in place of the .pdf extension.

pdftotext <PDF file> [<output filename>]
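
When the output filename is left out, pdftotext writes next to the input and swaps the .pdf extension for .txt. The sketch below mimics that naming so a script can skip files that were already converted. Both function names are hypothetical, and convert_missing needs pdftotext installed.

```sh
# Hypothetical helper: derive the default output name by
# replacing a trailing .pdf with .txt.
default_txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical helper: convert only the PDFs that have no
# matching .txt file yet.
convert_missing() {
    for pdf in "$@"; do
        txt=$(default_txt_name "$pdf")
        [ -e "$txt" ] || pdftotext "$pdf" "$txt"
    done
}

# Usage:
#   convert_missing *.pdf
```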

pdfseparate

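Extract the pages of a multi-page PDF into separate single-page files. The output pattern must contain %d, which is replaced by the page number.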

pdfseparate <PDF file> <output PDF filenames pattern>
# pdfseparate test2.pdf test2_%d.pdf
  • -f: set the first page to extract.
  • -l: set the last page to extract.
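
The %d in the output pattern is a printf-style placeholder for the page number. To preview which files a given -f/-l range would produce, the expansion can be sketched with the shell's own printf. separate_names is a hypothetical helper, not something Poppler provides.

```sh
# Hypothetical helper: list the output names for pages first..last
# by expanding the pattern with printf.
separate_names() {
    pattern=$1 page=$2 last=$3
    while [ "$page" -le "$last" ]; do
        # The pattern is deliberately used as the printf format string.
        printf "$pattern\n" "$page"
        page=$((page + 1))
    done
}

separate_names 'test2_%d.pdf' 2 4
# Corresponding extraction (needs pdfseparate and a real file):
#   pdfseparate -f 2 -l 4 test2.pdf test2_%d.pdf
```

For the range above this lists test2_2.pdf, test2_3.pdf and test2_4.pdf.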

pdfunite

Join several PDF files into one.

pdfunite <PDF file> <PDF file> ... <output PDF>
# pdfunite test1.pdf test2.pdf join.pdf
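
The inputs are joined in the order given, which matters when they are numbered pages such as the output of pdfseparate: a glob like test2_*.pdf sorts test2_10.pdf before test2_2.pdf and would scramble the pages. One way around that, sketched here, is to generate the names by counting. unite_pages and the DRY_RUN variable are my own inventions, and the file naming is assumed to follow the pdfseparate example.

```sh
# Hypothetical helper: join <root>1.pdf .. <root>N.pdf in numeric order.
# With DRY_RUN=1 it prints the command instead of running it.
unite_pages() {
    root=$1 last=$2 out=$3
    set --    # reuse the positional parameters as the file list
    n=1
    while [ "$n" -le "$last" ]; do
        set -- "$@" "${root}${n}.pdf"
        n=$((n + 1))
    done
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo pdfunite "$@" "$out"
    else
        pdfunite "$@" "$out"
    fi
}

DRY_RUN=1 unite_pages test2_ 3 join.pdf
```

The dry run prints the pdfunite command it would execute, so the order can be checked before any file is written.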

pdffonts

List the fonts used in a PDF, with details such as each font's name, type and whether it is embedded.

pdffonts <PDF file>

pdfimages

Extract images from a PDF. You can use it with multipage files.

pdfimages [<options>] <PDF file> <image file root>
# pdfimages test2.pdf test2image
  • Add -png to write the images as PNG files (the default is PPM, or PBM for monochrome images).

pdftoppm

Render PDF pages as images: PPM by default, or other formats such as JPEG. Each page is written to its own file.

pdftoppm [<options>] <PDF file> <image file root>
  • -jpeg: convert PDF into a JPEG image.
  • -f <page>: first page to convert.
  • -l <page>: last page to convert.
  • -r <resolution>: set image resolution, in DPI.
    # Convert pages 2 to 4 to JPEG images (one file per page)
    pdftoppm -jpeg -f 2 -l 4 test.pdf test-image
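
The -r value determines the pixel size of the output. A PDF point is 1/72 inch, so a page side of p points rendered at r DPI comes out at roughly p × r / 72 pixels. Here is a quick sketch of that arithmetic. px_at_dpi is a hypothetical helper, and its integer division truncates, so treat the results as approximate.

```sh
# Hypothetical helper: approximate pixel length of a page side.
#   $1 = length in PDF points (1 pt = 1/72 inch)
#   $2 = resolution in DPI, as passed to -r
px_at_dpi() {
    echo $(( $1 * $2 / 72 ))
}

px_at_dpi 612 150   # US Letter width (612 pt) at 150 DPI: 1275
px_at_dpi 792 300   # US Letter height (792 pt) at 300 DPI: 3300
```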
    

pdftohtml

Convert PDF files into HTML. It can read from stdin if <PDF file> is -.

pdftohtml [<options>] <PDF file> [<HTML file>]
  • -s: generate a single HTML that includes all pages.
  • -i: ignore images.

If you have any suggestions, feel free to contact me via social media or email.