Poppler: command-line PDF tools
Poppler is a PDF rendering library with several useful tools for manipulating and converting PDFs.
Installation
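Its package name is poppler (on Debian, Ubuntu and Fedora the command-line tools come in a separate package called poppler-utils), but it may already be installed on your system. I am going to show the basic usage of every tool, but you can get more info about any utility with the -h parameter.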
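To check what is already installed, here is a small POSIX sh function that prints each tool covered below that it cannot find on your PATH. The name missing_poppler_tools is my own invention, not part of Poppler. It only relies on the shell's built-in command -v.

```shell
# Print the name of every Poppler tool from this article that is not on PATH.
# Returns 0 when all of them are available, 1 otherwise.
missing_poppler_tools() {
    missing=0
    for tool in pdfinfo pdftotext pdfseparate pdfunite pdffonts pdfimages pdftoppm pdftohtml; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `missing_poppler_tools || echo "Install poppler first"` prints any missing names followed by the reminder, and prints nothing when everything is there.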
pdfinfo
As you might expect, this utility gives information about a PDF file.
pdfinfo <PDF file>
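pdfinfo prints one "Key: value" line per property, including a Pages: line, so it is easy to use in scripts. Here is a minimal sketch, assuming that line format, of a hypothetical pdf_page_count helper that prints only the number of pages.

```shell
# Print the page count of a PDF by picking the "Pages:" line of pdfinfo's output.
# Usage: pdf_page_count <PDF file>
pdf_page_count() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}
```

For example, `pdf_page_count test.pdf` prints something like 12.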
pdftotext
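Converts a PDF into a plain text file. If no output filename is given, the text goes to <PDF filename>.txt (for example, test.pdf becomes test.txt). Use - as the output filename to print the text to stdout instead.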
pdftotext <PDF file> [<output filename>]
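Passing - as the output filename makes pdftotext write to stdout, so you can pipe the text straight into other tools. This sketch, using a made-up helper name pdf_grep, searches a PDF for a pattern and prints the matching lines with their line numbers. The -layout option asks pdftotext to keep the original physical layout, which usually helps with tables and columns.

```shell
# Search the text of a PDF and print matching lines as "<line number>:<text>".
# Usage: pdf_grep <PDF file> <pattern>
pdf_grep() {
    # "-" sends the extracted text to stdout; "--" protects patterns starting with "-".
    pdftotext -layout "$1" - | grep -n -- "$2"
}
```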
pdfseparate
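Extracts pages from a multi-page PDF, writing each page to its own file. The output pattern must contain %d, which is replaced by the page number.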
pdfseparate <PDF file> <output PDF filenames pattern>
# pdfseparate test2.pdf test2_%d.pdf
-f <page>: set the first page to extract.
-l <page>: set the last page to extract.
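Putting -f and -l together, here is a sketch of a hypothetical split_pages helper that writes one file per page for a given range. Called as `split_pages test2.pdf 3 5 part`, it should produce part_3.pdf, part_4.pdf and part_5.pdf.

```shell
# Split pages FIRST..LAST of a PDF into one file per page.
# Usage: split_pages <PDF file> <first> <last> <output root>
# Produces <output root>_<n>.pdf for every page n in the range.
split_pages() {
    pdfseparate -f "$2" -l "$3" "$1" "${4}_%d.pdf"
}
```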
pdfunite
Join several PDF files into one.
pdfunite <PDF file> <PDF file> ... <output PDF>
# pdfunite test1.pdf test2.pdf join.pdf
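pdfseparate and pdfunite combine nicely to copy a page range into one new PDF. The sketch below (extract_range is a name I made up) splits the range into a temporary directory and then joins the pages back together in numeric order. It assumes mktemp -d is available, which is the case on Linux and macOS although it is not strictly POSIX.

```shell
# Copy pages FIRST..LAST of a PDF into a single new PDF.
# Usage: extract_range <PDF file> <first> <last> <output PDF>
extract_range() {
    src=$1 first=$2 last=$3 dst=$4
    tmpdir=$(mktemp -d) || return 1
    # Step 1: one file per page, named page-<n>.pdf, in a scratch directory.
    if ! pdfseparate -f "$first" -l "$last" "$src" "$tmpdir/page-%d.pdf"; then
        rm -rf "$tmpdir"
        return 1
    fi
    # Step 2: build the argument list in numeric page order
    # (a shell glob would sort page-10.pdf before page-2.pdf).
    set --
    n=$first
    while [ "$n" -le "$last" ]; do
        set -- "$@" "$tmpdir/page-$n.pdf"
        n=$((n + 1))
    done
    # Step 3: join the pages; pdfunite takes the output file as its last argument.
    pdfunite "$@" "$dst"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, `extract_range test2.pdf 2 4 excerpt.pdf` should create excerpt.pdf containing pages 2, 3 and 4.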
pdffonts
Lists the fonts used in a PDF and whether each one is embedded.
pdffonts <PDF file>
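pdffonts prints a table with an emb column that tells you whether each font is embedded. Here is a sketch of a hypothetical non_embedded_fonts helper that lists the fonts marked no. It assumes the usual output layout: two header lines, then one row per font ending in the emb, sub and uni columns followed by the object number and generation. Counting fields from the end of each row avoids trouble with font types that contain spaces, such as Type 1.

```shell
# Print the names of fonts that pdffonts reports as not embedded.
# Usage: non_embedded_fonts <PDF file>
non_embedded_fonts() {
    # Skip the two header lines; emb is the 5th field counted from the end.
    pdffonts "$1" | awk 'NR > 2 && NF >= 5 && $(NF-4) == "no" { print $1 }'
}
```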
pdfimages
Extracts images from a PDF. It works with multi-page files and processes every page unless you limit the range with -f and -l.
pdfimages [<options>] <PDF file> <image file root>
# pdfimages test2.pdf test2image
- Add -png to save the images as PNG files (by default they are written as PPM, or PBM for monochrome images).
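The image file root is just a path prefix, so pdfimages writes files such as <root>-000.png, <root>-001.png and so on. This sketch (extract_images_png is a hypothetical helper) keeps the results tidy by putting them in a directory of their own.

```shell
# Extract every image of a PDF as PNG files into a dedicated directory.
# Usage: extract_images_png <PDF file> <output directory>
# Files come out as <output directory>/img-000.png, img-001.png, ...
extract_images_png() {
    mkdir -p "$2" && pdfimages -png "$1" "$2/img"
}
```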
pdftoppm
Converts PDF pages into PPM images (the default) or other formats, like JPEG, producing one image per page.
pdftoppm [<options>] <PDF file> <image file root>
-jpeg: output JPEG images instead of PPM.
-f <page>: first page to convert.
-l <page>: last page to convert.
-r <resolution>: set the image resolution, in DPI (default: 150).
Convert pages 2 to 4 into JPEG images, one file per page named after the test-image root:
# pdftoppm -jpeg -f 2 -l 4 test.pdf test-image
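To render just one page, for example a cover image, a sketch like the following should work. render_page_png is a made-up name, and it uses two pdftoppm options not listed above: -png, and -singlefile, which renders only the first page of the range and leaves the page number out of the output filename, so the result is <output root>.png.

```shell
# Render a single page of a PDF to <output root>.png at the given resolution.
# Usage: render_page_png <PDF file> <page> <output root> <dpi>
render_page_png() {
    pdftoppm -png -r "$4" -f "$2" -l "$2" -singlefile "$1" "$3"
}
```

For example, `render_page_png test.pdf 1 cover 300` should write cover.png.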
pdftohtml
Converts PDF files into HTML. It can read from stdin if <PDF file> is -.
pdftohtml [<options>] <PDF file> [<HTML file>]
-s: generate a single HTML file that includes all pages.
-i: ignore images.
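To convert a whole folder, a loop like this sketch (convert_dir_to_html is a hypothetical helper) runs pdftohtml -s -i on every .pdf file in a directory. It leaves out the optional HTML file argument, so pdftohtml chooses the output names itself based on each input name.

```shell
# Run "pdftohtml -s -i" on every .pdf file in a directory.
# Usage: convert_dir_to_html <directory>
# Stops and returns 1 at the first file pdftohtml fails on.
convert_dir_to_html() {
    for pdf in "$1"/*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing
        pdftohtml -s -i "$pdf" || return 1
    done
}
```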
If you have any suggestions, feel free to contact me via social media or email.