Poppler: command-line PDF tools

Poppler is a PDF rendering library that ships with several command-line tools for inspecting, manipulating and converting PDFs.

Installation
pdfinfo
pdftotext
pdfseparate
pdfunite
pdffonts
pdfimages
pdftoppm
pdftohtml

Installation

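The package is called poppler on some distributions (Arch Linux, for example) and poppler-utils on others (Debian, Ubuntu, Fedora), and it may already be installed on your system. I am going to show the basic usage of every tool, but you can get more information about any of them with the -h option.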

pdfinfo

As you might expect, this utility gives information about a PDF file.

pdfinfo <PDF file>
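Since pdfinfo prints one "Key: value" pair per line, its output is easy to filter. As a minimal sketch, assuming the output contains a line starting with "Pages:", the function below prints only the page count. The name pdf_page_count is made up for this example and is not part of Poppler.

```shell
# Print the number of pages of a PDF by parsing pdfinfo's "Pages:" line.
# pdf_page_count is a hypothetical helper name, not a Poppler command.
pdf_page_count() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Usage:
#   pdf_page_count test.pdf
```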

pdftotext

Converts a PDF into a plain text file. If no output filename is given, the output takes the name of the PDF with the .pdf extension replaced by .txt.

pdftotext <PDF file> [<output filename>]
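pdftotext can also write to standard output when the output filename is - (a single dash), which makes it easy to combine with other tools. The following sketch searches a PDF for a word without creating an intermediate file. The function name pdf_grep is only an illustration.

```shell
# Search the text of a PDF, ignoring case.
# pdf_grep is a hypothetical helper name.
# $1: pattern to look for, $2: PDF file.
pdf_grep() {
    # "-" as the output filename sends the extracted text to stdout.
    pdftotext "$2" - | grep -i -- "$1"
}

# Usage:
#   pdf_grep poppler test.pdf
```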

pdfseparate

Extracts each page of a multi-page PDF into its own file. The output pattern must contain %d, which is replaced by the page number.

pdfseparate <PDF file> <output PDF filenames pattern>
# pdfseparate test2.pdf test2_%d.pdf

-f: set the first page to extract.
-l: set the last page to extract.
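The -f and -l options can be combined to extract only a range of pages. Here is a small sketch that derives the output pattern from the input filename. The name extract_pages and the "_%d" naming scheme are choices made for this example.

```shell
# Extract pages $1..$2 of the PDF $3 into <name>_<page>.pdf files.
# extract_pages is a hypothetical helper name.
extract_pages() {
    first=$1
    last=$2
    input=$3
    base=${input%.pdf}   # strip the .pdf extension
    pdfseparate -f "$first" -l "$last" "$input" "${base}_%d.pdf"
}

# Usage (pages 2 to 4 of test2.pdf):
#   extract_pages 2 4 test2.pdf
```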

pdfunite

Join several PDF files into one.

pdfunite <PDF file> <PDF file> ... <output PDF>
# pdfunite test1.pdf test2.pdf join.pdf
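Because the output file is always the last argument, a shell glob can provide the input files. This sketch joins every PDF in a directory in the order the shell expands the glob (alphabetical in most setups). The name merge_dir is made up for this example.

```shell
# Join all PDFs found in directory $1 into the output file $2.
# merge_dir is a hypothetical helper name.
# Write the output outside that directory; otherwise a previous result
# would be picked up by the glob on the next run.
merge_dir() {
    dir=$1
    output=$2
    pdfunite "$dir"/*.pdf "$output"
}

# Usage:
#   merge_dir ./chapters book.pdf
```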

pdffonts

Get info about the embedded fonts.

pdffonts <PDF file>

pdfimages

Extracts the images embedded in a PDF. It works with multi-page files too.

pdfimages [<options>] <PDF file> <image file root>
# pdfimages test2.pdf test2image

Add -png to save the images as PNG files (the default is PPM, or PBM for monochrome images).
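A PDF can contain many images, so it is handy to send them to their own directory. A minimal sketch, where the function name extract_images_png and the "img" file root are arbitrary choices for this example:

```shell
# Extract the images of the PDF $1 as PNG files into directory $2.
# extract_images_png is a hypothetical helper name.
extract_images_png() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # "img" is the image file root; pdfimages appends a number and extension.
    pdfimages -png "$pdf" "$outdir/img"
}

# Usage:
#   extract_images_png test2.pdf test2-images
```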

pdftoppm

Renders the pages of a PDF as PPM images or other formats, like JPEG. Each page goes to its own image file.

pdftoppm [<options>] <PDF file> <image file root>

-jpeg: convert PDF into a JPEG image.
-f <page>: first page to convert.
-l <page>: last page to convert.
-r <resolution>: set image resolution, in DPI.

# Convert pages 2 to 4 into JPEG images (one file per page)
pdftoppm -jpeg -f 2 -l 4 test.pdf test-image
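Setting -f and -l to the same page renders a single page, and -r controls its size. The sketch below combines the options listed above. The name render_page_jpeg, the 150 DPI default and the "-page" file root are assumptions made for this example.

```shell
# Render one page of a PDF as a JPEG image.
# render_page_jpeg is a hypothetical helper name.
# $1: PDF file, $2: page number, $3: resolution in DPI (optional).
render_page_jpeg() {
    pdf=$1
    page=$2
    dpi=${3:-150}   # assumed default for this sketch
    # pdftoppm appends the page number and the extension to the file root.
    pdftoppm -jpeg -r "$dpi" -f "$page" -l "$page" "$pdf" "${pdf%.pdf}-page"
}

# Usage (page 3 of test.pdf at 300 DPI):
#   render_page_jpeg test.pdf 3 300
```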

pdftohtml

Convert PDF files into HTML. It can read from stdin if <PDF file> is -.

pdftohtml [<options>] <PDF file> [<HTML file>]

-s: generate a single HTML that includes all pages.
-i: ignore images.

If you have any suggestion, feel free to contact me via social media or email.
