Processing PDF files with a Terminal

Learn how to use some commands to create and process PDF files using a Terminal.

GhostScript
- Some examples
LibreOffice
GraphicsMagick
Poppler
pdftk
OCRmyPDF

GhostScript

With GhostScript you can perform several processing tasks on a PDF file, such as compressing it.

The following command may look a bit complex, but it is useful for understanding what each parameter does.

gs \
-dNOPAUSE \
-dQUIET \
-dBATCH \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/printer \
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=150 \
-dGrayImageResolution=150 \
-dMonoImageResolution=150 \
-dPrinted=false \
-sOutputFile=output.pdf \
input.pdf

-dNOPAUSE -dQUIET -dBATCH: by default, gs shows every page of the PDF file and processes the pages one by one, waiting for a manual confirmation between pages. -dNOPAUSE removes the manual confirmation and -dBATCH closes gs automatically when processing ends. -dQUIET suppresses routine informational messages (-q does the same and also hides the startup messages).
-sDEVICE=pdfwrite: selects the output device, which determines the output file format. There are several options: “pdfwrite”, “ps2write”, “png16m”, “jpeg”, etc.
-dPDFSETTINGS=/printer: these are predefined templates for processing a PDF. Allowed values, from lowest to highest quality, are “/screen”, “/ebook”, “/printer” and “/prepress”. There is also a “/default” template. You can override individual settings, which is what the following parameters do.
-dAutoFilterColorImages=false -dAutoFilterGrayImages=false: when these are “true”, gs picks the compression filter for each image automatically; when “false”, it uses the configured image filter instead. In my tests, setting them to “false” produces a smaller output file.
-dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true: these enable downsampling, so images can be reduced below their current resolution.
-dColorImageResolution=150 -dGrayImageResolution=150 -dMonoImageResolution=150: these set the target image resolution in DPI (dots per inch).
-dPrinted=false: when this parameter is “true”, gs assumes the output will be printed, so it is not necessary to keep hyperlinks. Setting it to “false” keeps them.
-sOutputFile=output.pdf: the filename of the output.
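
To reuse the command above without retyping every parameter, it can be wrapped in a small shell function. This is only a sketch: the function name compress_pdf and its optional third argument (target DPI, defaulting to 150) are hypothetical and not part of GhostScript. The gs parameters are the ones described above.

```shell
#!/bin/bash
# compress_pdf INPUT OUTPUT [DPI]
# Hypothetical helper: runs the compression command with a configurable DPI.
compress_pdf() {
    local input=$1 output=$2 dpi=${3:-150}
    gs -dNOPAUSE -dQUIET -dBATCH \
        -sDEVICE=pdfwrite \
        -dPDFSETTINGS=/printer \
        -dAutoFilterColorImages=false \
        -dAutoFilterGrayImages=false \
        -dDownsampleColorImages=true \
        -dDownsampleGrayImages=true \
        -dDownsampleMonoImages=true \
        -dColorImageResolution="$dpi" \
        -dGrayImageResolution="$dpi" \
        -dMonoImageResolution="$dpi" \
        -dPrinted=false \
        -sOutputFile="$output" \
        "$input"
}
```

For example, compress_pdf "input.pdf" "output.pdf" 100 would target 100 DPI. Quoting "$input" and "$output" keeps filenames with spaces intact.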

Some examples

# Create a preview (a PDF with the first pages)
gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite -dFirstPage=1 -dLastPage=5 -sOutputFile=preview.pdf input.pdf

# Export a single-page PDF as a JPEG image (quality: 80%)
gs -dNOPAUSE -dBATCH -q -sDEVICE=jpeg -dJPEGQ=80 -sOutputFile=image.jpg input.pdf

# Compress a PDF using the 'ebook' template
gs -dNOPAUSE -dBATCH -q -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -sOutputFile=output.pdf input.pdf

# Convert a color PDF to grayscale
gs -dNOPAUSE -dBATCH -q -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -sOutputFile=input_bw.pdf input.pdf
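
The JPEG example above targets a single-page PDF. For a document with several pages, one possible approach is to put a %03d placeholder in the output filename, which gs replaces with the page number, and to set the rendering resolution with -r. The function name pdf_to_jpegs and its defaults (prefix "page", 150 DPI) are assumptions made for this sketch.

```shell
#!/bin/bash
# pdf_to_jpegs INPUT [PREFIX] [DPI]
# Hypothetical helper: writes PREFIX-001.jpg, PREFIX-002.jpg, ... (one per page).
pdf_to_jpegs() {
    local input=$1 prefix=${2:-page} dpi=${3:-150}
    gs -dNOPAUSE -dBATCH -q \
        -sDEVICE=jpeg -dJPEGQ=80 -r"$dpi" \
        -sOutputFile="${prefix}-%03d.jpg" \
        "$input"
}
```

For example, pdf_to_jpegs input.pdf scan 300 would render every page at 300 DPI into files starting with "scan-".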

Note (for scripts): if you store a GhostScript command in a variable and get errors with filenames that contain spaces, the cause is that the backslashes you type on the command line are consumed by the shell before the script receives its arguments (check the Bash documentation on quoting for details). One workaround is to add the escape characters back to each filename and run GhostScript with sh -c, with double quotes around the variable. For example, a file named ‘pdf-compress.sh’:

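#!/bin/bash
COMMAND="gs -dNOPAUSE -dBATCH -q -sDEVICE=pdfwrite "
# INPUT and OUTPUT are filenames; every space gets a backslash in front of it
INPUT=${1// /\\ }
OUTPUT=${2// /\\ }
# sh -c parses the string again, so the escaped spaces stay inside each filename
COMMAND=${COMMAND}"-sOutputFile=$OUTPUT $INPUT"
sh -c "$COMMAND"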

You can run the script as usual (escaping spaces with \):

pdf-compress.sh input\ with\ spaces.pdf output\ with\ spaces.pdf
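
An alternative that avoids re-escaping altogether is to keep the command in a Bash array instead of a string, so each filename stays a single argument. This is a sketch of that technique, not the script above; the function name pdf_compress is hypothetical.

```shell
#!/bin/bash
# pdf_compress INPUT OUTPUT
# Sketch: the command is stored in an array, one element per argument.
pdf_compress() {
    local input=$1 output=$2
    local -a cmd=(gs -dNOPAUSE -dBATCH -q -sDEVICE=pdfwrite)
    cmd+=("-sOutputFile=$output" "$input")
    # "${cmd[@]}" expands each element as one word, spaces included
    "${cmd[@]}"
}

# Run only when the script receives exactly two filenames.
if [ "$#" -eq 2 ]; then
    pdf_compress "$1" "$2"
fi
```

With this version, quoting on the command line is enough: pdf-compress.sh "input with spaces.pdf" "output with spaces.pdf".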

LibreOffice

Using LibreOffice from the command line is not very common, but it’s useful when you want a script to convert a text document to PDF.

# This will use default settings for PDF export
soffice --convert-to pdf --outdir /output-folder input.docx
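
For batch jobs, a small wrapper can convert several documents in one call. The sketch below assumes the --headless option, which runs soffice without opening its user interface. The function name convert_to_pdf is hypothetical.

```shell
#!/bin/bash
# convert_to_pdf OUTDIR FILE...
# Hypothetical helper: converts every FILE to PDF inside OUTDIR.
convert_to_pdf() {
    local outdir=$1
    shift
    # Create the output folder if it does not exist yet
    mkdir -p "$outdir"
    soffice --headless --convert-to pdf --outdir "$outdir" "$@"
}
```

For example, convert_to_pdf ./pdf *.docx would convert all .docx files in the current folder, still with the default PDF export settings.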

GraphicsMagick

This program allows you to convert one or several images to PDF. It’s as simple as this:

gm convert image1.png image2.png file.pdf
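
The command above puts all the images into one PDF. If you would rather get one PDF per image, a loop is enough. This is a minimal sketch; the function name images_to_pdfs is hypothetical, and each output file reuses the image name with a .pdf extension.

```shell
#!/bin/bash
# images_to_pdfs IMAGE...
# Hypothetical helper: converts each image to its own PDF (photo.png -> photo.pdf).
images_to_pdfs() {
    local img
    for img in "$@"; do
        # ${img%.*} strips the last extension from the filename
        gm convert "$img" "${img%.*}.pdf"
    done
}
```

For example: images_to_pdfs image1.png image2.png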

Poppler

See Poppler: command-line PDF tools.

pdftk

This tool for manipulating PDF files can do a lot of things. pdftk syntax is simple:

pdftk <input file> <operation> output <output file> [<other parameters>]

These are some of the available ‘operations’:

cat <page-range>: use it to merge, split or rotate pages.

# Remove first page
pdftk input.pdf cat 2-end output out.pdf

# Select odd pages within a range
pdftk input.pdf cat 3-27odd output out.pdf

# Two ranges
pdftk input.pdf cat 2-5 7-9 output out.pdf

# Rotate clockwise (90 degrees)
# Page rotation can be north: 0, east: 90, south: 180, west: 270
pdftk input.pdf cat 1-endeast output out.pdf
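
The examples above cover splitting and rotating. Merging works with the same cat operation: each input file gets a handle (a capital letter), and page ranges are prefixed with that handle. The burst operation splits a file into single pages. The function names below are hypothetical and the code is only a sketch.

```shell
#!/bin/bash
# merge_with_cover COVER BODY OUTPUT
# Hypothetical helper: whole COVER file, then BODY without its first page.
merge_with_cover() {
    local cover=$1 body=$2 output=$3
    pdftk A="$cover" B="$body" cat A B2-end output "$output"
}

# split_pages INPUT [PREFIX]
# Hypothetical helper: writes PREFIX_01.pdf, PREFIX_02.pdf, ... (one per page).
split_pages() {
    local input=$1 prefix=${2:-page}
    pdftk "$input" burst output "${prefix}_%02d.pdf"
}
```

For example: merge_with_cover cover.pdf report.pdf merged.pdf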

background and stamp: add a watermark.

After adding the output filename, you can add some additional parameters to modify the file:

flatten: flatten a PDF form.
user_pw PROMPT: encrypt the file with a password.
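
As a quick reference, the operations and parameters listed above could be used like this. The function names and filenames are hypothetical. With user_pw PROMPT, pdftk asks for the password interactively instead of reading it from the command line.

```shell
#!/bin/bash
# Hypothetical one-line helpers around the pdftk features listed above.

# Put WATERMARK behind every page of INPUT
add_watermark() { pdftk "$1" background "$2" output "$3"; }

# Put STAMP on top of every page of INPUT
add_stamp() { pdftk "$1" stamp "$2" output "$3"; }

# Flatten a PDF form
flatten_form() { pdftk "$1" output "$2" flatten; }

# Encrypt the file; pdftk prompts for the user password
encrypt_pdf() { pdftk "$1" output "$2" user_pw PROMPT; }
```

For example: add_watermark input.pdf watermark.pdf out.pdf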

See my other pdftk posts for Encrypting PDFs, How to flatten PDF forms to avoid compatibility errors and How to add a watermark to your multimedia files.

OCRmyPDF

Tool to add an OCR layer into a PDF. Check my post: Add an OCR layer to a PDF with Tesseract and ocrmypdf.
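
A basic call could look like the sketch below. It assumes the -l option (the Tesseract language code) and --skip-text (leave pages that already contain text untouched). The function name ocr_pdf and the default language "eng" are assumptions made for this example.

```shell
#!/bin/bash
# ocr_pdf INPUT OUTPUT [LANGUAGE]
# Hypothetical helper: adds an OCR layer, skipping pages that already have text.
ocr_pdf() {
    local input=$1 output=$2 lang=${3:-eng}
    ocrmypdf -l "$lang" --skip-text "$input" "$output"
}
```

For example: ocr_pdf scanned.pdf searchable.pdf spa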

If you have any suggestion, feel free to contact me via social media or email.
