Reduce PDF Size

There are multiple ways to reduce the size of a given PDF file. The easiest one is to remove content (e.g. images) or pages.

Removing duplication

Some PDF documents contain the same object multiple times. For example, if an image appears three times in a PDF, it could be embedded three times. Alternatively, it can be embedded once and referenced from each place where it appears.

Such duplication can be reduced by reading the file and writing it again:

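from PyPDF2 import PdfReader, PdfWriter

# Placeholder input path; substitute your own file.
reader = PdfReader("input.pdf")
writer = PdfWriter()

# Copy every page from the reader into the writer.
for page in reader.pages:
    writer.add_page(page)

with open("smaller-new-file.pdf", "wb") as fp:
    writer.write(fp)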


How well this works depends on the PDF, but we have seen an 86% reduction in file size (from 5.7 MB to 0.8 MB) for a real PDF.
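
To check how much a rewrite actually saved, the sizes before and after can be compared. The following is a minimal sketch using only the standard library; the helper names size_reduction_percent and file_size_reduction_percent are hypothetical and not part of PyPDF2.

```python
import os


def size_reduction_percent(old_size: float, new_size: float) -> float:
    """Return how much smaller new_size is than old_size, in percent."""
    if old_size <= 0:
        raise ValueError("old_size must be positive")
    return (old_size - new_size) / old_size * 100


def file_size_reduction_percent(old_path: str, new_path: str) -> float:
    """Compare two files on disk by their size in bytes."""
    return size_reduction_percent(
        os.path.getsize(old_path), os.path.getsize(new_path)
    )


# The figures quoted above: 5.7 MB shrinking to 0.8 MB.
print(f"{size_reduction_percent(5.7, 0.8):.0f}%")  # prints "86%"
```

For real files, file_size_reduction_percent can be called with the input path and the output path once the new file has been written.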

Remove images

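from PyPDF2 import PdfReader, PdfWriter

# Placeholder input path; substitute your own file.
reader = PdfReader("example.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# Strip the images from the pages that were added to the writer.
writer.remove_images()

with open("out.pdf", "wb") as f:
    writer.write(f)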


Lossless Compression

PyPDF2 supports the FlateDecode filter, which uses the zlib/deflate compression method. The compression is lossless, meaning the resulting PDF looks exactly the same.
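
What "lossless" means here can be illustrated with Python's standard zlib module, independent of PyPDF2. This is only a sketch: the sample bytes are a made-up stand-in for a page content stream, and the snippet does not show how PyPDF2 applies the filter internally.

```python
import zlib

# Hypothetical sample data standing in for an uncompressed content stream.
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET\n" * 200

# Deflate the data at the highest compression level, then inflate it again.
compressed = zlib.compress(raw_stream, 9)
restored = zlib.decompress(compressed)

# Lossless: decompressing returns exactly the original bytes.
assert restored == raw_stream
print(len(raw_stream), "->", len(compressed), "bytes")
```

Repetitive data such as this sample compresses very well; the gain for a real PDF depends on its content and on whether its streams are already compressed.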

Deflate compression can be applied to a page via page.compress_content_streams:

from PyPDF2 import PdfReader, PdfWriter