The PdfReader Class

class PyPDF2.PdfReader(stream: Union[str, _io.BytesIO, _io.BufferedReader, _io.BufferedWriter, _io.FileIO, pathlib.Path], strict: bool = False, password: Union[None, str, bytes] = None)

Bases: object

Initialize a PdfReader object.

This operation can take some time, as the PDF stream’s cross-reference tables are read into memory.

Parameters
  • stream – A file object, or any object that supports the standard read and seek methods of a file object. May also be a string or pathlib.Path giving the path to a PDF file.

  • strict (bool) – Determines whether the user is warned of all problems; when True, some correctable problems also become fatal. Defaults to False.

  • password (None/str/bytes) – Password used to decrypt the PDF file at initialization. If the password is None, the file will not be decrypted. Defaults to None.
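
As a minimal sketch of the stream parameter, the helper below wraps raw PDF bytes in io.BytesIO, which provides the read and seek methods the constructor expects, and passes the result on. The name reader_from_bytes and its reader_factory argument are hypothetical, added only so the sketch can be exercised without a real PDF; when no factory is given it falls back to PyPDF2.PdfReader.

```python
import io


def reader_from_bytes(data, strict=False, password=None, reader_factory=None):
    """Build a reader from in-memory PDF bytes (hypothetical helper)."""
    if reader_factory is None:
        # Imported lazily so PyPDF2 is only required when the default is used.
        from PyPDF2 import PdfReader
        reader_factory = PdfReader
    # BytesIO supports read() and seek(), as the stream parameter requires.
    stream = io.BytesIO(data)
    return reader_factory(stream, strict=strict, password=password)
```

With PyPDF2 installed, reader_from_bytes(pdf_bytes) would be roughly equivalent to PdfReader(io.BytesIO(pdf_bytes)).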

cacheGetIndirectObject(generation: int, idnum: int) → Optional[PyPDF2.generic.PdfObject]

Deprecated since version 1.28.0: Use cache_get_indirect_object() instead.

cacheIndirectObject(generation: int, idnum: int, obj: Optional[PyPDF2.generic.PdfObject]) → Optional[PyPDF2.generic.PdfObject]

Deprecated since version 1.28.0: Use cache_indirect_object() instead.

cache_get_indirect_object(generation: int, idnum: int) → Optional[PyPDF2.generic.PdfObject]

cache_indirect_object(generation: int, idnum: int, obj: Optional[PyPDF2.generic.PdfObject]) → Optional[PyPDF2.generic.PdfObject]

decode_permissions(permissions_code: int) → Dict[str, bool]

decrypt(password: Union[str, bytes]) → int

When using an encrypted / secured PDF file with the PDF Standard encryption handler, this function will allow the file to be decrypted. It checks the given password against the document’s user password and owner password, and then stores the resulting decryption key if either password is correct.

It does not matter which password was matched. Both passwords provide the correct decryption key that will allow the document to be used with this library.

Parameters

password (str/bytes) – The password to match.

Returns

0 if the password failed, 1 if the password matched the user password, and 2 if the password matched the owner password.

Return type

int

Raises

NotImplementedError – if document uses an unsupported encryption method.
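
The return codes described above can be given names, as in the following minimal sketch. PasswordMatch and unlock are hypothetical helpers, not part of PyPDF2; the sketch relies only on the is_encrypted property and the decrypt() method documented on this page.

```python
from enum import IntEnum


class PasswordMatch(IntEnum):
    """Names for the documented decrypt() return codes (hypothetical helper)."""
    FAILED = 0
    USER = 1
    OWNER = 2


def unlock(reader, password):
    """Decrypt reader if it is encrypted and report which password matched.

    Returns None for a document that is not encrypted.
    """
    if not reader.is_encrypted:
        return None
    match = PasswordMatch(reader.decrypt(password))
    if match is PasswordMatch.FAILED:
        raise ValueError("password matched neither the user nor the owner password")
    return match
```

Raising on a failed match is a design choice of this sketch; callers that prefer to retry with another password could return PasswordMatch.FAILED instead.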

property documentInfo: Optional[PyPDF2._reader.DocumentInformation]

Deprecated since version 1.28.0.

Use the attribute metadata instead.

getDestinationPageNumber(destination: PyPDF2.generic.Destination) → int

Deprecated since version 1.28.0: Use get_destination_page_number() instead.

getDocumentInfo() → Optional[PyPDF2._reader.DocumentInformation]

Deprecated since version 1.28.0: Use the attribute metadata instead.

getFields(tree: Optional[PyPDF2.generic.TreeObject] = None, retval: Optional[Dict[Any, Any]] = None, fileobj: Optional[Any] = None) → Optional[Dict[str, Any]]

Deprecated since version 1.28.0: Use get_fields() instead.

getFormTextFields() → Dict[str, Any]

Deprecated since version 1.28.0: Use get_form_text_fields() instead.

getIsEncrypted() → bool

Deprecated since version 1.28.0: Use is_encrypted instead.

getNamedDestinations(tree: Optional[PyPDF2.generic.TreeObject] = None, retval: Optional[Any] = None) → Dict[str, Any]

Deprecated since version 1.28.0: Use named_destinations instead.

getNumPages() → int

Deprecated since version 1.28.0: Use len(reader.pages) instead.

getObject(indirectReference: PyPDF2.generic.IndirectObject) → Optional[PyPDF2.generic.PdfObject]

Deprecated since version 1.28.0: Use get_object() instead.

getOutlines(node: Optional[PyPDF2.generic.DictionaryObject] = None, outlines: Optional[Any] = None) → List[Union[PyPDF2.generic.Destination, List[Union[PyPDF2.generic.Destination, List[PyPDF2.generic.Destination]]]]]

Deprecated since version 1.28.0: Use outlines instead.

getPage(pageNumber: int) → PyPDF2._page.PageObject

Deprecated since version 1.28.0: Use reader.pages[pageNumber] instead.

getPageLayout() → Optional[str]

Deprecated since version 1.28.0: Use page_layout instead.

getPageMode() → Optional[typing_extensions.Literal[/UseNone, /UseOutlines, /UseThumbs, /FullScreen, /UseOC, /UseAttachments]]

Deprecated since version 1.28.0: Use page_mode instead.

getPageNumber(page: PyPDF2._page.PageObject) → int

Deprecated since version 1.28.0: Use get_page_number() instead.

getXmpMetadata() → Optional[PyPDF2.xmp.XmpInformation]

Deprecated since version 1.28.0: Use the attribute xmp_metadata instead.

get_destination_page_number(destination: PyPDF2.generic.Destination) → int

Retrieve the page number of a given Destination object.

Parameters

destination (Destination) – The destination whose page number is wanted.

Returns

The page number, or -1 if the page is not found.

Return type

int

get_fields(tree: Optional[PyPDF2.generic.TreeObject] = None, retval: Optional[Dict[Any, Any]] = None, fileobj: Optional[Any] = None) → Optional[Dict[str, Any]]

Extracts field data if this PDF contains interactive form fields. The tree and retval parameters are for recursive use.

Parameters

fileobj – A file object (usually a text file) to which a report on all interactive form fields found is written.

Returns

A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys.

Return type

dict, or None if form data could not be located.
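
Because get_fields() may return None, callers usually need a guard before iterating. The following is a minimal sketch; field_values is a hypothetical helper, and it assumes each Field object behaves like a dictionary whose /V entry holds the field value, as in the PDF form-field dictionary.

```python
def field_values(reader):
    """Map each form field name to its raw /V entry (hypothetical helper)."""
    fields = reader.get_fields()
    if fields is None:
        # get_fields() returns None when no form data could be located.
        return {}
    # Assumption: each Field supports dictionary-style access by PDF key.
    return {name: field.get("/V") for name, field in fields.items()}
```

Fields that have no value set come back as None in the resulting dictionary.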

get_form_text_fields() → Dict[str, Any]

Retrieves form fields with textual data (inputs, dropdowns) from the document.

get_object(indirect_reference: PyPDF2.generic.IndirectObject) → Optional[PyPDF2.generic.PdfObject]

get_page_number(page: PyPDF2._page.PageObject) → int

Retrieve the page number of a given PageObject.

Parameters

page (PageObject) – The page whose page number is wanted. Should be an instance of PageObject.

Returns

The page number, or -1 if the page is not found.

Return type

int
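
The -1 sentinel deserves care, because in a Python sequence an index of -1 normally selects the last element, so passing the result straight into an index lookup could silently pick the wrong page. A minimal sketch, with page_index_or_none as a hypothetical helper:

```python
def page_index_or_none(reader, page):
    """Return the page number of page, or None if it is not found (hypothetical helper)."""
    number = reader.get_page_number(page)
    # Translate the documented -1 sentinel into None so it cannot be
    # mistaken for a valid negative index.
    return None if number == -1 else number
```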

property isEncrypted: bool

Deprecated since version 1.28.0.

Use is_encrypted instead.

property is_encrypted: bool

Read-only boolean property showing whether this PDF file is encrypted. Note that this property, if true, will remain true even after the decrypt() method is called.

property metadata: Optional[PyPDF2._reader.DocumentInformation]

Retrieve the PDF file’s document information dictionary, if it exists. Note that some PDF files use metadata streams instead of docinfo dictionaries, and these metadata streams will not be accessed by this property.

Returns

the document information of this PDF file

Return type

DocumentInformation or None if none exists.
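
Since metadata can be None, a guarded lookup is the safer pattern. The sketch below uses a hypothetical helper, document_title, and assumes that DocumentInformation exposes a title attribute that is itself None when the entry is absent.

```python
def document_title(reader, default="(untitled)"):
    """Return the docinfo title, or default if unavailable (hypothetical helper)."""
    info = reader.metadata
    # Both the dictionary and the individual entry may be missing.
    if info is None or info.title is None:
        return default
    return info.title
```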

property namedDestinations: Dict[str, Any]

Deprecated since version 1.28.0.

Use named_destinations instead.

property named_destinations: Dict[str, Any]

A read-only dictionary which maps names to Destinations.
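
As an illustration of combining named_destinations with get_destination_page_number(), the following minimal sketch resolves every named destination to a page number. destination_pages is a hypothetical helper; it relies only on the two members documented on this page.

```python
def destination_pages(reader):
    """Map each destination name to its page number, or None if unresolved (hypothetical helper)."""
    pages = {}
    for name, destination in reader.named_destinations.items():
        number = reader.get_destination_page_number(destination)
        # -1 is the documented "page not found" result.
        pages[name] = None if number == -1 else number
    return pages
```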

property numPages: int

Deprecated since version 1.28.0.

Use len(reader.pages) instead.

property outlines: List[Union[PyPDF2.generic.Destination, List[Union[PyPDF2.generic.Destination, List[PyPDF2.generic.Destination]]]]]

Read-only property for outlines present in the document.

Returns

a nested list of Destinations.
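
Because the result is a nested list, walking it usually calls for recursion. The generator below is a minimal sketch; flatten_outlines is a hypothetical helper that treats every nested list as one level deeper than the list that contains it.

```python
def flatten_outlines(outlines, depth=0):
    """Yield (depth, destination) pairs from a nested outline list (hypothetical helper)."""
    for item in outlines:
        if isinstance(item, list):
            # A nested list holds the children of the preceding entries.
            yield from flatten_outlines(item, depth + 1)
        else:
            yield depth, item
```

Applied to reader.outlines, each yielded destination could then be passed to get_destination_page_number() to build a table of contents.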

property pageLayout: Optional[str]

Deprecated since version 1.28.0.

Use page_layout instead.

property pageMode: Optional[typing_extensions.Literal[/UseNone, /UseOutlines, /UseThumbs, /FullScreen, /UseOC, /UseAttachments]]

Deprecated since version 1.28.0.

Use page_mode instead.

property page_layout: Optional[str]

Get the page layout.

Returns

Page layout currently being used.

Return type

str, None if not specified

Valid layout values

/NoLayout

Layout explicitly not specified

/SinglePage

Show one page at a time

/OneColumn

Show one column at a time

/TwoColumnLeft

Show pages in two columns, odd-numbered pages on the left

/TwoColumnRight

Show pages in two columns, odd-numbered pages on the right

/TwoPageLeft

Show two pages at a time, odd-numbered pages on the left

/TwoPageRight

Show two pages at a time, odd-numbered pages on the right
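
A short sketch of consuming page_layout: the hypothetical helper shows_two_pages checks the returned value against the four side-by-side layouts listed above and treats an unspecified layout (None) as not side-by-side.

```python
# The four layout values above that place pages side by side.
TWO_UP_LAYOUTS = {"/TwoColumnLeft", "/TwoColumnRight", "/TwoPageLeft", "/TwoPageRight"}


def shows_two_pages(reader):
    """Return True if the document requests a side-by-side layout (hypothetical helper)."""
    layout = reader.page_layout
    return layout is not None and str(layout) in TWO_UP_LAYOUTS
```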

property page_mode: Optional[typing_extensions.Literal[/UseNone, /UseOutlines, /UseThumbs, /FullScreen, /UseOC, /UseAttachments]]

Get the page mode.

Returns

Page mode currently being used.

Return type

str, None if not specified

Valid mode values

/UseNone

Do not show outlines or thumbnails panels

/UseOutlines

Show outlines (aka bookmarks) panel

/UseThumbs

Show page thumbnails panel

/FullScreen

Fullscreen view

/UseOC

Show Optional Content Group (OCG) panel

/UseAttachments

Show attachments panel
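
The mode values above lend themselves to a lookup table. The following is a minimal sketch; PAGE_MODE_DESCRIPTIONS and describe_page_mode are hypothetical names, and the descriptions are copied from the list above.

```python
PAGE_MODE_DESCRIPTIONS = {
    "/UseNone": "Do not show outlines or thumbnails panels",
    "/UseOutlines": "Show outlines (aka bookmarks) panel",
    "/UseThumbs": "Show page thumbnails panel",
    "/FullScreen": "Fullscreen view",
    "/UseOC": "Show Optional Content Group (OCG) panel",
    "/UseAttachments": "Show attachments panel",
}


def describe_page_mode(reader):
    """Return a human-readable description of reader.page_mode (hypothetical helper)."""
    mode = reader.page_mode
    if mode is None:
        return "not specified"
    # Fall back to the raw value for anything outside the documented set.
    return PAGE_MODE_DESCRIPTIONS.get(str(mode), str(mode))
```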

property pages: PyPDF2._page._VirtualList

Read-only property that emulates a list of Page objects.
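
The deprecation notes on this page point to len(reader.pages) and reader.pages[pageNumber] as the replacements for getNumPages() and getPage(). A minimal sketch of that pattern, with iter_pages as a hypothetical helper:

```python
def iter_pages(reader):
    """Yield (index, page) pairs via the pages property (hypothetical helper)."""
    # len() and integer indexing replace getNumPages() and getPage().
    for index in range(len(reader.pages)):
        yield index, reader.pages[index]
```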

read(stream: Union[_io.BytesIO, _io.BufferedReader, _io.BufferedWriter, _io.FileIO]) → None

readNextEndLine(stream: Union[_io.BytesIO, _io.BufferedReader, _io.BufferedWriter, _io.FileIO], limit_offset: int = 0) → bytes

Deprecated since version 1.28.0.

readObjectHeader(stream: Union[_io.BytesIO, _io.BufferedReader, _io.BufferedWriter, _io.FileIO]) → Tuple[int, int]

Deprecated since version 1.28.0: Use read_object_header() instead.

read_next_end_line(stream: Union[_io.BytesIO, _io.BufferedReader, _io.BufferedWriter, _io.FileIO], limit_offset: int = 0) → bytes

Deprecated since version 2.1.0.

read_object_header(stream: Union[_io.BytesIO, _io.BufferedReader, _io.BufferedWriter, _io.FileIO]) → Tuple[int, int]

property xmpMetadata: Optional[PyPDF2.xmp.XmpInformation]

Deprecated since version 1.28.0.

Use the attribute xmp_metadata instead.

property xmp_metadata: Optional[PyPDF2.xmp.XmpInformation]

XMP (Extensible Metadata Platform) data

Returns

An XmpInformation instance that can be used to access XMP metadata from the document.

Return type

XmpInformation or None if no metadata was found on the document root.
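
As with metadata, xmp_metadata may be None, so a guard is needed before reading from it. The sketch below uses a hypothetical helper, xmp_creators, and assumes that XmpInformation exposes the Dublin Core creators as a sequence through a dc_creator attribute.

```python
def xmp_creators(reader):
    """Return the XMP creator names as a list, empty if unavailable (hypothetical helper)."""
    xmp = reader.xmp_metadata
    if xmp is None:
        # No XMP metadata was found on the document root.
        return []
    # Assumption: dc_creator is a sequence of names, or None when absent.
    return list(xmp.dc_creator or [])
```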