Migration Guide: 1.x to 2.x

The API of PyPDF2<2.0.0 is very different from that of PyPDF2>=2.0.0.

Luckily, most changes are simple naming adjustments. This guide helps you migrate from PyPDF2 1.x (or even the original pyPdf) to PyPDF2>=2.0.0.

You can execute your code with the updated version and show deprecation warnings by running python -W all your_code.py.
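The same behavior is available from inside Python through the standard-library warnings module (-W all is an alias for the "always" filter action). The sketch below records every warning raised while a call runs. old_style_call and its message are hypothetical stand-ins for a deprecated 1.x-style name, and the exact warning category PyPDF2 emits is not assumed.

```python
import warnings


def old_style_call():
    """Hypothetical stand-in for a deprecated 1.x-style name that still works."""
    warnings.warn("oldStyleCall is deprecated, use old_style_call instead",
                  DeprecationWarning, stacklevel=2)
    return 42


def collect_warnings(func, *args, **kwargs):
    """Run func and return (result, messages of all warnings raised during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # In-code counterpart of "python -W all": report every warning, every time
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [f"{w.category.__name__}: {w.message}" for w in caught]


result, messages = collect_warnings(old_style_call)
for message in messages:
    print(message)
```

To make deprecated calls fail loudly instead, for example in a test suite, python -W error your_code.py turns every warning into an exception.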

Imports and Modules

  • PyPDF2.utils no longer exists

  • PyPDF2.pdf no longer exists. Import from PyPDF2 directly or from PyPDF2.generic instead.
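To find imports of the removed modules before upgrading, you can scan your code without executing it. This is a minimal sketch using the standard-library ast module; find_removed_imports is a hypothetical helper, and it only recognizes the static import forms shown in the sample (it would miss dynamic imports via importlib, for example). Because it only parses source text, PyPDF2 does not need to be installed.

```python
from __future__ import annotations

import ast

# Modules that this guide lists as removed in PyPDF2 2.x
REMOVED_MODULES = {"PyPDF2.utils", "PyPDF2.pdf"}


def find_removed_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for imports of removed modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import PyPDF2.utils [as alias]
            for alias in node.names:
                if alias.name in REMOVED_MODULES:
                    hits.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module in REMOVED_MODULES:
                # from PyPDF2.pdf import PdfFileReader
                hits.append((node.lineno, node.module))
            elif node.module == "PyPDF2":
                # from PyPDF2 import utils
                for alias in node.names:
                    full_name = f"PyPDF2.{alias.name}"
                    if full_name in REMOVED_MODULES:
                        hits.append((node.lineno, full_name))
    return sorted(hits)


legacy_source = """\
import os
from PyPDF2.pdf import PdfFileReader
import PyPDF2.utils as pdf_utils
from PyPDF2 import utils, PdfReader
"""
print(find_removed_imports(legacy_source))
```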

Naming Adjustments

Classes

The base classes were renamed because they can operate on BytesIO streams as well as on files. Also, the default value of the strict parameter changed from strict=True to strict=False.

  • PdfFileReader ➔ PdfReader

  • PdfFileWriter ➔ PdfWriter

  • PdfFileMerger ➔ PdfMerger

PdfFileReader and PdfFileMerger no longer have the overwriteWarnings parameter. The new behavior is overwriteWarnings=False.
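Because the class renames are one-to-one, a plain text substitution covers most call sites. The sketch below is a minimal regex-based codemod under that assumption; rename_classes is a hypothetical helper, not part of PyPDF2, and it does not distinguish code from strings or comments, so review the diff it produces.

```python
import re

# 1.x class names and their 2.x replacements, as listed in this guide
CLASS_RENAMES = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
    "PdfFileMerger": "PdfMerger",
}

# Word boundaries keep names like "MyPdfFileReader" from being rewritten
_CLASS_RE = re.compile(r"\b(" + "|".join(CLASS_RENAMES) + r")\b")


def rename_classes(source: str) -> str:
    """Replace 1.x class names with their 2.x names in a string of source code."""
    return _CLASS_RE.sub(lambda m: CLASS_RENAMES[m.group(1)], source)


old = "from PyPDF2 import PdfFileReader, PdfFileMerger\nreader = PdfFileReader('in.pdf')"
print(rename_classes(old))
```

Renaming alone does not preserve the old behavior: any overwriteWarnings=... argument has to be deleted by hand, and code that relied on the old strict=True default should now pass strict=True explicitly.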

Function, Method, and Property Names

In PyPDF2.xmp.XmpInformation:

  • rdfRoot ➔ rdf_root

  • xmp_createDate ➔ xmp_create_date

  • xmp_creatorTool ➔ xmp_creator_tool

  • xmp_metadataDate ➔ xmp_metadata_date

  • xmp_modifyDate ➔ xmp_modify_date

  • xmpMetadata ➔ xmp_metadata

  • xmpmm_documentId ➔ xmpmm_document_id

  • xmpmm_instanceId ➔ xmpmm_instance_id

In PyPDF2.generic:

  • readObject ➔ read_object

  • convertToInt ➔ convert_to_int

  • DocumentInformation.getText ➔ DocumentInformation._get_text. This method should typically not be used; please let me know if you need it.

  • readHexStringFromStream ➔ read_hex_string_from_stream

  • initializeFromDictionary ➔ initialize_from_dictionary

  • createStringObject ➔ create_string_object

  • TreeObject.hasChildren ➔ TreeObject.has_children

  • TreeObject.emptyTree ➔ TreeObject.empty_tree

In many places:

  • getObject ➔ get_object

  • writeToStream ➔ write_to_stream

  • readFromStream ➔ read_from_stream
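Most renames in these lists follow one mechanical rule: camelCase becomes snake_case. The following sketch of that rule can help when searching your code for old names; camel_to_snake is a hypothetical helper, and the rule is an approximation inferred from the lists in this guide rather than something PyPDF2 itself provides.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case.

    An underscore is inserted before every uppercase letter that directly
    follows a lowercase letter or digit; runs of capitals (e.g. "URI") stay
    together. Existing underscores are preserved.
    """
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


for old in ["readObject", "writeToStream", "xmp_createDate", "addURI", "_pushPopGS"]:
    print(f"{old} -> {camel_to_snake(old)}")
```

Treat the lists as authoritative, because not every rename is mechanical: getDocumentInfo ➔ metadata and getPage(n) ➔ pages[n] change shape entirely, and _pageId2Num ➔ _page_id2num differs from the _page_id2_num this rule would produce.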

PdfReader class:

  • reader.getPage(pageNumber) ➔ reader.pages[page_number]

  • reader.getNumPages() / reader.numPages ➔ len(reader.pages)

  • getDocumentInfo ➔ metadata

  • flattenedPages attribute ➔ flattened_pages

  • resolvedObjects attribute ➔ resolved_objects

  • xrefIndex attribute ➔ xref_index

  • getNamedDestinations / namedDestinations attribute ➔ named_destinations

  • getPageLayout / pageLayout ➔ page_layout attribute

  • getPageMode / pageMode ➔ page_mode attribute

  • getIsEncrypted / isEncrypted ➔ is_encrypted attribute

  • getOutlines ➔ get_outlines

  • readObjectHeader ➔ read_object_header

  • cacheGetIndirectObject ➔ cache_get_indirect_object

  • cacheIndirectObject ➔ cache_indirect_object

  • getDestinationPageNumber ➔ get_destination_page_number

  • readNextEndLine ➔ read_next_end_line

  • _zeroXref ➔ _zero_xref

  • _authenticateUserPassword ➔ _authenticate_user_password

  • _pageId2Num attribute ➔ _page_id2num

  • _buildDestination ➔ _build_destination

  • _buildOutline ➔ _build_outline

  • _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)

  • _getObjectFromStream ➔ _get_object_from_stream

  • _decryptObject ➔ _decrypt_object

  • _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)

  • _buildField ➔ _build_field

  • _checkKids ➔ _check_kids

  • _writeField ➔ _write_field

  • _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)

  • _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)
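The page-access changes are the ones most likely to appear in everyday scripts, and they are not simple renames: a method call becomes indexing or len(). A minimal regex-based sketch, assuming the old calls appear in the simple forms shown; migrate_page_access is a hypothetical helper. Calls with keyword arguments or nested parentheses are deliberately left untouched for manual review. The same rewrite fits the PdfWriter entries for getPage and getNumPages.

```python
import re

# A dotted object name such as "reader" or "self.reader"
_OBJ = r"\b(\w+(?:\.\w+)*)"


def migrate_page_access(source: str) -> str:
    """Rewrite simple 1.x page-access calls to the 2.x pages sequence."""
    # obj.getNumPages() and obj.numPages  ->  len(obj.pages)
    source = re.sub(_OBJ + r"\.getNumPages\(\)", r"len(\1.pages)", source)
    source = re.sub(_OBJ + r"\.numPages\b", r"len(\1.pages)", source)
    # obj.getPage(<arg>)  ->  obj.pages[<arg>]
    # (only a positional argument without "=" or nested parentheses)
    source = re.sub(r"\.getPage\(([^()=]*)\)", r".pages[\1]", source)
    return source


old_loop = "for i in range(reader.getNumPages()):\n    page = reader.getPage(i)"
print(migrate_page_access(old_loop))
```

Once the index arithmetic is gone, iterating directly with for page in reader.pages: often reads better than looping over range(len(reader.pages)).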

PdfWriter class:

  • writer.getPage(pageNumber) ➔ writer.pages[page_number]

  • writer.getNumPages() ➔ len(writer.pages)

  • addMetadata ➔ add_metadata

  • addPage ➔ add_page

  • addBlankPage ➔ add_blank_page

  • addAttachment(fname, fdata) ➔ add_attachment(filename, data)

  • insertPage ➔ insert_page

  • insertBlankPage ➔ insert_blank_page

  • appendPagesFromReader ➔ append_pages_from_reader

  • updatePageFormFieldValues ➔ update_page_form_field_values

  • cloneReaderDocumentRoot ➔ clone_reader_document_root

  • cloneDocumentFromReader ➔ clone_document_from_reader

  • getReference ➔ get_reference

  • getOutlineRoot ➔ get_outline_root

  • getNamedDestRoot ➔ get_named_dest_root

  • addBookmarkDestination ➔ add_bookmark_destination

  • addBookmarkDict ➔ add_bookmark_dict

  • addBookmark ➔ add_bookmark

  • addNamedDestinationObject ➔ add_named_destination_object

  • addNamedDestination ➔ add_named_destination

  • removeLinks ➔ remove_links

  • removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)

  • removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)

  • addURI ➔ add_uri

  • addLink ➔ add_link

  • getPage(pageNumber) ➔ get_page(page_number)

  • getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute

  • getPageMode / setPageMode / pageMode ➔ page_mode attribute

  • _addObject ➔ _add_object

  • _addPage ➔ _add_page

  • _sweepIndirectReferences ➔ _sweep_indirect_references

PdfMerger class:

  • __init__ parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)

  • addMetadata ➔ add_metadata

  • addNamedDestination ➔ add_named_destination

  • setPageLayout ➔ set_page_layout

  • setPageMode ➔ set_page_mode

Page class:

  • artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox

    • getWidth, getHeight ➔ width / height

    • getLowerLeft_x / getUpperLeft_x ➔ left

    • getUpperRight_x / getLowerRight_x ➔ right

    • getLowerLeft_y / getLowerRight_y ➔ bottom

    • getUpperRight_y / getUpperLeft_y ➔ top

    • getLowerLeft / setLowerLeft ➔ lower_left property

    • upperRight ➔ upper_right

  • mergePage ➔ merge_page

  • rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise

  • _mergeResources ➔ _merge_resources

  • _contentStreamRename ➔ _content_stream_rename

  • _pushPopGS ➔ _push_pop_gs

  • _addTransformationMatrix ➔ _add_transformation_matrix

  • _mergePage ➔ _merge_page
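Several of the box accessors turned from getter/setter methods into properties, so call sites lose their parentheses: box.getWidth() becomes box.width, and box.setLowerLeft(xy) becomes an assignment. The Box class below is a hypothetical stand-in written only to show that calling pattern; it is not PyPDF2's implementation of mediabox and the other boxes.

```python
class Box:
    """Hypothetical stand-in for a page box (NOT PyPDF2's own class).

    It only mirrors the 2.x naming style: plain properties instead of
    getX()/setX() methods.
    """

    def __init__(self, left: float, bottom: float, right: float, top: float) -> None:
        self.left, self.bottom, self.right, self.top = left, bottom, right, top

    @property
    def width(self) -> float:  # 1.x style: box.getWidth()
        return self.right - self.left

    @property
    def height(self) -> float:  # 1.x style: box.getHeight()
        return self.top - self.bottom

    @property
    def lower_left(self) -> tuple:  # 1.x style: box.getLowerLeft()
        return (self.left, self.bottom)

    @lower_left.setter
    def lower_left(self, value: tuple) -> None:  # 1.x style: box.setLowerLeft(value)
        self.left, self.bottom = value


box = Box(0, 0, 612, 792)     # US Letter size in PDF points
print(box.width, box.height)  # attribute access, no parentheses
box.lower_left = (36, 36)     # assignment replaces setLowerLeft(...)
print(box.lower_left, box.width, box.height)
```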

XmpInformation class:

  • getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)

  • getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., about_uri, ...)

  • _getText ➔ _get_text

utils.py:

  • matrixMultiply ➔ matrix_multiply

  • RC4_encrypt was moved to the security module

Parameter Names

  • PdfWriter.get_page: pageNumber ➔ page_number

  • PyPDF2.filters (all classes): decodeParms ➔ decode_parms

  • PyPDF2.filters (all classes): decodeStreamData ➔ decode_stream_data
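A parameter rename only matters for calls that pass the argument by keyword. The plain-Python sketch below shows the general effect on a hypothetical get_page function whose parameter was renamed without any compatibility shim; whether a particular PyPDF2 2.x release still accepts an old keyword name is not assumed here, so update keyword call sites either way.

```python
def get_page(page_number: int) -> str:
    """Hypothetical stand-in using the 2.x parameter name (not PyPDF2 code)."""
    return f"page {page_number}"


# Positional calls are unaffected by a parameter rename
print(get_page(0))

# A keyword call that still uses the 1.x name is rejected by Python itself
try:
    get_page(pageNumber=0)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```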

Deprecations

A few classes / functions were deprecated without a replacement in PyPDF2; where the standard library offers an equivalent, it is noted:

  • PyPDF2.utils.ConvertFunctionsToVirtualList

  • PyPDF2.utils.formatWarning

  • PyPDF2.isInt(obj): Use isinstance(obj, int) instead

  • PyPDF2.u_(s): Use s directly

  • PyPDF2.chr_(c): Use chr(c) instead

  • PyPDF2.barray(b): Use bytearray(b) instead

  • PyPDF2.isBytes(b): Use isinstance(b, bytes) instead

  • PyPDF2.xrange_fn: Use range instead

  • PyPDF2.string_type: Use str instead

  • PyPDF2.isString(s): Use isinstance(s, str) instead

  • PyPDF2._basestring: Use str instead

  • b_(...) was removed. You should typically be able to use the bytes object directly, otherwise you can copy this
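Each helper with a listed alternative maps onto a Python built-in. The snippet below shows those replacements side by side; the variable names are illustrative only.

```python
# Built-in replacements for the removed PyPDF2 helpers listed above
value, text, code, raw = 42, "PDF", 65, b"%PDF-1.7"

is_int = isinstance(value, int)    # was PyPDF2.isInt(value)
is_str = isinstance(text, str)     # was PyPDF2.isString(text)
is_bytes = isinstance(raw, bytes)  # was PyPDF2.isBytes(raw)
char = chr(code)                   # was PyPDF2.chr_(code)
buffer = bytearray(raw)            # was PyPDF2.barray(raw)
indices = list(range(3))           # was PyPDF2.xrange_fn(3)
same_text = text                   # was PyPDF2.u_(text)

print(is_int, is_str, is_bytes, char, buffer, indices, same_text)
```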