Python pypdf2

PyPDF2 is a Python library for authoring, modifying, and converting PDF files.


Basic Usage

from PyPDF2 import PdfFileReader, PdfFileWriter


inff  = open( 'input.pdf', 'rb' )
inpdf  = PdfFileReader( inff )

outff = open( 'output.pdf', 'wb' )
outpdf = PdfFileWriter()


num_pages = inpdf.getNumPages()
for num_page in range( num_pages ):		## page indices are zero-based
	inpage = inpdf.getPage( num_page )

	##
	## Modify the inpdf page here
	##

	outpdf.addPage( inpage )

outpdf.write( outff )
inff.close()
outff.close()



Getting PDF Information

Here are the basics of PDFs:

  • 1 point (pt) == 1/72 inch
  • bounding boxes are measured from the lower-left corner to the upper-right corner
  • each PDF page defines several boxes (mediabox, cropbox, etc.). The mediabox is the largest.

For more information see: pdf.
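
The unit and box conventions above can be sketched in plain Python. This is only a minimal illustration and does not use PyPDF2 itself; the helper names (`pt_to_inch`, `inch_to_pt`, `box_size`) and the US Letter box values are assumptions made for the example.

```python
POINTS_PER_INCH = 72.0  # 1 pt == 1/72 inch


def pt_to_inch(points):
    """Convert a length in PDF points to inches."""
    return points / POINTS_PER_INCH


def inch_to_pt(inches):
    """Convert a length in inches to PDF points."""
    return inches * POINTS_PER_INCH


def box_size(lower_left, upper_right):
    """Return (width, height) in points for a box given as two corners.

    Both corners are (x, y) tuples measured from the page origin,
    lower-left first, upper-right second.
    """
    (llx, lly), (urx, ury) = lower_left, upper_right
    return (urx - llx, ury - lly)


# Hypothetical mediabox for a US Letter page (8.5 x 11 inches)
letter_w, letter_h = box_size((0, 0), (612, 792))
letter_inches = (pt_to_inch(letter_w), pt_to_inch(letter_h))
```

With PyPDF2 the two corners would presumably come from a page box such as the mediabox, rather than being typed in by hand.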

page = inpdf.getPage( 1 )								## You operate on PDFs one page at a time (zero-based index)
print( page['/Rotate'] )								## Pages are just big dicts of information


(page_w, page_h) = page.mediaBox.upperRight		## You can access/modify information as attributes:
page['/CropBox']											## You can also access/modify information from the dict



Rotating

I do not entirely understand the order of operations for rotations. It appears that changing the rotation value only takes effect once the final PDF is authored.

Note that if you are performing any other operations on that page, the width/height are not swapped for the rotation until after the PDF is authored.

You can get around this using:

inpage.mergeRotatedPage(
		inpage,
		90,
	)
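
Since the width/height are not swapped until the PDF is authored, one option is to work out the displayed size yourself when other operations depend on it. The sketch below is plain Python and does not call PyPDF2; `displayed_size` is a hypothetical helper, and it assumes the rotation is a multiple of 90 degrees.

```python
def displayed_size(width, height, rotation):
    """Return the (width, height) of a page once `rotation` is applied.

    `rotation` is in degrees and must be a multiple of 90.
    Negative values and values above 360 are normalized first.
    """
    rotation %= 360  # e.g. -90 becomes 270, 450 becomes 90
    if rotation % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    if rotation in (90, 270):
        # A quarter turn swaps the two dimensions
        return (height, width)
    return (width, height)
```

The width and height passed in would be the un-rotated values, for example the pair read from `page.mediaBox.upperRight` when the mediabox starts at the origin.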