Python pypdf2
PyPDF2 is a Python library to author, modify, and convert PDF files.
Basic Usage
    ## Uses the legacy camelCase API (PyPDF2 1.x/2.x)
    from PyPDF2 import PdfFileReader, PdfFileWriter

    ## PDFs are binary files: read with 'rb', write with 'wb'
    with open('input.pdf', 'rb') as inff, open('output.pdf', 'wb') as outff:
        inpdf = PdfFileReader(inff)
        outpdf = PdfFileWriter()

        ## getPage() takes a 0-based page index
        for num_page in range(inpdf.getNumPages()):
            inpage = inpdf.getPage(num_page)
            ##
            ## Modify the inpdf page here
            ##
            outpdf.addPage(inpage)

        ## Write while the input file is still open
        outpdf.write(outff)
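The loop above can be wrapped in a small helper so that the per-page modification is passed in as a function. This is only a sketch: `copy_pages` and its `modify` parameter are hypothetical names, not part of PyPDF2, and the helper relies only on the `getNumPages`, `getPage`, and `addPage` calls already used above.

```python
def copy_pages(reader, writer, modify=None):
    """Copy every page from reader to writer, optionally modifying each one.

    `reader` is expected to behave like PdfFileReader (getNumPages/getPage)
    and `writer` like PdfFileWriter (addPage). `copy_pages` and `modify`
    are illustrative names, not part of PyPDF2.
    """
    for index in range(reader.getNumPages()):  # page indexes are 0-based
        page = reader.getPage(index)
        if modify is not None:
            modify(page)  # expected to change the page in place
        writer.addPage(page)
    return writer
```

With the objects from the example above, this would be called as `copy_pages(inpdf, outpdf)` just before `outpdf.write(outff)`.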
Getting PDF Information
Here are the basics of PDFs:
- 1 point (pt) == 1/72 inch
- bounding boxes are given as two corners, measured from the lower-left to the upper-right
- each page defines several boxes (MediaBox, CropBox, BleedBox, TrimBox, ArtBox). The MediaBox is the largest.
For more information see: pdf.
    page = inpdf.getPage(1)                        ## You operate on PDFs one page at a time (index is 0-based, so this is the second page)
    print(page['/Rotate'])                         ## Pages are just big dicts of information (KeyError if the page has no /Rotate entry)
    (page_w, page_h) = page.mediaBox.upperRight    ## You can access/modify information as attributes
    page['/CropBox']                               ## You can also access/modify information from the dict
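The unit and bounding-box conventions listed above can be checked with plain arithmetic, without PyPDF2. The sketch below uses hypothetical helper names (`points_to_inches`, `points_to_mm`, `box_size`) and the US Letter size as sample input. `float()` is applied because PyPDF2 may return box coordinates as Decimal-like numbers.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_inches(points):
    """Convert PDF points to inches (1 pt == 1/72 inch)."""
    return float(points) / POINTS_PER_INCH


def points_to_mm(points):
    """Convert PDF points to millimetres."""
    return points_to_inches(points) * MM_PER_INCH


def box_size(lower_left, upper_right):
    """Return (width, height) in points of a box given its two corners."""
    (x0, y0), (x1, y1) = lower_left, upper_right
    return (float(x1) - float(x0), float(y1) - float(y0))


# US Letter MediaBox: lower-left (0, 0), upper-right (612, 792)
width, height = box_size((0, 0), (612, 792))
print(points_to_inches(width), points_to_inches(height))  # 8.5 11.0
```

With a real page, the corners would come from `page.mediaBox.lowerLeft` and `page.mediaBox.upperRight`.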
Rotating
I do not entirely understand the order of operations for rotations. It appears that changing a page's rotation value only takes effect once the final PDF is authored. Note that if you are performing any other operations on that page, the width/height are not swapped for the rotation until after the PDF is authored. You can get around this using:
    ## mergeRotatedPage( page2, rotation ) merges page2's content, rotated by `rotation` degrees, onto the page
    inpage.mergeRotatedPage( inpage, 90 )
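Because the stored box dimensions do not change when only the rotation value is set, another way to handle this is to compute the swapped size yourself before doing further layout work. A minimal sketch, assuming the rotation is a multiple of 90 degrees (which the PDF format requires for /Rotate); `displayed_size` is a hypothetical helper, not a PyPDF2 function.

```python
def displayed_size(width, height, rotate):
    """Return the (width, height) a viewer would show for a page.

    `width` and `height` come from the unrotated box (e.g. the MediaBox)
    and `rotate` is the page's /Rotate value in degrees.
    `displayed_size` is an illustrative name, not part of PyPDF2.
    """
    rotate = int(rotate) % 360  # normalise, so -90 becomes 270
    if rotate % 90 != 0:
        raise ValueError("expected a multiple of 90 degrees, got %r" % rotate)
    if rotate in (90, 270):
        return (height, width)  # quarter turns swap width and height
    return (width, height)


print(displayed_size(612, 792, 90))  # (792, 612)
```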