Python之利用PyPDF2库实现对PDF的删除和合并

    技术2022-07-14  71

    文章目录

    概述安装一、The PdfFileReader Class1、getNumPages()2、getPage(pageNumber) 二、The PdfFileWriter Class1、addPage(page)2、write(stream) 三、The PdfFileMerger Class方法1、append()方法2、merge()方法3、write() 实例一:删除实例二:合并

    概述

    PyPDF2是Python中用于对PDF操作的第三方库,提供了删除、合并、裁剪、转换等操作 最主要有四个类: The PdfFileReader Class The PdfFileMerger Class The PageObject Class The PdfFileWriter Class

    安装

    打开命令行键入

    pip install PyPDF2

    一、The PdfFileReader Class

    PyPDF2.PdfFileReader(stream, strict=True, warndest=None, overwriteWarnings=True)

    Parameters: stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.

    strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

    warndest – Destination for logging warnings (defaults to sys.stderr).

    overwriteWarnings (bool) – Determines whether to override Python’s warnings.py module with a custom implementation (defaults to True).

    1、getNumPages()

    Calculates the number of pages in this PDF file.

    Returns: number of pages Return type: int Raises PdfReadError: if file is encrypted and restrictions prevent this action.

    2、getPage(pageNumber)

    Retrieves a page by number from this PDF file.

    Parameters: pageNumber (int) – The page number to retrieve (pages begin at zero) Returns: a PageObject instance. Return type: PageObject

    二、The PdfFileWriter Class

    class PyPDF2.PdfFileWriter This class supports writing PDF files out, given pages produced by another class (typically PdfFileReader).

    1、addPage(page)

    Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

    Parameters: page (PageObject) – The page to add to the document. Should be an instance of PageObject

    2、write(stream)

    Writes the collection of pages added to this object out as a PDF file.

    Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

    三、The PdfFileMerger Class

    Initializes a PdfFileMerger object. PdfFileMerger merges multiple PDFs into a single PDF. It can concatenate, slice, insert, or any combination of the above. 初始化一个PdfFileMerger对象,PdfFileMerger 用来将多个PDF合并为一个PDF,它能够连接,切割,插入或者以上的任意组合

    See the functions merge() (or append()) and write() for usage information.

    Parameters: strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

    方法1、append()

    append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

    Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position. 和merge()方法相同,但假定的是你想要把全部页面连接到文件的最后而不是指定位置

    Parameters: fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file. 一个文件对象(python中用open()创建的对象)或者类似文件对象的能够支持标准读取和寻找方法的对象,也可以是一个代表指向PDF文件路径的字符串 bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark. pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document. 可以是一个页码序列或者一个(start, stop[, step])元组,用来合并指定范围的源文件页面到输出文件

    import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

    在这里插入代码片

    方法2、merge()

    merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)

    Merges the pages from the given file into the output file at the specified page number. 从指定位置合并来自给定文件的页面到输出文件

    Parameters: position (int) – The page number to insert this file. File will be inserted after the given number. 插入文件的页码数,将插入到给定页数的后面 0口1 口2 口3 fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file. bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark. pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.

    import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

    注意:position和pages均指的是下图绿色数字,pages的范围是绿色数字之间囊括的页面

    方法3、write()

    write(fileobj) Writes all data that has been merged to the given output file. 将所有被合并的数据写入到给定的输出文件中

    Parameters: fileobj – Output file. Can be a filename or any kind of file-like object. 输出文件,可以是一个文件名或者所有类似文件对象的对象

    实例一:删除

    #PDF_delete.py from PyPDF2 import PdfFileWriter, PdfFileReader def PDF_delete(index): output = PdfFileWriter() # 声明一个用于输出PDF的实例 input1 = PdfFileReader(open("C:/Users/Yuanzheng/Desktop/Test1.pdf", "rb")) # 读取本地PDF文件 pages = input1.getNumPages() # 读取文档的页数 for i in range(pages): if i + 1 in index: continue # 待删除的页面 output.addPage(input1.getPage(i)) # 读取PDF的第i页,添加到输出Output实例中 outputStream = open("C:/Users/Yuanzheng/Desktop/Test-Output1.pdf", "wb") output.write(outputStream) # 把编辑后的文档保存到本地 PDF_delete([2])

    实例二:合并

    #PDF_merger.py from PyPDF2 import PdfFileMerger merger = PdfFileMerger() input1 =open("C:/Users/Yuanzheng/Desktop/Test1.pdf","rb") input2 = open("C:/Users/Yuanzheng/Desktop/Test2.pdf","rb") merger.append(fileobj= input1) merger.merge(position=0,fileobj=input2,pages=(1,3)) output = open("C:/Users/Yuanzheng/Desktop/PyPDF-Output2.pdf","wb") merger.write(output)

    Reference:https://pythonhosted.org/PyPDF2/PageObject.html

    Processed: 0.028, SQL: 9