Python + PyPdf: Crop region of page and paste it in another page


Problem description

Let's say you have a PDF page with various complex elements inside. The objective is to crop a region of the page (to extract only one of the elements) and then paste it into another PDF page.

Here is a simplified version of my code:

import PyPDF2

def extract_tree(in_file, out_file):
    with open(in_file, 'rb') as infp:
        # Read the document that contains the tree (in its first page)
        reader = PyPDF2.PdfFileReader(infp)
        page = reader.getPage(0)

        # Crop the tree. Coordinates below are only illustrative
        page.cropBox.lowerLeft = [100, 200]
        page.cropBox.upperRight = [250, 300]

        # Create an empty document and add a single page containing only the cropped page
        writer = PyPDF2.PdfFileWriter()
        writer.addPage(page)
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)

def insert_tree_into_page(tree_document, text_document):
    with open(text_document, 'rb') as text_fp, open(tree_document, 'rb') as tree_fp:
        # Load the first page of the document containing 'text text text text...'
        text_page = PyPDF2.PdfFileReader(text_fp).getPage(0)

        # Load the previously cropped tree (cropped using 'extract_tree')
        tree_page = PyPDF2.PdfFileReader(tree_fp).getPage(0)

        # Overlay the text-page and the tree-crop
        text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)

        # Save the result into a new empty document
        output = PyPDF2.PdfFileWriter()
        output.addPage(text_page)
        with open('merged_document.pdf', 'wb') as out_fp:
            output.write(out_fp)



# First, crop the tree and save it into cropped_document.pdf
extract_tree('document1.pdf', 'cropped_document.pdf')

# Now merge document2.pdf with cropped_document.pdf
insert_tree_into_page('cropped_document.pdf', 'document2.pdf')

The extract_tree function seems to work: it generates a PDF file containing only the cropped region (the tree, in this example). The problem is that when I try to paste the tree into the new page, the star and the house from the original image are pasted as well.

Recommended answer

I had exactly the same issue. In the end, the solution for me was to make a small edit to the PyPDF2 source code (taken from this pull request, which never made it into the master branch). Insert these lines into the _mergePage method of the PageObject class in the file pdf.py:

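# Prefix the merged page's content with a clipping rectangle equal to its trim box
page2Content = ContentStream(page2Content, self.pdf)
clip = [page2.trimBox.getLowerLeft_x(), page2.trimBox.getLowerLeft_y(),
        page2.trimBox.getWidth(), page2.trimBox.getHeight()]
page2Content.operations.insert(0, [[FloatObject(v) for v in clip], "re"])  # rectangle path
page2Content.operations.insert(1, [[], "W"])  # use that path as the clipping path
page2Content.operations.insert(2, [[], "n"])  # end the path without painting it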

(See the pull request for exactly where to put them.) With that done, you can crop the section of the PDF you want and merge it with another page without any issues. There is no need to save the cropped section to a separate PDF unless you want to.
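For readers unfamiliar with the three operators the patch inserts: in PDF content-stream syntax, "re" appends a rectangle (x, y, width, height) to the current path, "W" marks that path as the clipping path, and "n" ends the path without painting it. The sketch below is independent of PyPDF2 and only shows the bytes such a prefix amounts to. The helper name clip_prefix is hypothetical and used purely for illustration.

```python
def clip_prefix(llx, lly, width, height):
    """Return PDF content-stream bytes that clip later drawing to a rectangle."""
    def num(value):
        # Fixed-point formatting, since PDF numbers cannot use exponent notation
        return ("%.4f" % value).rstrip("0").rstrip(".")

    rect = " ".join(num(v) for v in (llx, lly, width, height))
    # "re" adds the rectangle to the path, "W" makes it the clip, "n" ends the path
    return ("%s re W n\n" % rect).encode("ascii")


# Crop box from the example: lower-left (100, 200), upper-right (250, 300)
prefix = clip_prefix(100, 200, 250 - 100, 300 - 200)
print(prefix)
```

Everything drawn after this prefix in the same graphics state is limited to the rectangle, which is why the star and the house stop appearing once the patch is applied.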

from PyPDF2 import PdfFileReader, PdfFileWriter

with open('document1.pdf', 'rb') as tree_fp, open('document2.pdf', 'rb') as text_fp:
    tree_page = PdfFileReader(tree_fp).getPage(0)
    text_page = PdfFileReader(text_fp).getPage(0)

    # Crop the tree; with the patched _mergePage, the merge clips to this box
    tree_page.cropBox.lowerLeft = [100, 200]
    tree_page.cropBox.upperRight = [250, 300]

    text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
    output = PdfFileWriter()
    output.addPage(text_page)
    with open('merged_document.pdf', 'wb') as out_fp:
        output.write(out_fp)
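One thing worth checking when choosing tx and ty: cropping does not move the page's coordinate origin, so the cropped region keeps its original coordinates. The sketch below assumes the merge applies the scale first and the translation second, mapping (x, y) to (scale*x + tx, scale*y + ty). Under that assumption it works out where the crop box lands on the target page. The helper names placed_box and offsets_for_target are hypothetical and not part of PyPDF2.

```python
def placed_box(lower_left, upper_right, scale, tx, ty):
    """Map a crop box through scale-then-translate onto the target page."""
    def move(point):
        x, y = point
        return (scale * x + tx, scale * y + ty)

    return move(lower_left), move(upper_right)


def offsets_for_target(lower_left, scale, target):
    """Return (tx, ty) that put the crop's lower-left corner at `target`."""
    return (target[0] - scale * lower_left[0],
            target[1] - scale * lower_left[1])


# Values from the example: crop (100, 200)-(250, 300), scale 1.0, tx=100, ty=200
print(placed_box((100, 200), (250, 300), 1.0, 100, 200))

# Offsets needed to place the crop's lower-left corner at (50, 50) instead
print(offsets_for_target((100, 200), 1.0, (50, 50)))
```

If the assumption holds, tx=100 and ty=200 place the tree at (200, 400) to (350, 500), not at (100, 200). Negative offsets are needed to move it closer to the page origin.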

Maybe there is a better way to get this behaviour without directly editing the library's source code. I'd be grateful if anyone finds one, as this is admittedly a slightly dodgy hack.
