Watermark Removal on PDF with PyPDF2

Question

This section imports the necessary classes from the PyPDF2 library:

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.pdf import ContentStream
from PyPDF2.generic import TextStringObject, NameObject
from PyPDF2.utils import b_

# The watermark says SAMPLE, so I've tried different capitalizations
wm_text = 'Sample'
# I just want to replace the SAMPLE watermark with nothing (a space would also do)
replace_with = ''

# Load the PDF into PyPDF2
source = PdfFileReader(open('input.pdf', "rb"))
output = PdfFileWriter()

# For each page
for page in range(source.getNumPages()):
    # Get the current page and its contents
    page = source.getPage(page)
    content_object = page["/Contents"].getObject()
    content = ContentStream(content_object, source)

    # Loop over all PDF elements
    for operands, operator in content.operations:

I was told to adapt this part depending on my PDF file:

        if operator == b_("TJ"):
            text = operands[0][0]
            if isinstance(text, TextStringObject) and text.startswith(wm_text):
                operands[0] = TextStringObject(replace_with)

Set the modified content as the content object on the page:

    page.__setitem__(NameObject('/Contents'), content)

Add the page to the output:

    output.addPage(page)

Write the stream:

outputStream = open("output.pdf", "wb")
output.write(outputStream)
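
Which operator to match depends on how the PDF was produced, so it can help to look at the text-showing operators a page actually uses before adapting the check. The following is only a minimal sketch: `summarize_text_operators` is a hypothetical helper, and it assumes the operations arrive as `(operands, operator)` pairs with the operator given as bytes, as `content.operations` yields them in the loop above. Plain Python lists and strings stand in for the PyPDF2 objects so the snippet runs without the library.

```python
from collections import Counter

# The four text-showing operators defined by the PDF specification.
TEXT_SHOWING_OPERATORS = {b"Tj", b"TJ", b"'", b'"'}


def summarize_text_operators(operations):
    """Count text-showing operators and keep the first operands seen for each.

    `operations` is assumed to be an iterable of (operands, operator) pairs.
    """
    counts = Counter()
    samples = {}
    for operands, operator in operations:
        if operator in TEXT_SHOWING_OPERATORS:
            counts[operator] += 1
            # Remember one example so the operand layout can be inspected.
            samples.setdefault(operator, operands)
    return counts, samples


# Stand-in data; with PyPDF2 one would pass content.operations instead.
demo_operations = [
    ([], b"BT"),
    (["Hello"], b"Tj"),
    ([["SAM", -20, "PLE"]], b"TJ"),
    ([], b"ET"),
]
demo_counts, demo_samples = summarize_text_operators(demo_operations)
```

If only `Tj` shows up, the operand is a single string; if `TJ` shows up, the operand is an array that mixes string fragments with numeric spacing adjustments, so the watermark text may be split across several fragments.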

Recommended answer

Using the code from the question, here is a function that works in Python 3:

def removeWatermark(wm_text, inputFile, outputFile):
    from PyPDF4 import PdfFileReader, PdfFileWriter
    from PyPDF4.pdf import ContentStream
    from PyPDF4.generic import TextStringObject, NameObject
    from PyPDF4.utils import b_

    with open(inputFile, "rb") as f:
        # The second positional argument is `strict`, not a file mode, so it is omitted
        source = PdfFileReader(f)
        output = PdfFileWriter()

        for page_number in range(source.getNumPages()):
            page = source.getPage(page_number)
            content_object = page["/Contents"].getObject()
            content = ContentStream(content_object, source)

            for operands, operator in content.operations:
                if operator == b_("Tj"):
                    text = operands[0]

                    if isinstance(text, str) and text.startswith(wm_text):
                        operands[0] = TextStringObject('')

            page.__setitem__(NameObject('/Contents'), content)
            output.addPage(page)

        with open(outputFile, "wb") as outputStream:
            output.write(outputStream)

wm_text = 'wm_text'
inputFile = r'input.pdf'
outputFile = r"output.pdf"
removeWatermark(wm_text, inputFile, outputFile)
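
The question matches `TJ` while the answer matches `Tj`, and the question mentions trying different capitalizations. A possible way to cover both operators with a case-insensitive comparison is sketched below. This is an assumption-laden illustration, not part of the original answer: `blank_watermark` is a hypothetical helper, and plain lists and strings stand in for the PyPDF objects so it runs on its own.

```python
def blank_watermark(operations, wm_text):
    """Blank Tj/TJ operands whose text starts with wm_text, ignoring case.

    `operations` is assumed to be a list of (operands, operator) pairs with
    the operator as bytes and `operands` as a mutable list. Returns the
    number of operations that were blanked.
    """
    needle = wm_text.lower()
    blanked = 0
    for operands, operator in operations:
        if not operands:
            continue
        if operator == b"Tj":
            # Tj takes a single string operand.
            text = operands[0]
            if isinstance(text, str) and text.lower().startswith(needle):
                # With PyPDF objects this would be TextStringObject('').
                operands[0] = ""
                blanked += 1
        elif operator == b"TJ":
            # TJ takes an array mixing string fragments and spacing numbers,
            # so join the string fragments before comparing.
            pieces = operands[0]
            joined = "".join(p for p in pieces if isinstance(p, str))
            if joined.lower().startswith(needle):
                # With PyPDF objects this would be an empty ArrayObject().
                operands[0] = []
                blanked += 1
    return blanked


# Stand-in data for a page that draws the watermark in both styles.
demo_ops = [
    (["SAMPLE"], b"Tj"),
    ([["SAM", -20, "PLE"]], b"TJ"),
    (["Body text"], b"Tj"),
]
demo_blanked = blank_watermark(demo_ops, "Sample")
```

Inside `removeWatermark`, the same checks could replace the single `Tj` branch. Text stored as byte strings rather than text strings would still be skipped by the `isinstance(..., str)` test, just as in the answer's code.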
