How to edit a pdf file, replacing its data?

Views: 76  Published: 2021/5/3 20:05:35  Tags: python pdf edit pypdf

Question

I am trying to rotate pages in a pdf file, and then replace the old pages with the rotated ones in the SAME pdf file.

I wrote the following code:

#!/usr/bin/python

import os
from pyPdf import PdfFileReader, PdfFileWriter

my_path = "/home/USER/Desktop/files/"

input_file_name = os.path.join(my_path, "myfile.pdf")
input_file = PdfFileReader(file(input_file_name, "rb"))
input_file.decrypt("MyPassword")
output_PDF = PdfFileWriter()

for num_page in range(0, input_file.getNumPages()):
    page = input_file.getPage(num_page)
    page.rotateClockwise(270)
    output_PDF.addPage(page)

#Trying to replace old data with new data in the original file, not
#create a new file and add the new data!
output_file_name = os.path.join(my_path, "myfile.pdf")
output_file = file(output_file_name, "wb")
output_PDF.write(output_file)
output_file.close()

The above code gives me an error! I've even tried using:

input_file = PdfFileReader(file(input_file_name, "r+b"))

but it didn't work either...

Changing the line:

output_file_name = os.path.join(my_path, "myfile.pdf")

to:

output_file_name = os.path.join(my_path, "myfile2.pdf")

fixes everything, but it's not what I want...

Any help?

Error message:

Traceback (most recent call last):
  File "12-5.py", line 22, in <module>
    output_PDF.write(output_file)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 649, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 564, in readFromStream
    raise utils.PdfReadError, "Unable to find 'endstream' marker after stream."
pyPdf.utils.PdfReadError: Unable to find 'endstream' marker after stream.

Recommended answer

The issue, I suspect, is that PyPDF is reading from the file as it's being written to.
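A minimal stdlib sketch of that failure mode follows. It does not use pyPdf, and the file name and contents are hypothetical. It assumes, as the getObject and readObject(self.stream, self) frames in the traceback suggest, that pyPdf keeps the input file open and reads objects from it lazily while write() runs. Opening the same path with "wb" truncates it to zero bytes before anything is written, so those later reads find missing or truncated data:

```python
import os
import tempfile


def demonstrate_truncation():
    """Show that opening a path with "wb" empties it, even for an open reader."""
    # Hypothetical stand-in for myfile.pdf in a scratch directory.
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "myfile.pdf")
    with open(path, "wb") as f:
        f.write(b"%PDF-1.4\nplaceholder content\n")

    # Keep a read handle open, as pyPdf does with the file object you pass in.
    reader = open(path, "rb")
    size_before = os.path.getsize(path)

    # Opening the same path for writing truncates it to zero bytes right away.
    writer = open(path, "wb")
    size_after = os.path.getsize(path)

    # A later (lazy) read through the old handle finds no data left.
    reader.seek(0)
    seen_by_reader = reader.read()

    writer.close()
    reader.close()
    os.remove(path)
    os.rmdir(tmp_dir)
    return size_before, size_after, seen_by_reader


if __name__ == "__main__":
    print(demonstrate_truncation())
```

Running it prints the original (non-zero) size, then 0 and b'' for what the still-open reader sees after the truncation.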

The correct fix, as you've noticed, is to write to a separate file and then replace the original file with the new one. Something like this:

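# Write to a separate file so the reader still sees the untouched original
output_file_name = os.path.join(my_path, "myfile-temporary.pdf")
with open(output_file_name, "wb") as output_file:
    output_PDF.write(output_file)
# Move the new file over the original (on Windows, os.rename fails if the target exists)
os.rename(output_file_name, input_file_name)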

I've written a bit of code which simplifies this: https://github.com/shazow/unstdlib.py/blob/master/unstdlib/standard/contextlib_.py#L14

from unstdlib.standard.contextlib_ import open_atomic

with open_atomic(input_file_name, "wb") as output_file:
    output_PDF.write(output_file)

This will automatically create a temporary file, write to it, then replace the original file.
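For readers who would rather not add a dependency, here is a rough stdlib-only sketch of the same idea as a context manager. The name open_atomic_sketch is hypothetical and this is not the unstdlib implementation. It assumes Python 3.3+ for os.replace, and it creates the temporary file in the target's directory so the final rename stays on one filesystem:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_atomic_sketch(path, mode="wb"):
    """Yield a temp file next to `path`; on success, move it over `path`.

    Hypothetical helper for illustration, not the unstdlib open_atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename below is a same-filesystem move.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, mode) as tmp_file:
            yield tmp_file
        # File is closed here; os.replace overwrites an existing target.
        os.replace(tmp_path, path)
    except BaseException:
        # Writing failed: discard the temp file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical demo file in a scratch directory.
    target = os.path.join(tempfile.mkdtemp(), "myfile.pdf")
    with open(target, "wb") as f:
        f.write(b"old contents")
    with open(target, "rb") as src:
        data = src.read()
    with open_atomic_sketch(target) as dst:
        dst.write(data.replace(b"old", b"new"))
    with open(target, "rb") as f:
        print(f.read())  # b'new contents'
```

In the question's script this would be used as with open_atomic_sketch(input_file_name) as output_file: output_PDF.write(output_file). That relies on POSIX rename semantics, where the reader's still-open handle keeps seeing the old file; on Windows the final replace may fail while the original is still open. Also note that mkstemp creates the file with owner-only permissions, so the replaced file may not keep the original's mode.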

Edit: I had initially misread the question. Below is my original answer, which is wrong for this question but may still help other people.

Your code is fine, and should work without issue on "most" PDFs.

The issue you're seeing is an incompatibility between PyPDF and the specific PDF you're trying to use. This may be a bug in PyPDF or it may be that the PDF isn't totally valid.

There are two things you can try:

See if PyPDF2 can read the file. Install PyPDF2 with pip install PyPDF2, replace import pyPdf … with import PyPDF2 …, then re-run your script.

Use another program to re-encode your PDF and see if that works. For example, using something like convert bad.pdf bad.ps; convert bad.ps maybe-good.pdf might fix things.
