使用Python3合并PDF文件 [英] Merging PDF files with Python3

查看:438
本文介绍了使用Python3合并PDF文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个小的脚本,需要合并许多一页的pdf文件.我希望脚本可以与Python3一起运行,并具有尽可能少的依赖关系.

I am writing a small script that needs to merge many one-page pdf files. I want the script to run with Python3 and to have as few dependencies as possible.

对于PDF合并部分,我尝试使用 PyPdf .但是,Python 3支持似乎有问题.它无法处理inkscape生成的PDF文件(我需要).我已经安装了PyPdf的当前git版本,并且以下测试脚本不起作用:

For the PDF merging part, I tried using PyPdf. However, the Python 3 support seems to be buggy; It can't handle inkscape generated PDF files (which I need). I have the current git version of PyPdf installed, and the following test script doesn't work:

import PyPDF2

output_pdf = PyPDF2.PdfFileWriter()

with open("testI.pdf", "rb") as input:
    input_pdf = PyPDF2.PdfFileReader(input)
    output_pdf.addPage(input_pdf.getPage(0))

with open("test.pdf", "wb") as output:
    output_pdf.write(output)

它将引发以下堆栈跟踪:

It throws the following stack trace:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    output.addPage(input.getPage(0))
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
    self._flatten()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
    self._flatten(page.getObject(), inherit)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
    obj = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
    return FloatObject(name.decode("ascii"))
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
    return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context

但是,相同的脚本在Python 2.7中可以完美地工作.

The same script, however, works flawlessly with Python 2.7.

我在这里做错了什么?这是库中的错误吗?我可以解决该问题而无需接触PyPDF库吗?

What am I doing wrong here? Is it a bug in the library? Can I work around it without touching the PyPDF library?

推荐答案

所以我找到了答案. Python3.3中的decimal.Decimal模块显示了一些奇怪的行为.这是相应的StackOverflow问题:实例化十进制类我在PyPDF2库中添加了一些解决方法并提交了一个请求要求.

So I found the answer. The decimal.Decimal module in Python3.3 shows some weird behaviour. This is the corresponding StackOverflow question: Instantiate Decimal class I added some workaround to the PyPDF2 library and submitted a pull request.

这篇关于使用Python3合并PDF文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆