Porting to Python3: PyPDF2 mergePage() gives TypeError


Problem description

I'm using Python 3.4.2 and PyPDF2 1.24 (also reportlab 3.1.44, in case that helps) on Windows 7.

I recently upgraded from Python 2.7 to 3.4 and am in the process of porting my code. The code creates a blank PDF page with links embedded in it (using reportlab) and merges it (using PyPDF2) with an existing PDF page. I first hit an issue with reportlab: saving the canvas used StringIO, which had to be changed to BytesIO. After making that change I ran into this error:

Traceback (most recent call last):
File "C:\cms_software\pdf_replica\builder.py", line 401, in merge_pdf_files
    input_page.mergePage(link_page)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2013, in mergePage
    self.mergePage(page2)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2059, in mergePage
    page2Content = PageObject._pushPopGS(page2Content, self.pdf)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1973, in _pushPopGS
    stream = ContentStream(contents, pdf)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2446, in __init__
    stream = BytesIO(b_(stream.getData()))
File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 826, in getData
    decoded._data = filters.decodeStreamData(self)
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 326, in decodeStreamData
    data = ASCII85Decode.decode(data)
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in decode
    data = [y for y in data if not (y in ' \n\r\t')]
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in <listcomp>
    data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int
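
The final frame of the traceback can be reproduced without PyPDF2. This is a minimal sketch, assuming the stream data reaches the filter as bytes under Python 3; the sample value is made up for illustration:

```python
# Iterating over bytes in Python 3 yields ints, not one-character strings.
data = b"9jqo^ Bl\n~>"  # made-up sample, not taken from a real PDF

error_message = None
try:
    # Same expression as filters.py line 264 in the traceback.
    filtered = [y for y in data if not (y in ' \n\r\t')]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# A bytes-aware membership test works, because an int may be tested
# against a bytes object.
filtered_ok = bytes(y for y in data if y not in b' \n\r\t')
print(filtered_ok)
```

Under Python 2 the same data would be a str, so each `y` would be a one-character string and the original test would succeed.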

Here is the line in my code that the traceback points to, along with the line above it:

link_page = self.make_pdf_link_page(pdf, size, margin, scale_factor, debug_article_links)
if link_page != None:
    input_page.mergePage(link_page)

Here are the relevant parts of the make_pdf_link_page function:

packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=(size['width'], size['height']))
....# left out code here is just reportlab specifics for size and url stuff
can.linkURL(url, r1, thickness=1, color=colors.green)
can.rect(x1, y1, width, height, stroke=1, fill=0)
# create a new PDF with Reportlab that has the url link embedded
can.save()
packet.seek(0)
try:
    new_pdf = PdfFileReader(packet)
except Exception as e:
    logger.exception('e')
    return None
return new_pdf.getPage(0)

I'm assuming the problem is related to using BytesIO, but I can't create the page with reportlab using StringIO. This is a critical feature that worked perfectly with Python 2.7, so I'd appreciate any kind of feedback on this. Thanks!

UPDATE: I've also tried switching from BytesIO to writing a temp file and then merging. Unfortunately I got the same error. Here is the tempfile version:

import tempfile
temp_dir = tempfile.gettempdir()
temp_path = os.path.join(temp_dir, "tmp.pdf")
can = canvas.Canvas(temp_path, pagesize=(size['width'], size['height']))
....
can.showPage()
can.save()
try:
    new_pdf = PdfFileReader(temp_path)
except Exception as e:
    logger.exception('e')
    return None
return new_pdf.getPage(0)

UPDATE: I found an interesting bit of information. If I comment out the can.rect and can.linkURL calls, the merge succeeds. So drawing anything on the page and then trying to merge it with my existing PDF is what causes the error.

Recommended answer

After digging into the PyPDF2 library code, I was able to find my own answer. For Python 3 users, old libraries can be tricky: even if they say they support Python 3, they don't necessarily test everything. In this case, the problem was in the ASCII85Decode class in PyPDF2's filters.py. Under Python 3 the stream data is bytes, so iterating over it yields ints rather than one-character strings, and the class needs to work on bytes and return bytes. I borrowed the code for the equivalent function from pdfminer3k, which is a Python 3 port of pdfminer. If you replace the ASCII85Decode class with this code, it will work:

import struct

class ASCII85Decode(object):
    @staticmethod
    def decode(data, decodeParms=None):
        # Work on bytes so that iteration yields ints on Python 3.
        if isinstance(data, str):
            data = data.encode('ascii')
        n = b = 0
        out = bytearray()
        for c in data:
            if ord('!') <= c and c <= ord('u'):
                # Accumulate five base-85 digits into one 32-bit value.
                n += 1
                b = b*85+(c-33)
                if n == 5:
                    out += struct.pack(b'>L',b)
                    n = b = 0
            elif c == ord('z'):
                # 'z' is shorthand for four zero bytes.
                assert n == 0
                out += b'\0\0\0\0'
            elif c == ord('~'):
                # End-of-data marker: pad and flush any partial group.
                if n:
                    for _ in range(5-n):
                        b = b*85+84
                    out += struct.pack(b'>L',b)[:n-1]
                break
        return bytes(out)
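
As an alternative to the hand-written decoder, Python 3.4 and later ship `base64.a85decode` in the standard library. The following is only a sketch of how it might be wrapped with the same `decode(data, decodeParms=None)` signature. The class name `ASCII85DecodeStdlib` is hypothetical, this is not what PyPDF2 or pdfminer3k do, and it assumes the stream ends with the `~>` end-of-data marker:

```python
import base64


class ASCII85DecodeStdlib(object):
    """Hypothetical wrapper around the stdlib decoder (not PyPDF2 code)."""

    @staticmethod
    def decode(data, decodeParms=None):
        # Accept str for Python 2 style callers, but always work on bytes.
        if isinstance(data, str):
            data = data.encode('ascii')
        data = data.strip()
        # PDF streams usually omit the leading '<~'; add it so the input
        # is bracketed the way adobe=True expects.
        if not data.startswith(b'<~'):
            data = b'<~' + data
        # Raises ValueError if the trailing '~>' marker is missing.
        return base64.a85decode(data, adobe=True)


# Round trip: encode a made-up content stream, then decode it again.
payload = b"q 1 0 0 1 0 0 cm Q"
pdf_style = base64.a85encode(payload) + b"~>"
decoded = ASCII85DecodeStdlib.decode(pdf_style)
print(decoded == payload)
```

Whether this is preferable to the pdfminer3k code depends on how strictly the input follows the ASCII85 format; the stdlib decoder rejects malformed input, where the loop in the answer silently skips unknown characters.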
