PyPDF2 write doesn't work on some PDF files (Python 3.5.1)

Views: 187 | Published: 2021/7/7 20:37:07 | Tags: python python-3.x pdf reportlab pypdf2


Problem description

First of all, I am using Python 3.5.1 (32-bit version). I wrote the following program to add a page number to every page of my PDF files using PyPDF2 and reportlab:

#import modules
from os import listdir
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
#initial values of variable declarations
PDFlist=[]
X_value=460
Y_value=820
#Make a list of all files in the directory
filelist = listdir()
#Make a list of all pdf files in the directory
for filename in filelist:
    # keep only names that end in ".pdf" (case-insensitive), each at most once
    if filename.lower().endswith('.pdf'):
        PDFlist.append(filename)
# Ask for the horizontal position of the page number (ENTER = use the default value of 460)
User = input('Give horizontal position page number (ENTER = default 460): ')
if User != "":
    X_value=int(User)
# Ask for the vertical position of the page number (ENTER = use the default value of 820)
User = input('Give vertical position page number (ENTER = default 820): ')
if User != "":
    Y_value=int(User)

for i in range(0,len(PDFlist)):
    filename=PDFlist[i]

    # read the PDF
    existing_pdf = PdfFileReader(open(filename, "rb"))
    print("File: "+filename)
    # count the number of pages
    number_of_pages = existing_pdf.getNumPages()
    print("Number of pages detected:"+str(number_of_pages))
    output = PdfFileWriter()

    for k in range(0,number_of_pages):
        packet = io.BytesIO()

        # create a new PDF with Reportlab
        can = canvas.Canvas(packet, pagesize=A4)
        Pagenumber=" Page "+str(k+1)+"/"+str(number_of_pages)
        # we first make a white rectangle to cover any existing text in the pdf
        can.setFillColorRGB(1,1,1)
        can.setStrokeColorRGB(1,1,1)
        can.rect(X_value-10,Y_value-5,120,20,fill=1)
        # set the font and size
        can.setFont("Helvetica",14)
        # choose color of page numbers (red)
        can.setFillColorRGB(1,0,0)
        can.drawString(X_value, Y_value, Pagenumber)
        can.save()
        print(Pagenumber)

        #move to the beginning of the BytesIO buffer
        packet.seek(0)
        new_pdf = PdfFileReader(packet)
        # add the "watermark" (which is the new pdf) on the existing page
        page = existing_pdf.getPage(k)
        page.mergePage(new_pdf.getPage(0))
        output.addPage(page)
    # finally, write "output" to a real file

    # the "Output" folder must already exist in the working directory
    ResultPDF="Output/"+filename
    with open(ResultPDF, "wb") as outputStream:
        output.write(outputStream)
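
The two prompts in the script above pass the reply straight to int(), so a non-numeric answer would raise ValueError. A minimal sketch of a more forgiving version follows. The helper names `parse_position` and `ask_position` are hypothetical and not part of the original script, and silently falling back to the default is an assumption about the desired behaviour.

```python
def parse_position(text, default):
    """Return text as an int, or default when text is empty or not a whole number."""
    text = text.strip()
    if not text:
        return default
    try:
        return int(text)
    except ValueError:
        # Assumption: fall back to the default instead of crashing on bad input.
        return default


def ask_position(label, default):
    # Hypothetical wrapper around input(); the prompt wording mirrors the script.
    reply = input('Give %s position page number (ENTER = default %d): ' % (label, default))
    return parse_position(reply, default)
```

With these helpers the script could set `X_value = ask_position('horizontal', 460)` and `Y_value = ask_position('vertical', 820)`.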

This program works fine for quite a number of PDF files. Warnings are sometimes generated, such as 'PdfReadWarning: Superfluous whitespace found in object header b'16' b'0' [pdf.py:1666]', but the resulting output file looks fine to me. However, the program fails on some PDF files, even though these files are perfectly readable and editable in Adobe Acrobat. My impression is that the error occurs mostly with scanned PDF files, but not with all of them (I have also numbered scanned PDF files that did not generate any error). I get the following error message (the first 8 lines are the output of my own print commands):

File: Scanned file.pdf
Number of pages detected:6
 Page 1/6
 Page 2/6
 Page 3/6
 Page 4/6
 Page 5/6
 Page 6/6
PdfReadWarning: Object 25 1 not defined. [pdf.py:1629]
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\Sourcecode\PDFPager.py", line 83, in <module>
    output.write(outputStream)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\site-packages\PyPDF2\pdf.py", line 1631, in getObject
    raise utils.PdfReadError("Could not find object.")
PyPDF2.utils.PdfReadError: Could not find object.

Apparently the pages are merged with the PDF created by reportlab (see the lines up to Page 6/6), but in the end PyPDF2 cannot generate the output PDF file (I get an unreadable output file of 0 bytes). Can somebody shed some light on how to resolve this? I searched the internet but couldn't find a real answer.
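
The 0-byte file appears because the output file is opened before `output.write` raises. The sketch below shows one way to avoid leaving such a file behind: serialize into memory first and touch the disk only on success. It does not repair the missing object that triggers the error. The function name `write_pdf_safely` is hypothetical, and the only thing assumed about `writer` is that it has a `write(stream)` method, as `PdfFileWriter` does in the script above.

```python
import io
import os


def write_pdf_safely(writer, path):
    """Serialize writer into memory first, then save to path only on success.

    `writer` is assumed to expose write(stream), like the PdfFileWriter
    used in the script. Returns True when the file was written, else False.
    """
    buffer = io.BytesIO()
    try:
        writer.write(buffer)
    except Exception as error:  # for example PyPDF2.utils.PdfReadError
        print("Skipping %s: %s" % (path, error))
        return False
    # Create the target directory if it is missing (assumption: this is wanted).
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "wb") as stream:
        stream.write(buffer.getvalue())
    return True
```

In the loop, `write_pdf_safely(output, "Output/" + filename)` would then report the failing file and let the batch continue with the next PDF, at the cost of holding each output document in memory once.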
