EOF marker not found - How to fix in PyPDF and PyPDF2?


Question

I'm attempting to combine a few PDF files into a single PDF file using Python. I've tried both PyPDF and PyPDF2 - on some files, they both throw this same error:

PdfReadError: EOF marker not found

Here's my code (page_files is a list of PDF file paths to combine):

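# use pypdf to combine pdf pages
from PyPDF2 import PdfFileReader, PdfFileWriter

output = PdfFileWriter()
open_streams = []  # keep the inputs open until the output has been written
for pf in page_files:
    filestream = open(pf, "rb")  # file() exists only in Python 2
    open_streams.append(filestream)
    pdf = PdfFileReader(filestream)
    for num in range(pdf.getNumPages()):
        output.addPage(pdf.getPage(num))

# write final file
with open(pdf_full_path, "wb") as outputStream:
    output.write(outputStream)
for filestream in open_streams:
    filestream.close()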

I've read a few StackOverflow threads on the topic, but none contain a solution that works. If you've successfully combined PDF files using Python, I'd love to hear how. Thanks!

Recommended answer

In case someone is still looking for a way to merge a "list" of PDFs:

Note: Use glob to get the correct file list. This will really save your day ^^

Check this out: glob module reference
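As a minimal sketch of what the glob step does on its own: the helper below collects every file matching *.pdf in one folder and sorts the result, since glob does not promise any particular order. The name list_pdfs is only illustrative and not part of any library.

```python
import glob
import os


def list_pdfs(folder):
    """Return the paths of all *.pdf files in `folder`, sorted by name.

    `list_pdfs` is a hypothetical helper name used for illustration.
    """
    pattern = os.path.join(folder, "*.pdf")
    # glob.glob returns matches in arbitrary order, so sort them to get
    # a predictable page order when the files are merged later.
    return sorted(glob.glob(pattern))
```

Sorting by name only gives the intended order if the file names sort correctly as text (for example page_01.pdf, page_02.pdf).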

from PyPDF2 import PdfFileMerger
import os
import glob

class MergeAllPDF:
    def __init__(self):
        self.mergelist = []

    def create(self, filepath, outpath, outfilename):
        self.outfilename = outfilename
        self.filepath = filepath  # glob pattern that matches the input PDFs
        self.outpath = outpath
        # glob returns matches in arbitrary order; sort for a stable page order
        self.mergelist = sorted(glob.glob(self.filepath))
        self.merge()

    def merge(self):
        if self.mergelist:
            self.merger = PdfFileMerger()
            for pdf in self.mergelist:
                # pass the path and let PyPDF2 manage the file handle
                self.merger.append(pdf)
            self.merger.write(os.path.join(self.outpath, "%s.pdf" % self.outfilename))
            self.merger.close()
            self.mergelist = []
        else:
            print("mergelist is empty, please check your input path")

# example of how to use it
# update your paths here:


inpath = r"C:\Users\Fabian\Desktop\mergeallpdfs\scan\*.pdf" # your single-page PDFs are stored here
outpath = r"C:\Users\Fabian\Desktop\mergeallpdfs\output\\" # your merged PDF will be stored here

b = MergeAllPDF()
b.create(inpath, outpath, "mergedpdf")
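If some inputs still raise "EOF marker not found", it can help to find out which files are affected before merging. A well-formed PDF ends with a %%EOF marker, so the rough sketch below looks for that marker near the end of each file using only the standard library. The helper names and the 1024-byte window are assumptions made for illustration; this is not how PyPDF2 itself performs the check.

```python
import os


def has_eof_marker(path, tail_bytes=1024):
    """Return True if b'%%EOF' occurs in the last `tail_bytes` bytes of the file.

    The 1024-byte window is an arbitrary choice for this sketch.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        # Jump back at most `tail_bytes` from the end (or to the start
        # of the file if it is shorter than that).
        fh.seek(max(0, size - tail_bytes))
        return b"%%EOF" in fh.read()


def split_by_eof_marker(paths):
    """Split `paths` into (with_marker, without_marker), keeping input order."""
    with_marker, without_marker = [], []
    for path in paths:
        if has_eof_marker(path):
            with_marker.append(path)
        else:
            without_marker.append(path)
    return with_marker, without_marker
```

Files that end up in the second list are worth opening by hand: they may be truncated or may not be PDFs at all. Passing the check does not guarantee that a file can be parsed, it only rules out this one error.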
