pdfminer python 3.5

Views: 104 Published: 2020/5/25 3:56:11 Tags: python-3.x pdf text extract pdfminer

Problem description

I have followed a few tutorials, but I cannot get this code block to run. I made what I believe are the necessary switches from StringIO to BytesIO.

I am unsure why 'banana' prints nothing; I think the errors might be red herrings. Could it have something to do with my following a Python 2.7 tutorial and trying to translate it to Python 3?

Errors:

  File "/Users/foo/PycharmProjects/Try/Pdfminer.py", line 28, in <module>
    banana = convert("A1.pdf")
  File "/Users/foo/PycharmProjects/Try/Pdfminer.py", line 19, in convert
    infile = file(fname, 'rb')
NameError: name 'file' is not defined
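
The NameError above is consistent with the Python 2 to Python 3 translation mentioned in the question: `file` was a builtin in Python 2 and is not defined in Python 3, where `open` is used instead. A minimal sketch that checks this with the standard library only; the helper name `read_header` and the temporary stand-in file are illustrative assumptions, not part of the original script:

```python
import builtins
import os
import tempfile

# Python 2 had a builtin named `file`; Python 3 does not, so calling
# file(fname, 'rb') raises NameError there.
has_file_builtin = hasattr(builtins, "file")


def read_header(path, size=5):
    """Hypothetical helper: read the first bytes of a file in binary mode."""
    # open(..., 'rb') is the Python 3 replacement for file(..., 'rb').
    with open(path, "rb") as infile:
        return infile.read(size)


# Write a tiny stand-in file so the example does not depend on A1.pdf.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n")
    tmp_path = tmp.name

try:
    header = read_header(tmp_path)
finally:
    os.remove(tmp_path)

print(has_file_builtin)  # False on Python 3
print(header)
```

In the script from the question, the equivalent change would be replacing `file(fname, 'rb')` with `open(fname, 'rb')`.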

Script

from io import BytesIO

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

def convert(fname, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)

    output = BytesIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    infile = file(fname, 'rb')
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close
    return text

banana = convert("A1.pdf")
print(banana)
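
Two details in the script above are independent of pdfminer and can be checked with the standard library alone: `BytesIO.getvalue()` returns `bytes`, which need decoding before they print as ordinary text, and `output.close` without parentheses only references the method and never calls it. A small sketch, assuming the buffer holds UTF-8 encoded text (the sample string is made up for illustration):

```python
from io import BytesIO

output = BytesIO()
# Stand-in for what a converter might write; assumed to be UTF-8 bytes.
output.write("Hello, PDF".encode("utf-8"))

raw = output.getvalue()        # bytes, not str
text = raw.decode("utf-8")     # decode to get a printable string

output.close                   # no parentheses: the method is not called
still_open = not output.closed

output.close()                 # this actually closes the buffer
now_closed = output.closed

print(type(raw).__name__, text, still_open, now_closed)
```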

The same thing happens with this variant:

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import BytesIO

def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = BytesIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos=set()

    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password,caching=caching, check_extractable=True):
        interpreter.process_page(page)

    text = retstr.getvalue()

    fp.close()
    device.close()
    retstr.close()
    return text

Banana = convert_pdf_to_txt("A1.pdf")
print(Banana)

I have tried searching for this (most of the pdfminer code is from this or this) but have had no luck.

Any insight is appreciated.

Cheers
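
One possible way to adapt the first script to Python 3 is sketched here. It assumes the pdfminer.six fork, whose `TextConverter` accepts a text buffer such as `io.StringIO`; the function name `convert_pdf_to_text` is hypothetical, and the imports sit inside the function only so that the definition loads even where pdfminer is not installed. The pdfminer calls are the ones already used in the question.

```python
from io import StringIO


def convert_pdf_to_text(fname, pages=None):
    """Sketch: extract text from a PDF, assuming pdfminer.six on Python 3."""
    # Same pdfminer imports as in the question's script.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    # An empty set means "all pages", as in the original script.
    pagenums = set(pages) if pages else set()

    output = StringIO()  # assumption: the converter writes str to a text buffer
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    # open() replaces the Python 2 builtin file(); the with-block closes it.
    with open(fname, "rb") as infile:
        for page in PDFPage.get_pages(infile, pagenums):
            interpreter.process_page(page)

    converter.close()
    text = output.getvalue()
    output.close()
    return text
```

Usage would then mirror the question: `banana = convert_pdf_to_text("A1.pdf")` followed by `print(banana)`. If the output is still empty after this change, the PDF itself may contain no extractable text layer (for example a scanned document), which this sketch does not handle.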
