Searching PDF files for certain info

Views: 71  Posted: 2019/6/5 14:51:11  python


Question

Not really a Python question... but here goes: Is there a way to read
the content of a PDF file and decode it with Python? I'd like to read
PDFs, decode them, and then search the data for certain strings.

Thanks, rbt

Recommended answer

rbt wrote:

> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDFs, decode them, and then search the data for certain strings.

There is a commercial tool, PDFlib, available that might help. It has a free
evaluation version and Python bindings.

If it's only about text, maybe pdf2text helps.
--
Regards,

Diez B. Roggisch
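
As a rough illustration of the text-only route mentioned above, the sketch below assumes that a command-line converter is installed. It uses Poppler's `pdftotext` as a stand-in, which is an assumption, since the post only names "pdf2text". The helper names `pdf_to_text` and `find_strings` are made up for this example. The idea is to convert the PDF to plain text first and then report the lines on which each search string occurs.

```python
import subprocess


def pdf_to_text(pdf_path):
    """Return the text of a PDF using the external `pdftotext` tool.

    Assumption: Poppler's `pdftotext` is on PATH; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_strings(text, needles):
    """Map each needle to the 1-based line numbers where it occurs (case-insensitive)."""
    hits = {needle: [] for needle in needles}
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for needle in needles:
            if needle.lower() in lowered:
                hits[needle].append(lineno)
    return hits


if __name__ == "__main__":
    # Plain-text sample so the search step can be tried without any PDF at hand.
    sample = "Invoice 2005-01\nTotal due: 42 EUR\nThank you"
    print(find_strings(sample, ["total", "invoice", "missing"]))
```

With a real file, the two steps would be chained as `find_strings(pdf_to_text("a.pdf"), ["some string"])`, where the file name is only a placeholder. This approach only finds text the converter can recover, so scanned pages or unusual font encodings may yield nothing.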

Aloha,

rbt wrote:

> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDFs, decode them, and then search the data for certain strings.

First of all,
http://groups.google.de/groups?selm=...&output=gplain
still applies here.

If you can deal with a very basic implementation of a pdf-lib, you
might be interested in
http://sourceforge.net/projects/pdfplayground

In the CVS (or the current snapshot) you can find an example for text
extraction in ppg/Doc/text_extract.txt.

import pdffile
import pages
import pdftool  # assumed pdfplayground module; the original snippet omitted this import
import zlib
# Open the PDF and build its page list
pf = pdffile.pdffile('../pdf-testset1/a.pdf')
pp = pages.pages(pf)
# Decompress the content stream of the first page
c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
# Parse the stream and keep the operands of the text-showing operators ' and Tj
op = pdftool.parse_content(c)
sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
for a in sop:
    print(a[0])

Wishing a happy day
LOBI
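
To make the steps in the snippet above more concrete without pdfplayground installed, here is a self-contained sketch that uses only the standard library. It builds a tiny Flate-compressed content stream, decompresses it with `zlib`, and pulls out the string operands of the `Tj` and `'` text-showing operators with a regular expression. The function name `shown_strings` and the sample stream are invented for illustration. The regex is a deliberate simplification: it ignores escaped or nested parentheses, hex strings, `TJ` arrays, and font encodings, all of which a real parser has to handle.

```python
import re
import zlib

# Matches a literal string "(...)" followed by the Tj or ' operator.
# Simplification: no nested or escaped parentheses inside the string.
TEXT_OP = re.compile(rb"\(([^()\\]*)\)\s*(Tj|')")


def shown_strings(compressed_stream):
    """Decompress a Flate-encoded content stream and return the shown strings."""
    content = zlib.decompress(compressed_stream)
    return [m.group(1).decode("latin-1") for m in TEXT_OP.finditer(content)]


if __name__ == "__main__":
    # Hand-written page content: a text object that shows two strings.
    raw = b"BT /F1 12 Tf 72 720 Td 14 TL (Hello PDF) Tj (second line) ' ET"
    stream = zlib.compress(raw)
    for s in shown_strings(stream):
        print(s)
```

Searching is then a matter of testing each returned string, for example `any("PDF" in s for s in shown_strings(stream))`. A word may be split across several operators in real documents, so joining the strings before searching is often more reliable.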

Andreas Lobinger wrote:

> Aloha,
>
> rbt wrote:
>
> > Not really a Python question... but here goes: Is there a way to read
> > the content of a PDF file and decode it with Python? I'd like to read
> > PDFs, decode them, and then search the data for certain strings.
>
> First of all,
> http://groups.google.de/groups?selm=...&output=gplain
> still applies here.
>
> If you can deal with a very basic implementation of a pdf-lib, you
> might be interested in
> http://sourceforge.net/projects/pdfplayground
>
> In the CVS (or the current snapshot) you can find an example for text
> extraction in ppg/Doc/text_extract.txt.
>
> import pdffile
> import pages
> import pdftool  # assumed pdfplayground module; the original snippet omitted this import
> import zlib
> pf = pdffile.pdffile('../pdf-testset1/a.pdf')
> pp = pages.pages(pf)
> c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
> op = pdftool.parse_content(c)
> sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
> for a in sop:
>     print(a[0])
>
> Wishing a happy day
> LOBI

Thanks guys... what if I convert it to PS by printing it to a file or
something? Would that make it easier to work with?
