Searching PDF files for certain info
Question
Not really a Python question... but here goes: Is there a way to read
the content of a PDF file and decode it with Python? I'd like to read
PDFs, decode them, and then search the data for certain strings.
Thanks, rbt
Recommended answers
rbt wrote:
> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDFs, decode them, and then search the data for certain strings.
There is a commercial tool, PDFlib, available that might help. It has a free
evaluation version and Python bindings.
If it's only about text, maybe pdf2text helps.
--
Regards,
Diez B. Roggisch
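For the text-only route, a minimal sketch of the idea: hand the PDF to an external converter, then search the plain text in Python. This assumes a command-line tool that takes "<input.pdf> -" and writes the text to stdout, as pdftotext from Xpdf/Poppler does; whether the pdf2text mentioned above behaves the same way is an assumption. The function names extract_text and find_strings are made up for illustration.

```python
import subprocess


def extract_text(pdf_path, tool="pdftotext"):
    """Run an external converter and return the extracted text.

    Assumption: `tool` accepts "<input.pdf> -" and prints the text to
    stdout (pdftotext from Xpdf/Poppler works this way). Adjust the
    argument list for any other converter.
    """
    result = subprocess.run([tool, pdf_path, "-"],
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def find_strings(text, needles, ignore_case=True):
    """Return {needle: [1-based line numbers]} for needles found in text."""
    hits = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        haystack = line.lower() if ignore_case else line
        for needle in needles:
            probe = needle.lower() if ignore_case else needle
            if probe in haystack:
                hits.setdefault(needle, []).append(lineno)
    return hits


# The search step can be tried on any text without a PDF at hand.
sample_text = "Invoice 42\nTotal due: 10 EUR\nThank you\n"
print(find_strings(sample_text, ["total", "invoice"]))
```

With a converter installed, the two steps chain as find_strings(extract_text("some.pdf"), ["total"]). Keeping extraction and searching separate means the search part does not care which tool produced the text.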
Aloha,
rbt wrote:
> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDFs, decode them, and then search the data for certain strings.
First of all,
http://groups.google.de/groups?selm=...&output=gplain
still applies here.
If you can deal with a very basic implementation of a pdf-lib you
might be interested in
http://sourceforge.net/projects/pdfplayground
In the CVS (or the current snapshot) you can find in
ppg/Doc/text_extract.txt an example for text extraction.
import pdffile
import pages
import pdftool  # provides parse_content, used below
import zlib

pf = pdffile.pdffile('../pdf-testset1/a.pdf')
pp = pages.pages(pf)
# decompress the content stream of the first page
c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
op = pdftool.parse_content(c)
# keep the operands of the text-showing operators ' and Tj
sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
for a in sop:
    print(a[0])
Wishing a happy day
LOBI
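The same idea can be tried without pdfplayground. The sketch below uses only the standard library: it inflates a Flate-compressed content stream with zlib and picks out the literal strings shown by the Tj and ' operators. It is an illustration under stated assumptions, not the pdfplayground implementation: the names TEXT_OP, strings_from_stream and find_hits are hypothetical, the sample stream is built by hand rather than read from a file, and hex strings, TJ arrays, nested parentheses, other stream filters and font encodings are all ignored.

```python
import re
import zlib

# A PDF literal string "(...)" followed by the Tj or ' operator.
# Backslash escapes such as \( and \) are allowed inside the string;
# unescaped nested parentheses and hex strings <...> are not handled.
TEXT_OP = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*(?:Tj|')")


def strings_from_stream(compressed):
    """Inflate a Flate-encoded content stream and return the shown strings."""
    content = zlib.decompress(compressed)
    return [m.group(1) for m in TEXT_OP.finditer(content)]


def find_hits(strings, needle):
    """Return the extracted strings (bytes) that contain needle (bytes)."""
    return [s for s in strings if needle in s]


# Hand-made stand-in for a page's compressed content stream.
sample = zlib.compress(
    b"BT /F1 12 Tf 72 720 Td (Hello PDF) Tj T* (second \\(line\\)) ' ET"
)

found = strings_from_stream(sample)
print(found)
print(find_hits(found, b"PDF"))
```

The strings come back as raw bytes with their escapes intact. For many real documents they are font-specific codes rather than readable text, which is one reason a dedicated library or converter tends to be the more practical choice.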
Andreas Lobinger wrote:
> Aloha,
> rbt wrote:
> > Not really a Python question... but here goes: Is there a way to read
> > the content of a PDF file and decode it with Python? I'd like to read
> > PDFs, decode them, and then search the data for certain strings.
> First of all,
> http://groups.google.de/groups?selm=...&output=gplain
> still applies here.
> If you can deal with a very basic implementation of a pdf-lib you
> might be interested in
> http://sourceforge.net/projects/pdfplayground
> In the CVS (or the current snapshot) you can find in
> ppg/Doc/text_extract.txt an example for text extraction.
> import pdffile
> import pages
> import pdftool
> import zlib
> pf = pdffile.pdffile('../pdf-testset1/a.pdf')
> pp = pages.pages(pf)
> c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
> op = pdftool.parse_content(c)
> sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
> for a in sop:
>     print(a[0])
> Wishing a happy day
> LOBI
Thanks guys... what if I convert it to PS via printing it to a file or
something? Would that make it easier to work with?