How to search for a word or number in multiple PDF documents in one go


Question

Hello All,

First of all, I am new to coding and am learning step by step.

I have been looking for a way to search for a word in multiple PDF documents, without success. Could anyone share some ideas or logic for doing this? It would be very helpful.

Let me explain what I need with an example.
I have multiple PDF files and need to check whether a particular word/number appears in each of them. If the word appears in a PDF, the check should return True; if not, it should return False and report an error naming the file that does not contain it.

Example:

1. ABS.pdf has "123" (Contains the number)
2. CCC.pdf has "123" (Contains the number)
3. XYZ.pdf has "145" (Doesn't contain the number)

So if the above files are searched for the keyword "123", the application should report: "XYZ.pdf doesn't contain the number 123".

Note: this check should run over many PDFs in one go (in bulk).
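The requirement above boils down to three steps: extract each PDF's text, test it for the keyword, and collect the files that fail. A minimal Python sketch of the last two steps, assuming the text has already been extracted (the `extracted` dictionary is hypothetical sample data standing in for the three PDFs, and `contains_token` / `check_files` are illustrative names, not from any library):

```python
import re


def contains_token(text: str, keyword: str) -> bool:
    """Return True if `keyword` occurs in `text` as a whole word/number.

    The lookarounds stop "123" from matching inside "1234" or "A123".
    """
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text) is not None


def check_files(texts: dict[str, str], keyword: str) -> dict[str, bool]:
    """Map each file name to True/False depending on whether it contains the keyword."""
    return {name: contains_token(text, keyword) for name, text in texts.items()}


# Hypothetical extracted text standing in for ABS.pdf, CCC.pdf and XYZ.pdf.
extracted = {
    "ABS.pdf": "Invoice number 123",
    "CCC.pdf": "Reference: 123",
    "XYZ.pdf": "Reference: 145",
}

keyword = "123"
results = check_files(extracted, keyword)
missing = [name for name, found in results.items() if not found]
for name in missing:
    print(name, "doesn't contain", keyword)
```

Whole-word matching is a design choice here; if "1234" should count as containing "123", a plain `keyword in text` test is enough.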

What I have tried:

I came across the iTextSharp DLL, but I lack the logic for how to structure the code.
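For the extraction step, iTextSharp (C#) offers per-page text extraction through `PdfTextExtractor.GetTextFromPage` in `iTextSharp.text.pdf.parser`. The sketch below is in Python instead and swaps in a different library, pypdf (`PdfReader`, `page.extract_text()`), assumed to be installed with `pip install pypdf`. It loops over every `*.pdf` in one folder and returns the files missing the keyword. The extractor is passed in as a parameter so the batch logic can be tried without real PDFs; `extract_pdf_text` and `files_missing_keyword` are hypothetical helper names.

```python
from pathlib import Path
from typing import Callable


def extract_pdf_text(path: Path) -> str:
    """Extract the text of all pages of one PDF using pypdf (assumed installed)."""
    from pypdf import PdfReader  # imported lazily so the batch logic works without pypdf

    reader = PdfReader(str(path))
    # extract_text() may return an empty string for scanned, image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def files_missing_keyword(
    folder: str,
    keyword: str,
    extract: Callable[[Path], str] = extract_pdf_text,
) -> list[str]:
    """Return names of *.pdf files in `folder` whose text does not contain `keyword`.

    Uses a plain substring test; note that glob("*.pdf") is case-sensitive on
    some file systems, so "REPORT.PDF" would be skipped there.
    """
    missing = []
    for path in sorted(Path(folder).glob("*.pdf")):
        try:
            text = extract(path)
        except Exception as exc:  # corrupt/unreadable PDF: report it, keep the batch going
            missing.append(f"{path.name} (could not be read: {exc})")
            continue
        if keyword not in text:
            missing.append(path.name)
    return missing
```

Usage, assuming a folder of PDFs: `for name in files_missing_keyword("reports", "123"): print(name, "doesn't contain 123")`. Image-only (scanned) PDFs have no text layer, so they would need OCR first and will show up as "missing" here.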

Any help will be greatly appreciated.

Thanks
Saikrishna

Recommended answer

You can use a PDF IFilter text extractor to get at the text inside the PDF.

If you want to search, start with this article: hOOt - full text search engine
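Full-text search engines such as the one in that article are built around an inverted index: every word is mapped to the set of documents that contain it, so the PDFs are read once and each later keyword lookup is a dictionary access. The Python sketch below is a generic inverted index, not hOOt's implementation or API; `build_index` and `missing_from` are hypothetical names, and the `texts` dictionary stands in for already-extracted PDF text.

```python
import re
from collections import defaultdict


def build_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: lower-cased token -> set of file names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in texts.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(name)
    return index


def missing_from(index: dict[str, set[str]], all_files: set[str], keyword: str) -> set[str]:
    """Files that do NOT contain `keyword` (a single word or number)."""
    # .get() does not trigger the defaultdict factory, so unknown keywords add nothing.
    return all_files - index.get(keyword.lower(), set())


texts = {
    "ABS.pdf": "Invoice 123",
    "CCC.pdf": "Ref 123",
    "XYZ.pdf": "Ref 145",
}
index = build_index(texts)
print(sorted(missing_from(index, set(texts), "123")))  # ['XYZ.pdf']
print(sorted(missing_from(index, set(texts), "145")))  # ['ABS.pdf', 'CCC.pdf']
```

This pays off when many different keywords are checked against the same set of PDFs; for a single one-off keyword, a straight scan of each file's text is simpler.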

