Search for keywords in Word documents and index them
Problem description
I'm looking for a way to search inside Word documents and list the documents that match the search criteria. I'll try to describe the scenario in more detail here.
On a Windows system I have a bunch of folders, and each folder contains a lot of Word documents. Now I need an application that can search inside a specific folder for keywords that might occur in those Word documents, something like the FULLTEXT search that MySQL has.
So if I search for the following keywords: microsoft, windows XP
then I want it to list every Word document that contains one or more of those keywords.
Of course, the more often those keywords appear in a document, the higher that document should rank in the resulting list.
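As a rough illustration of that ranking rule, here is a minimal Python sketch (not a real full-text engine) that counts case-insensitive, whole-word occurrences of each keyword, including multi-word phrases such as `windows XP`, in text that has already been extracted from the documents, and then sorts the matching documents by their total count. The function names `count_occurrences` and `rank_documents` and the plain-text input are assumptions for illustration; getting the text out of `.doc`/`.docx` files is a separate step.

```python
import re

def count_occurrences(text: str, keyword: str) -> int:
    """Count case-insensitive, whole-word occurrences of a keyword or phrase."""
    # Split the keyword into words and allow any run of whitespace between them,
    # so "windows XP" also matches "Windows\nXP" or "Windows   XP".
    words = [re.escape(w) for w in keyword.split()]
    if not words:
        return 0
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def rank_documents(docs: dict[str, str], keywords: list[str]) -> list[tuple[str, int]]:
    """Return (name, score) for documents matching at least one keyword, best first."""
    results = []
    for name, text in docs.items():
        score = sum(count_occurrences(text, kw) for kw in keywords)
        if score > 0:  # only list documents containing one or more keywords
            results.append((name, score))
    # Highest total occurrence count first; ties broken by name for stable output.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results

if __name__ == "__main__":
    docs = {
        "a.docx": "Microsoft released Windows XP. Microsoft support ended.",
        "b.doc": "Notes about Linux only.",
        "c.docx": "Windows   XP setup guide",
    }
    print(rank_documents(docs, ["microsoft", "windows XP"]))
```

A raw occurrence count is the simplest possible score; real engines usually also normalize for document length and down-weight very common terms.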
Now my question is: is there a tool out there that does exactly this? Or am I better off writing such a tool myself in C#.NET? If so, which APIs should I look at?
PS. They are .doc and .docx files.
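For the `.docx` half, Word itself is not strictly required: a `.docx` file is a ZIP package (Office Open XML) whose main body is stored in `word/document.xml`, with paragraphs in `w:p` elements and text runs in `w:t` elements. The sketch below extracts plain body text using only Python's standard library. It is a rough sketch: headers, footers and footnotes live in separate parts and are not read, nested content such as text boxes may be extracted imperfectly, and legacy binary `.doc` files are not handled at all (they need a separate reader, or conversion to `.docx` first). The function name `docx_text` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_text(path) -> str:
    """Return the body text of a .docx file, one line per paragraph.

    `path` may be a filename or a binary file-like object, as accepted by ZipFile.
    """
    with zipfile.ZipFile(path) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Concatenate all text runs (w:t) inside this paragraph
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The extracted text can then be fed into whatever keyword counting or indexing step is used.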
Recommended answer
It looks to me like you need a full-blown search engine, including parsing, indexing, ranking, searching, etc. Implementing all of that yourself is probably not very pleasant... You could have a look at Apache Lucene.
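To make the "indexing" part concrete: the core data structure behind engines like Lucene is an inverted index, built once and then queried many times instead of rescanning every document on each search. The following is a toy, in-memory Python sketch of that idea, not Lucene's API. The class `InvertedIndex`, its methods, and the plain occurrence-count scoring are assumptions for illustration; Lucene itself uses more sophisticated relevance scoring (BM25 by default in recent versions), richer text analysis, and an on-disk index.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a real analyzer would also handle stemming, stop words, etc."""
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    """Toy positional inverted index: term -> {doc name -> [token positions]}."""

    def __init__(self) -> None:
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, name: str, text: str) -> None:
        # Record every position at which each term occurs in this document.
        for pos, term in enumerate(tokenize(text)):
            self.postings[term][name].append(pos)

    def phrase_count(self, name: str, phrase: str) -> int:
        """Count occurrences of a (possibly multi-word) phrase in one document."""
        terms = tokenize(phrase)
        if not terms:
            return 0
        # .get() avoids creating empty entries in the defaultdicts.
        position_sets = [set(self.postings.get(t, {}).get(name, [])) for t in terms]
        # A phrase starts at `start` if term i appears at position start + i.
        return sum(
            all(start + i in position_sets[i] for i in range(1, len(terms)))
            for start in position_sets[0]
        )

    def search(self, keywords: list[str]) -> list[tuple[str, int]]:
        """Rank documents by total keyword/phrase occurrences, highest first."""
        scores = defaultdict(int)
        for kw in keywords:
            terms = tokenize(kw)
            if not terms:
                continue
            # Only documents containing the first term can contain the phrase.
            for name in self.postings.get(terms[0], {}):
                count = self.phrase_count(name, kw)
                if count:
                    scores[name] += count
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Usage: call `add(name, text)` once per document (for example with text extracted from each `.docx` in the folder), then call `search(["microsoft", "windows XP"])` as often as needed; only the postings lists for the query terms are consulted.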