Using Lucene like a relational database
Question
I am just wondering if we could achieve some RDBMS capabilities in lucene.
Example: 1) I have 10,000 project documents (PDF files) whose content has to be indexed to make them searchable. 2) Every document belongs to a single project. A project has details such as project name, number, start date, end date, location, type, etc.
I have to search the contents of the PDF files for a given keyword, but when displaying the results I want to show the project metadata mentioned in point (2).
My idea is to associate a field called projectId with each PDF file at indexing time. Once a search returns that field, we fire a second search to get the project metadata.
This way we avoid duplicated data. Also, if we want to update the project metadata, we only have to update it in a single place. Otherwise, if we store this metadata with every indexed PDF document, we end up updating all of those documents, which is not what I am looking for.
Please advise.
Recommended answer
You can use Lucene that way.
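The two-step lookup described in the question can be sketched roughly as follows. This is a library-agnostic illustration in plain Python, not Lucene code: `ToyIndex`, `search_with_project`, the `type` field, the document keys, and all sample data are hypothetical stand-ins. It only shows the shape of the idea: project metadata is stored once, each PDF entry carries a `projectId`, and a second query resolves the metadata at display time.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory stand-in for a search index (NOT Lucene's API)."""

    def __init__(self):
        self.docs = {}                    # document key -> stored fields
        self.postings = defaultdict(set)  # (field, term) -> document keys

    def add(self, key, fields, text_fields=()):
        """Store a document, replacing any earlier document with the same key."""
        self.delete(key)
        self.docs[key] = dict(fields)
        for name, value in fields.items():
            # Text fields are split into lowercase words; other fields are
            # indexed as a single exact term.
            if name in text_fields:
                terms = str(value).lower().split()
            else:
                terms = [str(value)]
            for term in terms:
                self.postings[(name, term)].add(key)

    def delete(self, key):
        if key in self.docs:
            del self.docs[key]
            for keys in self.postings.values():
                keys.discard(key)

    def search(self, **clauses):
        """Return documents matching every field=term clause (AND semantics)."""
        sets = [self.postings.get((f, t), set()) for f, t in clauses.items()]
        keys = set.intersection(*sets) if sets else set()
        return [self.docs[k] for k in sorted(keys)]


def search_with_project(index, keyword):
    """Step 1: find PDFs by keyword. Step 2: look up each PDF's project."""
    results = []
    for pdf in index.search(type="pdf", content=keyword.lower()):
        projects = index.search(type="project", projectId=pdf["projectId"])
        project = projects[0] if projects else None
        results.append({"file": pdf["file"], "project": project})
    return results


# Made-up sample data: one project record, two PDFs that reference it.
index = ToyIndex()
index.add("project:P-1", {"type": "project", "projectId": "P-1",
                          "name": "Harbor Bridge", "location": "Oslo"})
index.add("pdf:1", {"type": "pdf", "projectId": "P-1", "file": "survey.pdf",
                    "content": "Soil survey for bridge foundations"},
          text_fields=("content",))
index.add("pdf:2", {"type": "pdf", "projectId": "P-1", "file": "budget.pdf",
                    "content": "Budget estimate"},
          text_fields=("content",))
```

Re-adding the record under the key `project:P-1` with new metadata changes what every later PDF hit displays, without touching the PDF entries. In Lucene itself, replacing a document identified by a key term is the job of `IndexWriter.updateDocument`; how the two queries are best expressed there depends on the Lucene version and index layout.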
Pros:
Full-text search is easy to implement, which is not the case in an RDBMS.
Cons:
Referential integrity: you get it for free in an RDBMS, but in Lucene, you must implement it yourself.
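To make "implement it yourself" concrete, here is a minimal sketch of the kind of checks an application would have to perform around its index writes. It uses plain Python dictionaries rather than Lucene, and `ProjectStore`, `IntegrityError`, and the method names are hypothetical; a real setup would run equivalent checks before adding or deleting documents in the index.

```python
class IntegrityError(Exception):
    """Raised when a write would leave a dangling projectId reference."""


class ProjectStore:
    """Hand-rolled foreign-key checks that an RDBMS would enforce for you."""

    def __init__(self):
        self.projects = {}  # projectId -> project metadata
        self.pdfs = {}      # PDF file name -> projectId

    def add_project(self, project_id, metadata):
        self.projects[project_id] = metadata

    def add_pdf(self, file_name, project_id):
        # Equivalent of a foreign-key constraint on insert.
        if project_id not in self.projects:
            raise IntegrityError(f"unknown projectId: {project_id}")
        self.pdfs[file_name] = project_id

    def remove_pdf(self, file_name):
        self.pdfs.pop(file_name, None)

    def delete_project(self, project_id):
        # Equivalent of ON DELETE RESTRICT: refuse while PDFs still point here.
        referencing = sorted(f for f, p in self.pdfs.items() if p == project_id)
        if referencing:
            raise IntegrityError(
                f"{project_id} is still referenced by: {', '.join(referencing)}")
        self.projects.pop(project_id, None)
```

Whether to reject such writes, as here, or to cascade the delete to the referencing PDFs is a design choice the application has to make explicitly.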