Approach to Incrementally Index Database Data from Multi-Table Join in Lucene with No Unique Key

Question

I have a particular SQL join such that:

select DISTINCT ... 100 columns
from ... 10 tables, some left joins

Currently I export the result of this query to XML using Toad (I'll query it straight from Java later). I use Java to parse the XML file, and I use Lucene (Java) to index it and to search the Lucene index. This works great: I get results 6-10 times faster than querying it from the database.
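
For context, a minimal sketch of that pipeline might look like the following (assuming a Lucene 5+ style API; the file name rows.xml, the one-&lt;row&gt;-element-per-result-row layout with one flat child element per column, and the index directory name are assumptions for illustration, not part of the original setup):

import java.io.FileInputStream;
import java.nio.file.Paths;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class XmlToLuceneIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("lucene-index")), config)) {
            // Stream through the exported XML rather than loading it all into memory.
            XMLStreamReader xml = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new FileInputStream("rows.xml"));
            Document doc = null;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("row".equals(xml.getLocalName())) {
                        doc = new Document();                // start a new document per exported row
                    } else if (doc != null) {
                        // Each column element becomes one stored, not-tokenized field.
                        doc.add(new StringField(xml.getLocalName(), xml.getElementText(), Field.Store.YES));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(xml.getLocalName()) && doc != null) {
                    writer.addDocument(doc);                 // one Lucene document per row
                    doc = null;
                }
            }
            xml.close();
        }
    }
}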

I need to think of a way to incrementally update this index when the data in the database changes.

Because I am joining tables (especially left joins) I'm not sure I can get a unique business key combination to do an incremental update. On the other hand, because I am using DISTINCT, I know that every single field is a unique combination. Given this information, I thought I could put the hashCode of a document as a field of the document, and call updateDocument on the IndexWriter like this:

public static void addDoc(IndexWriter w, Row row) throws IOException {
    // Row is simply a Java representation of a single row from the above query
    Document document = new Document();
    document.add(new StringField("fieldA", row.fieldA, Field.Store.YES));
    ...
    String hashCode = String.valueOf(document.hashCode());
    document.add(new StringField("HASH", hashCode, Field.Store.YES));
    w.updateDocument(new Term("HASH", hashCode), document);
}

Then I realized that updateDocument was actually deleting the document with the matching hash code and adding the identical document again, so this wasn't of any use.

What is the way to approach this?

Recommended answer

If you increment an id on each relevant update of your source DB tables, and if you log these ids on record deletion, you should then be able to list the deleted, updated, and new records of the data being indexed.

This step might be performed within a transitory table, itself extracted into the XML file used as input to Lucene.
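
On the Lucene side, that logged id (rather than a hash of the document contents) then becomes the key passed to updateDocument and deleteDocuments. A minimal sketch, assuming the question's Row class is extended with a hypothetical id field carrying the logged record/change id, and that deletions arrive as a list of those ids:

import java.io.IOException;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IncrementalUpdater {
    // Applies one batch of changes pulled from the transitory table / change log.
    public static void applyChanges(IndexWriter w, List<Row> newOrUpdatedRows,
                                    List<String> deletedIds) throws IOException {
        // New and updated rows: updateDocument deletes any existing document
        // whose ID term matches, then adds the fresh version.
        for (Row row : newOrUpdatedRows) {
            Document doc = new Document();
            doc.add(new StringField("ID", row.id, Field.Store.YES)); // hypothetical stable id field
            doc.add(new StringField("fieldA", row.fieldA, Field.Store.YES));
            // ... remaining columns, as in addDoc above ...
            w.updateDocument(new Term("ID", row.id), doc);
        }
        // Rows recorded in the deletion log: remove them by the same key.
        for (String id : deletedIds) {
            w.deleteDocuments(new Term("ID", id));
        }
        w.commit();
    }
}

The point is that this id stays stable across edits of the same record, which is exactly what the hash-of-contents field could not provide.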
