Lucene: how to add a document without duplication

Problem description

In my case, every document inserted into the Lucene index has its own unique ID. When adding a new document to the index, if a document with that ID already exists in the index, the new document should not be inserted. How can I implement this?
I think I should first search for the document by its docId and insert it only if Lucene can't find it. However, because I have 3 threads sharing a single IndexWriter for indexing, I suspect this can go wrong. For example, suppose thread 1 and thread 2 are handling two documents with the same docId. If thread 1 searches for the docId and finds nothing, it will insert its document into the index, but thread 2 may insert its own document into the index after thread 1 has searched it. As a result, two documents with the same ID end up in the index. How can I avoid this?
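The failure described above is a classic check-then-act race. The sketch below replays that interleaving deterministically in plain Java, using a `List` as a hypothetical stand-in for the index (it is not Lucene code; `Doc`, the ID `"42"`, and all method names are made up for illustration). Both "threads" run their existence check before either one adds, so both checks pass and the ID is stored twice. Doing the same two writes as a keyed replace leaves only one entry, which is the idea behind replacing by a unique term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckThenAddRace {
    // Hypothetical document type: an ID plus some content.
    record Doc(String id, String body) {}

    // Linear scan standing in for "search by docId". The List itself, like a
    // Lucene index, does nothing to prevent two entries with the same ID.
    static boolean containsId(List<Doc> index, String id) {
        return index.stream().anyMatch(d -> d.id().equals(id));
    }

    // Replays the problematic interleaving step by step and returns how many
    // copies of ID "42" end up in the list-based index.
    static long replayCheckThenAdd() {
        List<Doc> index = new ArrayList<>();
        Doc fromThread1 = new Doc("42", "version from thread 1");
        Doc fromThread2 = new Doc("42", "version from thread 2");

        boolean thread1Found = containsId(index, "42"); // thread 1 checks: not found
        boolean thread2Found = containsId(index, "42"); // thread 2 checks before thread 1 adds: not found
        if (!thread1Found) index.add(fromThread1);      // thread 1 adds
        if (!thread2Found) index.add(fromThread2);      // thread 2 also adds, creating a duplicate

        return index.stream().filter(d -> d.id().equals("42")).count();
    }

    // The same two writes done as a keyed replace: the second write overwrites
    // the first, so at most one entry per ID can exist.
    static int replayKeyedReplace() {
        Map<String, Doc> index = new HashMap<>();
        index.put("42", new Doc("42", "version from thread 1"));
        index.put("42", new Doc("42", "version from thread 2"));
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println("check-then-add copies: " + replayCheckThenAdd()); // 2
        System.out.println("keyed replace copies:  " + replayKeyedReplace()); // 1
    }
}
```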

Recommended answer

IndexWriter.updateDocument will atomically delete and add a doc based on a term.
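A minimal sketch of that answer, assuming Lucene 8.x or later with `lucene-core` on the classpath (`ByteBuffersDirectory` is used only to keep the example in memory). The field name `id`, the value `"42"`, and the class and method names are hypothetical. The ID is indexed as a `StringField` so it becomes a single untokenized term that `new Term("id", ...)` matches exactly. As in the question, three threads share one `IndexWriter`, but none of them searches first: each one simply calls `updateDocument` with the ID term.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UpsertById {
    // Hypothetical name of the unique-ID field.
    static final String ID_FIELD = "id";

    static Document makeDoc(String id, String body) {
        Document doc = new Document();
        // StringField: indexed as one exact term, not tokenized.
        doc.add(new StringField(ID_FIELD, id, Field.Store.YES));
        doc.add(new TextField("body", body, Field.Store.YES));
        return doc;
    }

    // Indexes the same ID from three threads sharing one IndexWriter and
    // returns how many live documents carry that ID after committing.
    static int indexConcurrently() throws Exception {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            List<Thread> threads = new ArrayList<>();
            for (int t = 1; t <= 3; t++) {
                final int n = t;
                Thread th = new Thread(() -> {
                    try {
                        // Deletes any doc whose id term is "42", then adds this one,
                        // as one atomic operation on the shared writer.
                        writer.updateDocument(new Term(ID_FIELD, "42"), makeDoc("42", "from thread " + n));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                threads.add(th);
                th.start();
            }
            for (Thread th : threads) {
                th.join();
            }
            writer.commit();

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(new TermQuery(new Term(ID_FIELD, "42")));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("docs with id 42: " + indexConcurrently());
    }
}
```

Note that `updateDocument` replaces rather than skips: if a document with the ID already exists, the new one takes its place, so the surviving copy is whichever update was applied last. When documents that share an ID have the same content, that is equivalent to skipping the insert. The design point is that the existence check and the insert become a single writer operation, so the interleaving from the question has no window in which a second copy can slip in.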