Searching for UUID in Lucene not working


Question

I have a UUID field that I add to my documents in the following format: 372d325c-e01b-432f-98bd-bc4c949f15b8. However, when I query for documents by UUID, nothing is returned, no matter how I escape the expression. For example:

+uuid:372d325c-e01b-432f-98bd-bc4c949f15b8
+uuid:"372d325c-e01b-432f-98bd-bc4c949f15b8"
+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8
+uuid:(372d325c-e01b-432f-98bd-bc4c949f15b8)
+uuid:("372d325c-e01b-432f-98bd-bc4c949f15b8")

Skipping the QueryParser altogether and using a TermQuery does not help either:

new TermQuery(new Term("uuid", uuid.toString()))

new TermQuery(new Term("uuid", QueryParser.escape(uuid.toString())))

None of these searches returns a document, but searching for a portion of the UUID does. For example, each of these returns something:

+uuid:372d325c
+uuid:e01b
+uuid:432f

How should I index these documents so that I can retrieve them by their UUID? I have considered reformatting the UUID to remove the hyphens, but I have not implemented that yet.
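The hyphen-free form mentioned here can be produced with the standard library alone. The following is a minimal sketch, not something from the original post: the class name CompactUuid and the helpers compact and expand are hypothetical, and it assumes that a 32-character alphanumeric string would be kept as a single token by the analyzer in use, which should be verified against the actual index.

```java
import java.util.UUID;

class CompactUuid {
    // Hypothetical helper: turn the canonical 36-character form into
    // 32 hex characters by dropping the four hyphens.
    static String compact(UUID uuid) {
        return uuid.toString().replace("-", "");
    }

    // Hypothetical inverse: re-insert the hyphens at the 8-4-4-4-12
    // boundaries and parse the result back into a UUID.
    static UUID expand(String compact) {
        if (compact.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex characters");
        }
        String dashed = compact.substring(0, 8) + "-"
                + compact.substring(8, 12) + "-"
                + compact.substring(12, 16) + "-"
                + compact.substring(16, 20) + "-"
                + compact.substring(20);
        return UUID.fromString(dashed);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("372d325c-e01b-432f-98bd-bc4c949f15b8");
        String key = compact(uuid);
        System.out.println(key);          // value to index and to search for
        System.out.println(expand(key));  // original UUID recovered from the key
    }
}
```

The same compact form would have to be used both when indexing and when building the query term; otherwise the two sides would not match.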

Recommended answer

The only way I got this to work was to use WhitespaceAnalyzer instead of StandardAnalyzer and then search with a TermQuery. The IndexWriter is configured like so:

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new WhitespaceAnalyzer(Version.LUCENE_36))
        .setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
writer = new IndexWriter(directory, config);

Then search:

TopDocs docs = searcher.search(new TermQuery(new Term("uuid", uuid.toString())), 1);

WhitespaceAnalyzer prevented Lucene from splitting the UUID apart at the hyphens. Another option could be to remove the hyphens from the UUID, but WhitespaceAnalyzer works just as well for my purposes.
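To illustrate why splitting at the hyphens breaks an exact-term lookup, here is a toy model in plain Java. It is not Lucene's tokenizer: the class TokenizationSketch and both split methods are hypothetical stand-ins that only mimic "break on non-alphanumeric characters" versus "break on whitespace only".

```java
import java.util.Arrays;
import java.util.List;

class TokenizationSketch {
    // Toy stand-in for an analyzer that breaks on anything that is not
    // an ASCII letter or digit (so a hyphen ends a token).
    static List<String> splitOnNonAlphanumerics(String text) {
        return Arrays.asList(text.split("[^\\p{Alnum}]+"));
    }

    // Toy stand-in for an analyzer that breaks on whitespace only.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String uuid = "372d325c-e01b-432f-98bd-bc4c949f15b8";

        List<String> pieces = splitOnNonAlphanumerics(uuid);
        List<String> whole = splitOnWhitespace(uuid);

        // An exact-term lookup only succeeds if the full string is one of
        // the indexed terms.
        System.out.println(pieces + " contains full UUID: " + pieces.contains(uuid));
        System.out.println(whole + " contains full UUID: " + whole.contains(uuid));
        System.out.println("fragment e01b indexed: " + pieces.contains("e01b"));
    }
}
```

In this model the hyphen-splitting variant indexes five fragments, so only fragment lookups such as e01b can match, while the whitespace variant keeps the UUID as one term that a TermQuery on the full string can find.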
