Lucene search and underscores
Problem description
When I use Luke to search my Lucene index using a standard analyzer, I can see that the field I am searching contains values of the form MY_VALUE. However, when I search for field:"MY_VALUE", the query is parsed as field:"my value".
Is there a simple way to escape the underscore (_) character so that the search matches it?
4/1/2010 11:08AM PST
I think there is a bug in the tokenizer for Lucene 2.9.1, and it was probably there before. Load up Luke and try to search for "BB_HHH_FFFF5_SSSS". When there is a number, the following tokens are returned:
"bb hhh_ffff5_ssss"
After some testing, I've found that this is because of the number. If I input
"BB_HHH_FFFF_SSSS", I get
"bb hhh ffff ssss"
At this point, I'm leaning towards a tokenizer bug, unless the presence of the number is supposed to cause this behavior, but I fail to see why it would.
Can anyone confirm this?
Recommended answer
It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.
Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
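To illustrate why the analyzer choice matters, here is a toy model in plain Java. It does not use Lucene at all: keywordTokens and splittingTokens are hypothetical stand-ins, the first for a keyword-style analyzer that keeps the whole value as one token, the second for an analyzer that lowercases and splits on underscores (mimicking the "my value" parse from the question, not Lucene's actual tokenizer rules). The sketch only shows that a phrase query can match when the query text is analyzed the same way as the indexed value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class AnalyzerMismatchDemo {

    // Toy keyword-style analysis (assumption, not Lucene code):
    // the whole input becomes a single, unmodified token.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Toy splitting analysis (assumption, not Lucene code):
    // lowercase the input, then split on underscores and whitespace.
    static List<String> splittingTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[_\\s]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A phrase matches when the query tokens occur as a contiguous
    // run inside the indexed tokens.
    static boolean phraseMatches(List<String> indexed, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(indexed, query) >= 0;
    }

    public static void main(String[] args) {
        String value = "MY_VALUE";

        // Indexed keyword-style, queried with the splitting analysis: no match.
        System.out.println(phraseMatches(keywordTokens(value), splittingTokens(value))); // false

        // Indexed and queried keyword-style: match.
        System.out.println(phraseMatches(keywordTokens(value), keywordTokens(value))); // true

        // Indexed and queried with the splitting analysis: match.
        System.out.println(phraseMatches(splittingTokens(value), splittingTokens(value))); // true
    }
}
```

In Luke, the equivalent step is choosing the same analyzer in the search tab that was used when the field was indexed.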