Lucene search and underscores
Question
When I use Luke to search my Lucene index using a standard analyzer, I can see that the field I am searching contains values of the form MY_VALUE. However, when I search for field:"MY_VALUE", the query is parsed as field:"my value".
Is there a simple way to escape the underscore (_) character so that it will search for it?
4/1/2010 11:08AM PST
I think there is a bug in the tokenizer in Lucene 2.9.1, and it was probably there in earlier versions too. Load up Luke and try to search for "BB_HHH_FFFF5_SSSS". When the value contains a number, the following tokens are returned:
"bb hhh_ffff5_ssss"
After some testing, I've found that this is caused by the number. If I input "BB_HHH_FFFF_SSSS", I get:
"bb hhh ffff ssss"
At this point I'm leaning towards a tokenizer bug, unless the presence of a number is supposed to trigger this behavior, but I fail to see why it would.
Can anyone confirm?
Recommended answer
It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.
Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
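To illustrate why the analyzer chosen at query time has to agree with the one used at index time, here is a minimal sketch in plain Java. It does not call Lucene at all: `splitLowercase` and `keepWhole` are hypothetical stand-ins for "an analyzer that breaks on underscores and lowercases" and "keyword-style analysis that keeps the value as a single token". The real StandardAnalyzer rules are more involved (the digit case in the question is not modelled here), so treat this only as a picture of the mismatch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class AnalyzerMismatchSketch {

    // Hypothetical stand-in: split on underscores and lowercase each part.
    // This is a simplification, not Lucene's actual tokenizer grammar.
    static List<String> splitLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("_")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for keyword-style analysis:
    // the whole value becomes exactly one token, unchanged.
    static List<String> keepWhole(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        // Assume the field was indexed as a single, untouched token.
        List<String> indexed = keepWhole("MY_VALUE");

        // A query analyzed by splitting produces [my, value]: no match.
        System.out.println(splitLowercase("MY_VALUE").equals(indexed)); // false

        // A query analyzed the same way as the index matches.
        System.out.println(keepWhole("MY_VALUE").equals(indexed)); // true
    }
}
```

In Luke, the equivalent step is picking the analyzer in the search tab that matches how the field was indexed, for example KeywordAnalyzer if the value was stored as one token.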