Lucene search and underscores


Question

When I use Luke to search my Lucene index with the standard analyzer, I can see that the field I am searching contains values of the form MY_VALUE. However, when I search for field:"MY_VALUE", the query is parsed as field:"my value".

Is there a simple way to escape the underscore (_) character so that it is kept in the search term?

4/1/2010 11:08AM PST

I think there is a bug in the tokenizer in Lucene 2.9.1, and it was probably present in earlier versions too. Load up Luke and search for "BB_HHH_FFFF5_SSSS"; when the value contains a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that this is because of the number. If I input

"BB_HHH_FFFF_SSSS", I get

"bb hhh ffff ssss"

At this point I'm leaning towards a tokenizer bug, unless the presence of a number is supposed to cause this behavior, but I fail to see why it would.

Can anyone confirm this?

Recommended answer

It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.

Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
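
On the follow-up in the question: the digit-dependent behavior looks intentional rather than a bug. As far as I can tell from the classic StandardTokenizer grammar (this is my reading, not something stated in the answer above), the characters `_`, `-`, `/`, `.` and `,` split tokens unless they join segments in which every other segment contains a digit, presumably so that serial and model numbers stay in one piece. The plain-Java sketch below is a simplified imitation of just that rule, not Lucene code; the class name `UnderscoreTokenSketch` and the ASCII-only patterns are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified imitation of two classic tokenizer rules. NOT Lucene code. */
class UnderscoreTokenSketch {
    // Assumed building blocks (ASCII only, for illustration).
    private static final String A = "[A-Za-z0-9]+";                  // any alphanumeric run
    private static final String H = "[A-Za-z0-9]*[0-9][A-Za-z0-9]*"; // run with at least one digit
    private static final String P = "[_\\-/.,]";                     // joining punctuation

    // Candidate token shapes: a plain word, or punctuation-joined segments
    // in which every other segment contains a digit.
    private static final List<Pattern> RULES = List.of(
        Pattern.compile(A),
        Pattern.compile(A + P + H),
        Pattern.compile(H + P + A),
        Pattern.compile(A + "(?:" + P + H + P + A + ")+"),
        Pattern.compile(H + "(?:" + P + A + P + H + ")+"),
        Pattern.compile(A + P + H + "(?:" + P + A + P + H + ")+"),
        Pattern.compile(H + P + A + "(?:" + P + H + P + A + ")+"));

    /** Splits text by taking the longest rule match at each position, then lowercases it. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int best = 0;
            for (Pattern rule : RULES) {
                Matcher m = rule.matcher(text).region(pos, text.length());
                if (m.lookingAt()) {
                    best = Math.max(best, m.end() - pos);
                }
            }
            if (best == 0) {
                pos++; // no rule starts here: skip a separator such as '_' or ' '
                continue;
            }
            tokens.add(text.substring(pos, pos + best).toLowerCase(Locale.ROOT));
            pos += best;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("MY_VALUE"));          // [my, value]
        System.out.println(tokenize("BB_HHH_FFFF5_SSSS")); // [bb, hhh_ffff5_ssss]
        System.out.println(tokenize("BB_HHH_FFFF_SSSS"));  // [bb, hhh, ffff, ssss]
    }
}
```

If the field holds exact identifiers such as MY_VALUE, indexing and querying it with KeywordAnalyzer sidesteps these rules entirely, because that analyzer emits the whole value as a single token.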
