Lucene search and underscores

Problem description

When I use Luke to search my Lucene index using a standard analyzer, I can see the field I am searching for contains values of the form MY_VALUE. When I search for field:"MY_VALUE", however, the query is parsed as field:"my value".

Is there a simple way to escape the underscore (_) character so that it will search for it?
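To see why the phrase query misses, here is a minimal Python sketch of the mismatch. It does not call Lucene: `simple_standard_like_analyze` is a hypothetical, simplified stand-in for query-time analysis (split on non-alphanumeric characters, then lowercase), and the sketch assumes the field was indexed as the single token `MY_VALUE`, as Luke displays it.

```python
import re


def simple_standard_like_analyze(text: str) -> list[str]:
    """Hypothetical, simplified model of query-time analysis: split on any run
    of non-alphanumeric characters (so "_" acts as a separator), then lowercase.
    This is NOT Lucene's real StandardTokenizer grammar."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def phrase_matches(indexed_tokens: list[str], query_tokens: list[str]) -> bool:
    """A phrase matches if the query tokens appear consecutively in the field."""
    n = len(query_tokens)
    return any(indexed_tokens[i:i + n] == query_tokens
               for i in range(len(indexed_tokens) - n + 1))


# Assumption: the field holds one untouched token, as shown in Luke.
indexed = ["MY_VALUE"]
query = simple_standard_like_analyze("MY_VALUE")
print(query)                           # ['my', 'value']
print(phrase_matches(indexed, query))  # False
```

Because the indexed token list never contains `my` followed by `value`, the phrase cannot match under these assumptions.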

4/1/2010 11:08AM PST

I think there is a bug in the tokenizer for Lucene 2.9.1, and it was probably there before. Load up Luke and try to search for "BB_HHH_FFFF5_SSSS". When the value contains a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that this is because of the number. If I input

"BB_HHH_FFFF_SSSS", I get

"bb hhh ffff ssss"

At this point, I'm leaning towards a tokenizer bug, unless the presence of the number is supposed to cause this behavior, but I fail to see why.

Can anyone confirm this?
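One possible explanation, offered as an assumption rather than something confirmed in this thread: the pre-3.1 StandardTokenizer grammar, as we understand it, treats `_` as a punctuation character and keeps a punctuation-joined run together as a single "number" token when every other segment contains a digit. The Python sketch below models only that rule for underscore-separated input; `tokenize_underscored` and its helpers are hypothetical names, and the real grammar handles more punctuation characters and token types.

```python
def has_digit(segment: str) -> bool:
    return any(ch.isdigit() for ch in segment)


def is_num_run(segments: list[str]) -> bool:
    # Simplified model of the assumed "number" rule: at least two segments,
    # and every other segment (all even offsets or all odd offsets) has a digit.
    if len(segments) < 2:
        return False
    return (all(has_digit(s) for s in segments[0::2])
            or all(has_digit(s) for s in segments[1::2]))


def tokenize_underscored(text: str) -> list[str]:
    """Hypothetical model, underscore-only input (letters/digits separated by
    single underscores). At each position take the longest run of segments
    satisfying is_num_run, otherwise a single segment; then lowercase."""
    segments = [s for s in text.split("_") if s]
    tokens, i = [], 0
    while i < len(segments):
        end = i + 1  # default: emit a single segment
        for j in range(len(segments), i + 1, -1):  # try the longest run first
            if is_num_run(segments[i:j]):
                end = j
                break
        tokens.append("_".join(segments[i:end]).lower())
        i = end
    return tokens


print(tokenize_underscored("BB_HHH_FFFF5_SSSS"))  # ['bb', 'hhh_ffff5_ssss']
print(tokenize_underscored("BB_HHH_FFFF_SSSS"))   # ['bb', 'hhh', 'ffff', 'ssss']
```

Under this model, `BB` is emitted on its own because no run starting at `BB` has a digit in every other segment, whereas `HHH_FFFF5_SSSS` qualifies through `FFFF5`. Without any digit, every segment is split off separately, which matches both outputs reported above.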

Recommended answer

It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.

Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
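The answer's point, that the query has to be analyzed the same way the field was indexed, can be illustrated with a toy inverted index. This is a minimal Python sketch under stated assumptions, not Lucene code: `keyword_analyze` models keyword-style analysis (the whole value becomes one unmodified token, which is the behavior the answer relies on from KeywordAnalyzer), `standard_like_analyze` is a simplified splitter, and `build_index`/`search` are hypothetical helpers that ignore term positions.

```python
import re
from collections import defaultdict


def keyword_analyze(text: str) -> list[str]:
    # Hypothetical model of keyword-style analysis: the whole value is a
    # single token, unchanged (no splitting, no lowercasing).
    return [text]


def standard_like_analyze(text: str) -> list[str]:
    # Hypothetical, simplified model: split on non-alphanumerics, lowercase.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def build_index(docs: dict[int, str], analyze) -> dict[str, set[int]]:
    # Inverted index: token -> ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, analyze) -> set[int]:
    # Documents containing every query token (positions ignored for brevity).
    postings = [index.get(token, set()) for token in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "MY_VALUE", 2: "OTHER_VALUE"}
index = build_index(docs, keyword_analyze)  # assumed index-time analysis

print(search(index, "MY_VALUE", standard_like_analyze))  # set(): mismatched analyzer
print(search(index, "MY_VALUE", keyword_analyze))        # {1}: same analyzer as indexing
```

In real Lucene code, the equivalent step would be passing the same analyzer (for example, KeywordAnalyzer for this field) both when indexing and when parsing the query; the exact constructor signatures depend on the Lucene version.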
