Lucene Analyzer for Indexing and Searching
I have a field that I am indexing with Lucene like so:
@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {
The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY.
When these values are indexed using the StandardAnalyzer, the terms end up as hungry and slightly, since it tokenizes on punctuation and drops "not" as a stop word.
If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.
My search API has a single "search" method that constructs the Query like so:
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);
This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms specifies the fields and values to search (e.g. "hungerState:HUNGRY").
My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.
When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results since the query parser tokenizes the search string and makes it lowercase.
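The two failure modes described above can be reproduced with a toy model that needs no Lucene on the classpath. This is only a sketch: standardLike and keywordLike are hypothetical helpers that imitate the behaviour reported in the question (split on punctuation, lowercase, drop "not"), the stop-word list is an assumption, and matching is reduced to "all query terms are present", whereas the real parser would build a phrase query for multi-term input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: these helpers imitate the analyzer behaviour described in
// the question. They are NOT Lucene classes.
final class AnalyzerMismatchSketch {

    // Assumption: a tiny stop-word list that contains "not", as the question reports.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("not", "the", "a"));

    // Standard-like analysis: split on non-alphanumerics, lowercase, drop stop words.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            String term = piece.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Keyword-like analysis: the whole value is one unmodified term.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    // Simplified matching: every query term must be among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String[] values = {"HUNGRY", "SLIGHTLY_HUNGRY", "NOT_HUNGRY"};
        String query = "NOT_HUNGRY";
        for (String value : values) {
            boolean bothStandard = matches(standardLike(value), standardLike(query));
            boolean untokenizedIndex = matches(keywordLike(value), standardLike(query));
            boolean bothKeyword = matches(keywordLike(value), keywordLike(query));
            System.out.println(value + ": standard/standard=" + bothStandard
                    + ", untokenized/standard=" + untokenizedIndex
                    + ", keyword/keyword=" + bothKeyword);
        }
    }
}
```

In this model the query NOT_HUNGRY matches every document when both sides use the standard-like analysis, matches nothing when only the index side is untokenized, and matches exactly one document when both sides treat the value as a single keyword.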
I've even tried specifying an Analyzer for indexing, such as KeywordAnalyzer, but it has practically no effect, since the entire search string is analyzed with StandardAnalyzer every time.
Any advice would be appreciated. Thanks!
You're using a standard analyzer for your query parser, so yes, your query will be analyzed with a standard analyzer. Just switch to using a keyword analyzer:
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(),
        new KeywordAnalyzer()); // KeywordAnalyzer takes no Version argument
You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
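The idea behind a per-field wrapper is a default analyzer plus per-field overrides, consulted by field name at analysis time. In Lucene 3.0 this is what PerFieldAnalyzerWrapper does: it takes a default analyzer and lets you register overrides with addAnalyzer(fieldName, analyzer). The block below is a plain-Java model of that dispatch, not Lucene code: the class PerFieldAnalysisSketch, its helper methods, and the "description" field are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analyzer selection. NOT Lucene code.
final class PerFieldAnalysisSketch {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> overrides = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a field-specific analyzer that replaces the default for that field.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        overrides.put(field, analyzer);
    }

    // Pick the override for the field if one exists, otherwise the default.
    List<String> analyze(String field, String text) {
        return overrides.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Stand-in for free-text analysis: split on non-alphanumerics and lowercase.
    static List<String> tokenizeAndLowercase(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Stand-in for keyword analysis: the whole value is a single term.
    static List<String> wholeValue(String text) {
        return Collections.singletonList(text);
    }

    // Hypothetical setup: hungerState is a keyword field, every other field is free text.
    static PerFieldAnalysisSketch example() {
        PerFieldAnalysisSketch analyzer =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::tokenizeAndLowercase);
        analyzer.addAnalyzer("hungerState", PerFieldAnalysisSketch::wholeValue);
        return analyzer;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch analyzer = example();
        System.out.println(analyzer.analyze("hungerState", "NOT_HUNGRY"));   // [NOT_HUNGRY]
        System.out.println(analyzer.analyze("description", "Big Brown_Dog")); // [big, brown, dog]
    }
}
```

With this arrangement a query such as hungerState:NOT_HUNGRY keeps its value intact, while free-text fields are still tokenized. The same wrapper would need to be used on both the indexing side and the query parser so the terms agree.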