Lucene Analyzer for Indexing and Searching


Question


I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY

When these values are indexed with the StandardAnalyzer, the terms end up as "hungry" and "slightly", since it tokenizes on punctuation and drops "not" as a stop word.

If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.

My search API has a single "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms specifies the fields and values to search (e.g. "hungerState:HUNGRY").

My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results since the query parser tokenizes the search string and makes it lowercase.
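
The mismatch described above can be reproduced in a small toy model. The sketch below does not use Lucene at all: `standardLike` and `keywordLike` are hypothetical stand-ins that only mimic the behaviour reported in this question (split on punctuation, lowercase, drop "not" versus keep the whole value as one term), and `matches` simplifies the query to "every query term must be indexed".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: none of these names are Lucene APIs.
final class AnalysisMismatchDemo {
    // Assumed one-word stop list; the real StandardAnalyzer uses a longer one.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("not"));

    // Stand-in for StandardAnalyzer: split on non-alphanumerics, lowercase, drop stop words.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^A-Za-z0-9]+")) {
            String term = part.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Stand-in for KeywordAnalyzer / UN_TOKENIZED: the whole value is a single term.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    // Simplified matching: every query term must appear among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String[] values = {"HUNGRY", "SLIGHTLY_HUNGRY", "NOT_HUNGRY"};
        List<String> query = standardLike("NOT_HUNGRY"); // analyzed at query time
        for (String value : values) {
            System.out.println(value
                + " tokenized index match: " + matches(standardLike(value), query)
                + ", untokenized index match: " + matches(keywordLike(value), query));
        }
    }
}
```

In this model the tokenized index matches all three values for a NOT_HUNGRY query, and the untokenized index matches none of them, which is the pair of symptoms described above. Both sides only agree when the query is analyzed the same way the field was indexed.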

I've even tried specifying an Analyzer for indexing like KeywordAnalyzer, but it pretty much has no effect since the entire search string is analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!

Solution

You're using a standard analyzer for your query parser, so yes, your query will be analyzed with a standard analyzer. Just switch to using a keyword analyzer:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(),
          new KeywordAnalyzer());  // KeywordAnalyzer has a no-argument constructor

You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
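
The idea behind a per-field wrapper can be sketched without Lucene as a lookup from field name to analyzer with a fallback. In Lucene 3.0 the real class is constructed with a default analyzer and extended with `addAnalyzer(fieldName, analyzer)`. The class below is only a hypothetical plain-Java model of that dispatch, and the `description` field is an assumed example of a free-text field, not something from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analysis; not the Lucene PerFieldAnalyzerWrapper itself.
final class PerFieldAnalysisSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a field-specific analyzer that overrides the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Pick the analyzer registered for the field, else fall back to the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Simplified tokenizing analyzer: split on non-alphanumerics and lowercase.
    static List<String> tokenizing(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Keyword analyzer: the whole value is one term, case preserved.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Example wiring: keyword analysis for hungerState, tokenizing for everything else.
    static PerFieldAnalysisSketch example() {
        PerFieldAnalysisSketch wrapper = new PerFieldAnalysisSketch(PerFieldAnalysisSketch::tokenizing);
        wrapper.addAnalyzer("hungerState", PerFieldAnalysisSketch::keyword);
        return wrapper;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper = example();
        System.out.println(wrapper.analyze("hungerState", "NOT_HUNGRY"));
        System.out.println(wrapper.analyze("description", "Very Hungry"));
    }
}
```

With the real wrapper, the same instance would typically be passed to the query parser so that hungerState terms stay intact while the other fields keep their usual tokenization.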
