Hibernate Search | ngram analyzer with minGramSize 1
Question
I have some problems with my Hibernate Search analyzer configuration. One of my indexed entities ("Hospital") has a String field ("name") that can contain values 1 to 40 characters long. I want to be able to find an entity by searching for just one character (because a hospital could have a single-character name).
@Indexed(index = "HospitalIndex")
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "1"),
                @Parameter(name = "maxGramSize", value = "40")})
    })
public class Hospital {
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO, analyzer = @Analyzer(definition = "ngram"))
    private String name = "";
}
If I add a hospital named "My Test Hospital", the Lucene index looks like this:
1 name al
1 name e
1 name es
1 name est
1 name h
1 name ho
1 name hos
1 name hosp
1 name hospi
1 name hospit
1 name hospita
1 name hospital
1 name i
1 name it
1 name ita
1 name ital
1 name l
1 name m
1 name my
1 name o
1 name os
1 name osp
1 name ospi
1 name ospit
1 name ospita
1 name ospital
1 name p
1 name pi
1 name pit
1 name pita
1 name pital
1 name s
1 name sp
1 name spi
1 name spit
1 name spita
1 name spital
1 name st
1 name t
1 name ta
1 name tal
1 name te
1 name tes
1 name test
1 name y
1 name a
This is how I build and execute my search query:
QueryBuilder hospitalQb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Hospital.class).get();
Query hospitalQuery = hospitalQb.keyword().onFields("name").matching(searchString).createQuery();
javax.persistence.Query persistenceQuery = fullTextEntityManager.createFullTextQuery(hospitalQuery, Hospital.class);
List<Hospital> results = persistenceQuery.getResultList();
The problem is that the same ngram analyzer is also applied to my search query. So when I search for "hospital", for example, I find every hospital whose name contains the character "a". This is what the search query looks like when I call toString on it:
name:h name:ho name:hos name:hosp name:hospi name:hospit name:hospita name:hospital name:o name:os name:osp name:ospi name:ospit name:ospita name:ospital name:s name:sp name:spi name:spit name:spita name:spital name:p name:pi name:pit name:pita name:pital name:i name:it name:ita name:ital name:t name:ta name:tal name:a name:al name:l
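The over-matching can be reproduced without Lucene. The following is a minimal plain-Java sketch, not the actual NGramFilterFactory implementation: the class NGramQueryDemo, its methods, and the hospital name "Bar" are hypothetical, and the keyword query is modeled simply as an OR over its terms.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class NGramQueryDemo {

    // Simplified stand-in for an ngram token filter: collects every substring
    // of the token whose length lies in [minGram, maxGram].
    public static Set<String> ngrams(String token, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    // Models a query that ORs its terms: one shared term is enough for a hit.
    public static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index side: a hypothetical hospital named "Bar", lowercased and ngrammed.
        Set<String> indexed = ngrams("bar", 1, 40);

        // Query side, analyzed with and without the ngram filter.
        Set<String> ngramQuery = ngrams("hospital", 1, 40);
        Set<String> plainQuery = Collections.singleton("hospital");

        System.out.println(matches(indexed, ngramQuery)); // true: both sides contain "a"
        System.out.println(matches(indexed, plainQuery)); // false: "bar" has no "hospital" gram
    }
}
```

In this model the ngrammed query produces the same 36 terms as the toString output above, and any single one of them is enough to match an unrelated name.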
So the question is: does anybody know a better analyzer configuration, or another way to build the search query, that solves the problem?
Recommended answer
You can set up a second analyzer, identical except that it does not have an ngram filter, and then override the analyzer used for queries:
QueryBuilder hospitalQb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Hospital.class)
.overridesForField( "name", "my_analyzer_without_ngrams" )
.get();
// Then it's business as usual
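The second analyzer itself is not shown above. One way it might be declared is sketched below, assuming Hibernate Search 5.x annotations and the Lucene 4/5 package layout; the import paths and the use of @AnalyzerDefs to group both definitions are assumptions, while the name "my_analyzer_without_ngrams" is taken from the snippet above. Entity mapping annotations are omitted, as in the question.

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.ngram.NGramFilterFactory;
import org.apache.lucene.analysis.standard.StandardFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.annotations.*;

@Indexed(index = "HospitalIndex")
@AnalyzerDefs({
    // Index-time analyzer: unchanged from the question.
    @AnalyzerDef(name = "ngram",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = NGramFilterFactory.class,
                params = {
                    @Parameter(name = "minGramSize", value = "1"),
                    @Parameter(name = "maxGramSize", value = "40")})
        }),
    // Query-time analyzer: same tokenizer and filters, minus the ngram filter.
    @AnalyzerDef(name = "my_analyzer_without_ngrams",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
        })
})
public class Hospital {
    // The field is still indexed with the ngram analyzer; only queries are overridden.
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO,
           analyzer = @Analyzer(definition = "ngram"))
    private String name = "";
}
```

With this in place, a search for "hospital" should be reduced to the single term name:hospital, which only matches documents whose indexed ngrams include that full string.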
Additionally, if you are implementing some kind of auto-completion (foo*) rather than in-word search (*foo*), you may want to use EdgeNGramFilterFactory instead of NGramFilterFactory: it only generates ngrams that are prefixes of the indexed tokens.
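To illustrate the difference, here is a small plain-Java sketch of what "prefixes only" means for a single token. It imitates the behavior described above rather than calling EdgeNGramFilterFactory; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramDemo {

    // Simplified stand-in for an edge ngram filter: only substrings that start
    // at the first character of the token are produced.
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // For the token "test", only t, te, tes and test are produced. The inner
        // grams e, es, est, s and st from the index listing in the question are not.
        System.out.println(edgeNGrams("test", 1, 40));
    }
}
```

The index stays much smaller this way, at the cost of no longer matching text in the middle of a word.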