Multiword synonyms with Solr and Hibernate Search
Question
I have a synonyms.txt file with the following content:
car accessories, gadi marmat
I am indexing car accessories as a single token so that it expands to car accessories and gadi marmat.
I want the whole synonym to match, so that a query for gadi marmat returns the record containing car accessories.
I am using ShingleFilterFactory to expand the query, so that a search for gadi marmat is expanded to gadi, gadi marmat, and marmat. Since gadi marmat is then queried as a single token, it should match car accessories and return a result, but it does not. A search for car accessories does return a result, so the problem must be in how synonyms with multiple words are indexed.
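The shingle expansion described above can be sketched in plain Java. This is only an illustration of the expected token stream, not Lucene's ShingleFilter; the class name ShingleSketch and the maxSize parameter are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not Lucene's ShingleFilter.
class ShingleSketch {

    // Returns every run of 1..maxSize adjacent tokens, joined with a single space.
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            int limit = Math.min(tokens.size(), start + maxSize);
            for (int end = start + 1; end <= limit; end++) {
                out.add(String.join(" ", tokens.subList(start, end)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("gadi marmat".split("\\s+"));
        System.out.println(shingles(query, 2)); // [gadi, gadi marmat, marmat]
    }
}
```

For the two-shingle gadi marmat to match anything, the index must contain a token with exactly that text, which is why the way the synonym is applied at index time matters.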
Please advise.
Recommended answer
The synonym file is only used to change the word you are searching for. So if you write
car accessories => gadi marmat
then when the analyzer matches "car accessories", it tries to match "gadi marmat" instead.
It works like a single token.
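To make the difference between the comma-separated form and the => form concrete, here is a minimal plain-Java sketch of how one rule line turns into a lookup table. It is not Solr's own parser; the class SynonymRuleSketch and its methods are hypothetical, and the handling of expand follows the behaviour described in the Solr documentation for the synonym filter (an equivalence list maps every entry to all entries when expand is true, and to the first entry when it is false; an explicit => rule ignores expand).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of synonym rule semantics; this is not Solr's parser.
class SynonymRuleSketch {

    // Parses one rule line into a map from input phrase to replacement phrases.
    static Map<String, List<String>> parse(String rule, boolean expand) {
        List<String> inputs;
        List<String> outputs;
        if (rule.contains("=>")) {
            // Explicit mapping: the left side is replaced by the right side.
            String[] sides = rule.split("=>");
            inputs = phrases(sides[0]);
            outputs = phrases(sides[1]);
        } else {
            // Equivalence list: expand decides whether all entries or only the first survive.
            inputs = phrases(rule);
            outputs = expand ? inputs : inputs.subList(0, 1);
        }
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String in : inputs) {
            map.put(in, new ArrayList<>(outputs));
        }
        return map;
    }

    // Splits one side of a rule on commas and trims each phrase.
    static List<String> phrases(String side) {
        List<String> out = new ArrayList<>();
        for (String p : side.split(",")) {
            out.add(p.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("car accessories, gadi marmat", true));
        // {car accessories=[car accessories, gadi marmat], gadi marmat=[car accessories, gadi marmat]}
        System.out.println(parse("car accessories, gadi marmat", false));
        // {car accessories=[car accessories], gadi marmat=[car accessories]}
        System.out.println(parse("car accessories => gadi marmat", false));
        // {car accessories=[gadi marmat]}
    }
}
```

With the explicit rule there is no entry for gadi marmat itself, so a query for gadi marmat passes through unchanged, while car accessories is rewritten to gadi marmat. The two only meet if the same rule is applied to the indexed text.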
You can get good results by combining analyzer elements like this:
// Analyzer chain: tokenize, lowercase, drop stop words, stem, map synonyms, stem again.
@AnalyzerDef(name = "integram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Remove stop words listed in the given file, ignoring case.
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "lucene/dictionary/stopwords.txt"),
            @Parameter(name = "ignoreCase", value = "true"),
            @Parameter(name = "enablePositionIncrements", value = "true")
        }),
        // Stemming runs before the synonym filter, so the synonym filter sees stemmed tokens.
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "English")
        }),
        // expand = false: synonyms are replaced rather than expanded to every variant.
        @TokenFilterDef(factory = SynonymFilterFactory.class, params = {
            @Parameter(name = "synonyms", value = "lucene/dictionary/synonyms.txt"),
            @Parameter(name = "expand", value = "false")
        }),
        // Second stemming pass, which also stems tokens emitted by the synonym filter.
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "English")
        })
    })