Rails sunspot-solr - words with hyphen
Problem description
I'm using the sunspot_rails gem and everything has worked perfectly so far, with one exception: I'm not getting any search results for words with a hyphen.
Example: The string "tron" returns a lot of results (the word mentioned in all the articles is "e-tron").
The string "e-tron" returns 0 results, even though this is the exact word used in all my articles.
My current schema.xml config:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
What I want: The behaviour for the search string "tron" is fine, of course, but I also want to get the correct matches for the search string "e-tron".
Recommended answer
The problem is that solr.StandardTokenizerFactory splits words on hyphens, so "e-tron" generates the tokens "e" and "tron". Presumably "e" is then lost at index time, because the solr.EdgeNGramFilterFactory in the index analyzer is configured with minGramSize="2" and so emits nothing for a one-character token.
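To make that reasoning concrete, here is a rough Python sketch of the two analyzer chains from the question. It is not Solr code: `standard_tokenize`, `edge_ngrams`, `index_terms` and `query_terms` are hypothetical helpers, the tokenizer is approximated by splitting on every non-alphanumeric character, and the sample sentence is made up.

```python
import re


def standard_tokenize(text):
    # Rough stand-in for StandardTokenizerFactory (assumption): split on any
    # run of non-alphanumeric characters, so a hyphen separates tokens.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front edge n-grams; a token shorter than min_gram yields nothing.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_terms(text):
    # Index analyzer from the question: tokenize, lowercase, edge n-grams.
    terms = set()
    for token in standard_tokenize(text):
        terms.update(edge_ngrams(token.lower()))
    return terms


def query_terms(text):
    # Query analyzer from the question: tokenize and lowercase only.
    return [t.lower() for t in standard_tokenize(text)]


indexed = index_terms("The new e-tron")  # made-up sample document
missing = [t for t in query_terms("e-tron") if t not in indexed]

print(sorted(indexed))  # ['ne', 'new', 'th', 'the', 'tr', 'tro', 'tron']
print(missing)          # ['e']
```

In this sketch the query term "e" has no counterpart in the index, while "tron" does. Whether one unmatched term is enough to produce zero results depends on how the query requires its terms to match, which the question does not show.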
Here is one example configuration that should address your specific problem.
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
- solr.WhitespaceTokenizerFactory will generate tokens on whitespace: ["e-tron"]
- solr.WordDelimiterFilterFactory will split on hyphens but also preserve the original word: ["e", "tron", "e-tron"]
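The effect of the two components listed above can be sketched in Python. This is a simplified model, not Solr's implementation: `word_delimiter`, `edge_ngrams`, `index_terms` and `query_terms` are hypothetical helpers, the delimiter step only handles non-alphanumeric characters (the real filter also splits on case changes and letter/digit boundaries by default), and the sample sentence is made up.

```python
import re


def word_delimiter(token, preserve_original=True):
    # Simplified stand-in for WordDelimiterFilterFactory (assumption):
    # split on non-alphanumeric characters and optionally keep the original.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        return [token]  # nothing to split
    return parts + ([token] if preserve_original else [])


def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front edge n-grams; a token shorter than min_gram yields nothing.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def query_terms(text):
    # Query analyzer: whitespace tokenizer, word delimiter, lowercase.
    return [t.lower() for tok in text.split() for t in word_delimiter(tok)]


def index_terms(text):
    # Index analyzer: same chain as the query, followed by edge n-grams.
    terms = set()
    for token in query_terms(text):
        terms.update(edge_ngrams(token))
    return terms


indexed = index_terms("The new e-tron")  # made-up sample document
for term in query_terms("e-tron"):
    print(term, term in indexed)
# e False
# tron True
# e-tron True
```

In this model the full word "e-tron" is now present in the index alongside "tron", while the single-character part "e" is still dropped by the n-gram filter. How Solr combines the three query tokens into one query is not modeled here.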