Rails sunspot-solr - words with hyphen

Problem description

I'm using the sunspot_rails gem and everything has worked perfectly so far, with one exception: I get no search results for words containing a hyphen.

Example: the string "tron" returns a lot of results (the word that actually appears in all the articles is "e-tron").

The string "e-tron" returns 0 results, even though this is the exact word used in all my articles.

My current schema.xml config:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What I want: the behaviour for the search string tron is fine, but I also want the search string e-tron to return the correct matches.

Recommended answer

The problem is that solr.StandardTokenizerFactory splits words on hyphens, so "e-tron" generates the tokens "e" and "tron". Presumably "e" is then lost at index time, because solr.EdgeNGramFilterFactory is configured with minGramSize="2" and emits nothing for a one-character token. The query analyzer has no such filter, so the query still contains "e", which cannot match any indexed term.
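To make the token flow concrete, here is a rough Python simulation of the analyzer chains from the question. It is only a sketch: the function names are made up for illustration, the regex is a crude stand-in for the real tokenizer, and Solr itself is not involved.

```python
import re

def standard_tokenize(text):
    # Crude stand-in for solr.StandardTokenizerFactory (an assumption):
    # split on anything that is not a letter or digit, hyphen included.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front edge n-grams. A token shorter than min_gram yields no grams.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text):
    # Index chain from the question: tokenize, lowercase, edge n-grams.
    terms = []
    for token in standard_tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms

def query_terms(text):
    # Query chain from the question: tokenize and lowercase only.
    return [token.lower() for token in standard_tokenize(text)]

indexed = index_terms("e-tron")
queried = query_terms("e-tron")
unmatched = [t for t in queried if t not in indexed]

print(indexed)    # terms stored in the index
print(queried)    # terms the query looks for
print(unmatched)  # query terms with no indexed counterpart
```

In this sketch the query term "e" has no counterpart in the index. If the query requires every term to match (an assumption about the query settings), that alone would explain the empty result.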

Here is one example configuration that should solve your specific problem.

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

  1. solr.WhitespaceTokenizerFactory generates tokens on whitespace only. ["e-tron"]
  2. solr.WordDelimiterFilterFactory splits on hyphens but also preserves the original word. ["e", "tron", "e-tron"]
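The two steps above can be sketched in Python to see which terms end up on each side. This is a simplified simulation with hypothetical function names, not Solr code. In particular, the word_delimiter helper here only splits on non-alphanumeric characters, whereas the real filter also splits on case changes and letter/digit transitions by default.

```python
import re

def whitespace_tokenize(text):
    # Stand-in for solr.WhitespaceTokenizerFactory: split on whitespace only.
    return text.split()

def word_delimiter(token, preserve_original=True):
    # Simplified stand-in for solr.WordDelimiterFilterFactory (an assumption):
    # split on non-alphanumerics and optionally keep the original token.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if preserve_original and parts != [token]:
        parts.append(token)
    return parts

def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front edge n-grams. A token shorter than min_gram yields no grams.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text):
    # Index chain: whitespace tokens, word delimiter, lowercase, edge n-grams.
    terms = []
    for token in whitespace_tokenize(text):
        for part in word_delimiter(token):
            terms.extend(edge_ngrams(part.lower()))
    return terms

def query_terms(text):
    # Query chain: whitespace tokens, word delimiter, lowercase.
    return [part.lower()
            for token in whitespace_tokenize(text)
            for part in word_delimiter(token)]

indexed = index_terms("Audi e-tron")
queried = query_terms("e-tron")

print(indexed)
print(queried)
print("e-tron" in indexed)
```

In this sketch the full term "e-tron" is present on both the index and the query side, and "tron" still matches as before. The lone "e" remains unindexed because of minGramSize="2", so whether it affects results depends on how the query combines its terms.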
