Rails sunspot-solr - words with hyphen

Views: 92 Published: 2020/7/10 3:54:19 Tags: ruby-on-rails n-gram sunspot-solr


Problem description

I'm using the sunspot_rails gem and everything has worked perfectly so far, with one exception: I'm not getting any search results for words with a hyphen.

Example: the string "tron" returns a lot of results (the word mentioned in all of the articles is e-tron).

The string "e-tron" returns 0 results, even though this is the exact word mentioned in all of my articles.

My current schema.xml config:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What I want: the behaviour for the search string tron is fine, of course, but I also want the search string e-tron to return the correct matches.

Recommended answer

The problem is that solr.StandardTokenizerFactory splits words on hyphens, so "e-tron" generates the tokens "e" and "tron". Presumably the "e" is then lost at index time: the solr.EdgeNGramFilterFactory in your index analyzer has minGramSize="2", so a one-character token produces no grams.
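The effect can be reproduced with a toy model of the two analyzer chains from the question. This is only a sketch in plain Ruby, not Solr's actual code: the helper names (standard_tokenize, edge_ngrams, index_terms, query_terms) are made up for illustration, and the regular expression is a rough approximation of how StandardTokenizerFactory treats a hyphen.

```ruby
# Toy model of the analyzer chains in the question (an approximation, not Solr).

MIN_GRAM = 2   # minGramSize from the schema
MAX_GRAM = 15  # maxGramSize from the schema

# Rough stand-in for StandardTokenizerFactory: anything that is not a letter
# or digit separates tokens, so the hyphen splits "e-tron" in two.
def standard_tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

# Rough stand-in for EdgeNGramFilterFactory with side="front": emit prefixes
# of MIN_GRAM..MAX_GRAM characters. A token shorter than MIN_GRAM emits nothing.
def edge_ngrams(token)
  upper = [token.length, MAX_GRAM].min
  (MIN_GRAM..upper).map { |n| token[0, n] }
end

# Index analyzer: tokenize, lowercase, then expand into edge n-grams.
def index_terms(text)
  standard_tokenize(text).map(&:downcase).flat_map { |t| edge_ngrams(t) }
end

# Query analyzer: tokenize and lowercase only.
def query_terms(text)
  standard_tokenize(text).map(&:downcase)
end

indexed = index_terms("e-tron")   # => ["tr", "tro", "tron"]
query   = query_terms("e-tron")   # => ["e", "tron"]

puts "indexed terms: #{indexed.inspect}"
puts "query terms:   #{query.inspect}"
puts "query terms with no indexed counterpart: #{(query - indexed).inspect}"
```

In this model the query term "e" has nothing to match in the index. If the search requires every query term to match (an assumption about how the query is configured), that would explain why "e-tron" returns 0 results while "tron" alone works.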

Here is one example configuration that should handle your specific case.

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

solr.WhitespaceTokenizerFactory splits only on whitespace, so "e-tron" stays a single token: ["e-tron"]
solr.WordDelimiterFilterFactory then splits on the hyphen but, with preserveOriginal="1", also keeps the original word: ["e", "tron", "e-tron"]
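The token flow through the suggested chain can be sketched the same way. Again this is a simplified model in plain Ruby with hypothetical helper names, not Solr's implementation: the real WordDelimiterFilterFactory also splits on other boundaries (such as case changes and letter/digit transitions), and only the hyphen handling is modelled here.

```ruby
# Toy model of the suggested analyzer chain (an approximation, not Solr).

# Rough stand-in for WhitespaceTokenizerFactory.
def whitespace_tokenize(text)
  text.split
end

# Rough stand-in for WordDelimiterFilterFactory with preserveOriginal="1":
# emit the parts around each hyphen and keep the original token as well.
def word_delimiter(token)
  parts = token.split("-").reject(&:empty?)
  parts.length > 1 ? parts + [token] : [token]
end

# Rough stand-in for EdgeNGramFilterFactory with side="front".
def edge_ngrams(token, min: 2, max: 15)
  upper = [token.length, max].min
  (min..upper).map { |n| token[0, n] }
end

# Shared part of both analyzers: tokenize, split on hyphens, lowercase.
def delimited_terms(text)
  whitespace_tokenize(text).flat_map { |t| word_delimiter(t) }.map(&:downcase)
end

# The index analyzer additionally expands every token into edge n-grams.
def index_terms(text)
  delimited_terms(text).flat_map { |t| edge_ngrams(t) }
end

query   = delimited_terms("e-tron")   # => ["e", "tron", "e-tron"]
indexed = index_terms("Audi e-tron")

puts "query terms:   #{query.inspect}"
puts "indexed terms: #{indexed.inspect}"
puts "query terms found in the index: #{(query & indexed).inspect}"
```

In this model the preserved token "e-tron" is indexed in full (along with its prefixes "e-", "e-t", and so on), so the query term "e-tron" now has an exact counterpart in the index, and "tron" still matches as before. The single-character part "e" still produces no grams, so how the final query combines these terms is left to Solr and is not modelled here.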
