Solr: combining EdgeNGramFilterFactory and NGramFilterFactory


Question

I have a situation where I need to use both EdgeNGramFilterFactory and NGramFilterFactory.

I am using NGramFilterFactory to perform a "contains"-style search with a minimum gram size of 2. I also want to match on the first letter, like a "starts with" search, using a front EdgeNGramFilterFactory.

I don't want to lower the NGramFilterFactory minimum gram size to 1, because I don't want to index every single character.

Any help would be appreciated.

Cheers

Recommended answer

You don't necessarily have to do all of this in the same field. I would create a separate field for each treatment, each with its own custom field type, so that you can apply the logic separately.

In the following:

  • text contains the original tokens, minimally processed;
  • text_ngram uses the NGramFilter for your two-character-minimum tokens;
  • text_first_letter uses EdgeNGram for your one-character initial-letter tokens.

If you're processing all text fields in this way, you might be able to get away with using a copyField to populate the extra fields. Otherwise, you can have your Solr client send the same field value to each of the three fields.

When searching, include all three fields in your query with the qf parameter.
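As a rough sketch of the qf suggestion, the fragment below sets query defaults in solrconfig.xml. It assumes the eDisMax query parser (qf is only honored by the DisMax-family parsers) and fields named after the three types in this answer. The handler name /search and the boost values are purely illustrative and are not part of the original answer.

```xml
<!-- Hypothetical handler name; an existing handler could be edited instead. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- qf is read by the DisMax-family parsers, so select eDisMax here. -->
    <str name="defType">edismax</str>
    <!-- Search all three fields; the boosts are illustrative guesses that
         favor whole-token matches over first-letter and n-gram matches. -->
    <str name="qf">text^3 text_first_letter^2 text_ngram</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request (defType=edismax and qf=...) instead of being fixed as handler defaults.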

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

<fieldType name="text_first_letter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1" side="front"/>
  </analyzer>
</fieldType>

Setting up the field and dynamicField definitions is left up to you. Let me know if you have more questions and I can edit with clarifications.
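For illustration only, one possible set of field definitions using the copyField approach mentioned earlier might look like the sketch below. The field names simply reuse the type names, and the indexed/stored choices are assumptions that should be adjusted to your schema.

```xml
<!-- Hypothetical field names; each uses the fieldType of the same name. -->
<field name="text" type="text" indexed="true" stored="true"/>
<field name="text_ngram" type="text_ngram" indexed="true" stored="false"/>
<field name="text_first_letter" type="text_first_letter" indexed="true" stored="false"/>

<!-- copyField passes the raw, unanalyzed input of "text" to each destination,
     where that destination's own analyzer is then applied. -->
<copyField source="text" dest="text_ngram"/>
<copyField source="text" dest="text_first_letter"/>
```

With this in place the client only needs to send the text field, and the two derived fields are populated at index time.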
