solr not tokenizing protected words


Problem Description

I have documents in Solr/Lucene (3.x) with a special copy field facet_headline in order to have an unstemmed field for faceting.
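
For reference, such a copy field setup in schema.xml typically looks roughly like this (a sketch only; the source field name headline is an assumption, as only facet_headline appears in the question):

    <field name="facet_headline" type="facet_headline" indexed="true" stored="true"/>
    <copyField source="headline" dest="facet_headline"/>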

Sometimes two or more words belong together and should be handled/counted as one word, for example "kim jong il".

So the headline "Saturday: kim jong il had died" should be split into:

    Saturday
    kim jong il
    had
    died

For this reason I decided to use protected words (protwords.txt), where I added kim jong il. The schema.xml looks like this:

    <fieldType name="facet_headline" class="solr.TextField" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.PatternTokenizerFactory"
                       pattern="\?|\!|\.|\:|\;|\,|\&quot;|\(|\)|\\|\+|\*|&lt;|&gt;|([0-31]+\.)" />
            <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0"
                    protected="protwords.txt" />
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.TrimFilterFactory"/>
            <filter class="solr.StopFilterFactory"
                    ignoreCase="true"
                    words="stopwords.txt"
                    enablePositionIncrements="true" />
        </analyzer>
    </fieldType>
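
The protwords.txt referenced above would then contain the phrase to protect, one entry per line (a minimal sketch, assuming the file lives next to schema.xml in the core's conf directory):

    kim jong il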

Using the Solr analysis page, this doesn't seem to work: the string is still split into 6 words. It looks as if protwords.txt is not used. However, if the headline contains ONLY the name kim jong il, everything works fine and the terms aren't split.

Is there a way to reach my goal: not to split specific words/word groups?

Recommended Answer

After searching the web, I came to the conclusion that it is not possible to reach this goal with this analysis chain. The protected-words list of WordDelimiterFilterFactory is checked against each whole token coming out of the tokenizer, so it can only prevent a single token from being split further; it never joins several tokens back into one. Keeping multi-word groups together is simply not what these tokenizers and filters are designed for.
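
To make this concrete, here is a minimal sketch of how the chain above treats the headline from the question. It is a rough simulation in Python, not how Solr is actually implemented, and the tokenizer pattern is simplified:

    import re

    # Assumed contents of protwords.txt from the question.
    PROTWORDS = {"kim jong il"}

    # PatternTokenizerFactory: split only on the listed punctuation characters
    # (simplified to a character class; whitespace is NOT a split point).
    def pattern_tokenize(text):
        return [tok for tok in re.split(r'[?!.:;,"()\\+*<>]', text) if tok.strip()]

    # WordDelimiterFilterFactory: a token that matches the protected list is
    # passed through unchanged; every other token is split on non-alphanumeric
    # characters, which includes the spaces inside it.
    def word_delimiter_filter(tokens):
        out = []
        for tok in tokens:
            if tok in PROTWORDS:
                out.append(tok)
            else:
                out.extend(t for t in re.split(r"[^0-9A-Za-z]+", tok) if t)
        return out

    print(word_delimiter_filter(pattern_tokenize("Saturday: kim jong il had died")))
    # ['Saturday', 'kim', 'jong', 'il', 'had', 'died'] -- the protected phrase is
    # never seen as a whole token, so it gets split like everything else.

    print(word_delimiter_filter(pattern_tokenize("kim jong il")))
    # ['kim jong il'] -- the whole token matches protwords.txt and survives.

This mirrors the behaviour observed in the Solr analysis page: in a longer headline the tokenizer never isolates kim jong il as a single token, so the protected-words list never gets a chance to match.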
