Solr Dismax handler - whitespace and special character behaviour

Published: 2020/5/4 7:38:53. Tags: solr lucene tokenize dismax

Problem description

I get strange results when my query contains special characters.

Here is my request:

q=histoire-france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100%

Parsed query:

<str name="parsedquery_toString">+((any:histoir any:franc)) ()</str>

I get 17000 results because Solr is doing an OR (it should be an AND).

I have no problem when I use whitespace instead of a special character:

q=histoire france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100%

<str name="parsedquery_toString">+(((any:histoir) (any:franc))~2) ()</str>

This query returns 2000 results.

Here is my schema.xml (relevant parts):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="stopwords_french.txt" ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_french.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="stopwords_french.txt" ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_french.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

I even tried a PatternTokenizerFactory to tokenize on whitespace and special characters, but nothing changed.

My current workaround is to replace all special characters with whitespace before sending the query to Solr, but that is not satisfactory.
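
The workaround described above can be sketched as follows. This is a minimal Python illustration, not the asker's actual code: the function name `sanitize_query` is hypothetical, and the set of "special" characters is an assumption borrowed from the PatternReplaceCharFilterFactory pattern quoted in this question.

```python
import re

# Characters treated as "special" here (an assumption, taken from the
# char filter pattern quoted in the question): , ; . / \ ' & -
SPECIAL_CHARS = re.compile(r"[,;./\\'&-]")


def sanitize_query(raw_query: str) -> str:
    """Replace each special character with a space, then collapse whitespace runs."""
    spaced = SPECIAL_CHARS.sub(" ", raw_query)
    return " ".join(spaced.split())


if __name__ == "__main__":
    # The sanitized string is what would be sent as the q parameter.
    print(sanitize_query("histoire-france"))
```

With this in place, `histoire-france` reaches Solr as `histoire france`, which is the form that behaves as expected in the question.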

EDIT: Even with a charFilter (PatternReplaceCharFilterFactory) that replaces special characters with whitespace, it doesn't work.

First line of the analysis in the Solr admin, with verbose output, for query = 'histoire-france':

org.apache.solr.analysis.PatternReplaceCharFilterFactory {replacement= , pattern=([,;./\\'&-]), luceneMatchVersion=LUCENE_32}
text    histoire france

The '-' is replaced by ' ', and the text is then tokenized by WhitespaceTokenizerFactory. However, I still get a different number of results for 'histoire-france' and 'histoire france'.
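
To make the observation concrete, here is a rough Python imitation of the first two analysis stages (char filter, then whitespace tokenization). The pattern and replacement are copied from the admin output above; the function name `analyze_first_stages` is hypothetical, and the later filters in the chain (word delimiter, stop words, stemming, and so on) are deliberately not modeled. It also says nothing about how the dismax parser handles the raw query string before analysis, which the question does not show.

```python
import re
from typing import List

# Pattern and replacement copied from the admin analysis output in the question.
CHAR_FILTER_PATTERN = re.compile(r"([,;./\\'&-])")
CHAR_FILTER_REPLACEMENT = " "


def analyze_first_stages(text: str) -> List[str]:
    """Apply the char filter, then split on whitespace like a whitespace tokenizer."""
    filtered = CHAR_FILTER_PATTERN.sub(CHAR_FILTER_REPLACEMENT, text)
    return filtered.split()


if __name__ == "__main__":
    for query in ("histoire-france", "histoire france"):
        print(query, "->", analyze_first_stages(query))
```

At this level both inputs yield the same two tokens, which matches the admin output and is why the differing result counts are surprising.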

Am I missing something?
