Solr: exact phrase query with an EdgeNGramFilterFactory

Views: 25 | Published: 2021/12/30 8:58:34 | Tags: solr, tokenize, phrase

Problem description

In Solr (3.3), is it possible to make a field searchable letter by letter through an EdgeNGramFilterFactory and still sensitive to phrase queries?

For example, I'm looking for a field type such that a field containing "contrat informatique" is found when the user types:

contrat
informatique
contr
informa
"contrat informatique"
"contrat info"

Currently, I have something like this:

<fieldtype name="terms" class="solr.TextField">
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
</fieldtype>

...but it fails on phrase queries.

When I look at the schema analyzer in the Solr admin, I see that "contrat informatique" generates the following tokens:

[...] contr contra contrat in inf info infor inform [...]

So the query works with "contrat in" (consecutive tokens), but not with "contrat inf" (because these two tokens are separated).
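
The position problem described above can be reproduced outside Solr with a small Python model. This is only a sketch, not Solr code: `edge_ngrams`, `index_tokens`, and `phrase_matches` are hypothetical helpers written for illustration. It assumes that every gram is emitted at its own consecutive position, which is what the analyzer output above suggests, and it ignores the accent-mapping char filter and the word delimiter filter.

```python
import re


def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the front edge n-grams of a word, shortest first."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def index_tokens(text, min_gram=2, max_gram=15):
    """Model the index-time analyzer as a list of (position, term) pairs.

    Assumption: each gram advances the position by one, so the grams of a
    single word occupy a run of consecutive positions.
    """
    terms = []
    # Rough stand-in for a lowercase, letters-only tokenizer (ASCII only).
    for word in re.findall(r"[a-z]+", text.lower()):
        terms.extend(edge_ngrams(word, min_gram, max_gram))
    return list(enumerate(terms))


def phrase_matches(indexed, query):
    """Return True if the query terms sit at consecutive index positions."""
    query_terms = re.findall(r"[a-z]+", query.lower())
    if not query_terms:
        return False
    positions = {}
    for pos, term in indexed:
        positions.setdefault(term, set()).add(pos)
    # Try every occurrence of the first term as the start of the phrase.
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(query_terms))
        for start in positions.get(query_terms[0], ())
    )


if __name__ == "__main__":
    indexed = index_tokens("contrat informatique")
    for query in ["contr", "contrat in", "contrat inf", "contrat informatique"]:
        print(repr(query), phrase_matches(indexed, query))
```

In this model "contrat" lands at position 5 and "in" at position 6, so "contrat in" matches, while "inf" lands at position 7 and the full word "informatique" at position 16, so neither "contrat inf" nor "contrat informatique" matches as an exact phrase.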

I'm pretty sure any kind of stemming can work with phrase queries, but I cannot find the right tokenizer or filter to use before the EdgeNGramFilterFactory.
