Solr: exact phrase query with an EdgeNGramFilterFactory

Views: 454 Published: 2020/7/3 18:49:39 Tags: solr tokenize phrase


Problem description

In Solr (3.3), is it possible to make a field searchable letter by letter through an EdgeNGramFilterFactory while also keeping it sensitive to phrase queries?

For example, I'm looking for a field type such that a field containing "contrat informatique" will be found if the user types any of the following:

contrat
informatique
contr
informa
"contrat informatique"
"contrat info"

Currently, I have something like this:

<fieldtype name="terms" class="solr.TextField">
    <!-- Index time: strip accents, tokenize and lowercase, then emit front edge n-grams (2 to 15 characters) -->
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    </analyzer>
    <!-- Query time: same chain, without the n-gram filter -->
    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    </analyzer>
</fieldtype>

...but it fails on phrase queries.

When I look at the schema analyzer in the Solr admin, I find that "contrat informatique" generates the following tokens:

[...] contr contra contrat in inf info infor inform [...]

So the phrase query works with "contrat in" (consecutive tokens), but not with "contrat inf" (because these two tokens are not adjacent).
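
To make the position problem concrete, here is a minimal Python sketch (not Solr code) that simulates the index-time analysis and a zero-slop phrase match. It assumes that every edge n-gram is stored at its own consecutive position, which is what the analyzer output above suggests. The function names `edge_ngrams`, `index_positions` and `phrase_matches` are hypothetical helpers written for this illustration only.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Front edge n-grams of a word, shortest first."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def index_positions(text):
    """Map each indexed gram to the list of positions it occupies.

    Assumption: every gram advances the position by one, as the
    analyzer output in the question suggests.
    """
    positions = {}
    pos = 0
    for word in text.lower().split():
        for gram in edge_ngrams(word):
            positions.setdefault(gram, []).append(pos)
            pos += 1
    return positions


def phrase_matches(positions, phrase):
    """Exact phrase match (slop 0): terms must sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in positions for t in terms):
        return False
    return any(
        all(start + i in positions[t] for i, t in enumerate(terms))
        for start in positions[terms[0]]
    )


index = index_positions("contrat informatique")
print(index["contrat"], index["in"], index["inf"])  # [5] [6] [7]
print(phrase_matches(index, "contrat in"))          # True
print(phrase_matches(index, "contrat inf"))         # False
```

Under this assumption "contrat" sits at position 5, "in" at 6 and "inf" at 7, so only "contrat in" satisfies the adjacency that an exact phrase query requires.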

I'm pretty sure that some kind of stemming can work with phrase queries, but I cannot find the right tokenizer or filter to use before the EdgeNGramFilterFactory.
