Why does a fuzzy query return a match when a match query with fuzziness doesn't on the same input?

Problem description

I created the following index in Elasticsearch:

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "3_5_edgegrams"]
        }
      },
      "filter": {
        "3_5_edgegrams": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Then I inserted the following document:

{
  "name": "Nuvus Gro Corp"
}

When I make the following query (let's call it fuzzy_query):

GET /my-index/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "qnuv"
      }
    }
  }
}

I get a match for the above document.

When I make the query (let's call the query match_with_fuzziness):

GET /my-index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "qnuv",
        "fuzziness": "AUTO"
      }
    }
  }
}

I don't get a match. If I make the following query:

GET /my-index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "nuvq",
        "fuzziness": "AUTO"
      }
    }
  }
}

I again get a match. I don't understand why the match_with_fuzziness query returns no matches.

EDIT: I analyzed the queries with Kibana Profiler and according to the profiler match_with_fuzziness is a SynonymQuery Synonym(name:qnu name:qnuv) query while fuzzy_query is a BoostQuery (name:nuv)^0.6666666

Solution

Very similar problem to the one explained in your other question.

The problem is that you haven't specified a search_analyzer, so at search time the match query also runs qnuv and nuvq through my_analyzer, which edge-ngrams them as well. That explains the results you're seeing.
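To see which tokens are involved, here is a minimal Python sketch that imitates my_analyzer (whitespace split, lowercase, edge n-grams of 3 to 10 characters). It is only a stand-in for illustration, not Elasticsearch's implementation; the function names `edge_ngrams` and `my_analyzer` are made up for this example. In a real cluster the `_analyze` API would give the authoritative answer.

```python
def edge_ngrams(token, min_gram=3, max_gram=10):
    """Return the prefixes of `token` with length min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def my_analyzer(text):
    """Rough imitation of my_analyzer: whitespace split, lowercase, edge n-grams.

    Returns one list of tokens per position in the input.
    """
    return [edge_ngrams(word.lower()) for word in text.split()]


indexed = my_analyzer("Nuvus Gro Corp")   # what gets stored for the document
print(indexed)
print(my_analyzer("qnuv"))                # what a match query searches for
print(my_analyzer("nuvq"))
```

Each search term turns into two tokens sharing a single position, which matters for the second query discussed further down.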

In the first query you're using the fuzzy query, which is a term-level query, so the search term is not analyzed. qnuv (the search term) matches nuv (the first indexed edge-ngramed token) with an edit distance of 1 (i.e. the leading q is "tolerated"), which is what the fuzzy query allows by default ("fuzziness": "AUTO" permits one edit for terms of 3 to 5 characters).
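The distance claim can be checked with a small Python sketch. It uses plain Levenshtein distance as an approximation (Elasticsearch also counts a transposition of adjacent characters as one edit by default, which makes no difference for these terms), and the helper names `levenshtein` and `auto_fuzziness` are hypothetical. The list of indexed tokens is what my_analyzer would be expected to produce for "Nuvus Gro Corp".

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by AUTO: 0 for 0-2 chars, 1 for 3-5 chars, 2 above."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


# Assumed output of my_analyzer for "Nuvus Gro Corp".
indexed_tokens = ["nuv", "nuvu", "nuvus", "gro", "cor", "corp"]

term = "qnuv"  # the fuzzy query does not analyze this term
matches = [t for t in indexed_tokens
           if levenshtein(term, t) <= auto_fuzziness(term)]
print(matches)
```

Only nuv is within one edit of qnuv, which agrees with the `(name:nuv)` clause the profiler reported for fuzzy_query.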

In the third query, nuvq is analyzed into nuv and nuvq, and nuv (the first edge-ngramed token of the search term) matches nuv (the first indexed edge-ngramed token) exactly, so no fuzziness is needed.

The case of the second query is a bit special. I'm quoting below how the fuzziness parameter works in the context of match queries:

Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position. Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.

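The second case, where the analysis process produces multiple tokens at the same position, is what applies to you. Since the search term qnuv is analyzed by my_analyzer, it produces the two tokens qnu and qnuv at the same position, so no fuzzy expansion takes place. Neither qnu nor qnuv exists in the index as an exact token, hence no match.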
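A rough Python simulation of that behaviour, under the assumption that a position holding several tokens is compared against the index by exact term only. The helpers `edge_ngrams`, `analyze` and `match_without_fuzzy_expansion` are invented for this sketch and only imitate the analyzer and the synonym query.

```python
def edge_ngrams(token, min_gram=3, max_gram=10):
    """Return the prefixes of `token` with length min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Imitation of my_analyzer: one list of edge n-grams per word."""
    return [edge_ngrams(word.lower()) for word in text.split()]


def match_without_fuzzy_expansion(query, indexed_text):
    """Exact term overlap only, as when fuzziness is skipped."""
    indexed = {tok for position in analyze(indexed_text) for tok in position}
    searched = {tok for position in analyze(query) for tok in position}
    return sorted(searched & indexed)


doc = "Nuvus Gro Corp"
second = match_without_fuzzy_expansion("qnuv", doc)  # qnu, qnuv: nothing in common
third = match_without_fuzzy_expansion("nuvq", doc)   # nuv, nuvq: nuv is indexed
print(second, third)
```

The second query finds no common token, while the third one succeeds through the exact token nuv.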

You need to change your mapping to the following, which adds a search_analyzer. It will then work the way you expect, i.e. all three queries will return your document:

  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "standard"          <---- add this line
      }
    }
  }
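As a sanity check of that claim, the sketch below simulates a match query once the search term is no longer edge-ngramed. It assumes the standard analyzer leaves a single lowercase word such as qnuv as one token, approximates fuzzy matching with plain Levenshtein distance, and uses made-up helper names throughout. It is an illustration of the reasoning, not a reproduction of Elasticsearch internals.

```python
def edge_ngrams(token, min_gram=3, max_gram=10):
    """Return the prefixes of `token` with length min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by AUTO: 0 for 0-2 chars, 1 for 3-5 chars, 2 above."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


# Index side still uses the edge n-gram analyzer.
indexed_tokens = [tok for word in "Nuvus Gro Corp".split()
                  for tok in edge_ngrams(word.lower())]


def fuzzy_hits(query):
    """Search side: one token per word, each expanded fuzzily."""
    hits = []
    for term in query.lower().split():
        hits += [tok for tok in indexed_tokens
                 if levenshtein(term, tok) <= auto_fuzziness(term)]
    return hits


print(fuzzy_hits("qnuv"))
print(fuzzy_hits("nuvq"))
```

With a single token per position, the fuzzy expansion is applied again, so both qnuv and nuvq reach at least one indexed token.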
