Why does a fuzzy query return a match while a match query with fuzziness does not, on the same input?
Problem description
I created the following index in Elasticsearch:
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "3_5_edgegrams"]
        }
      },
      "filter": {
        "3_5_edgegrams": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Then I inserted the following document:
{
  "name": "Nuvus Gro Corp"
}
When I run the following query (let's call it fuzzy_query):
GET /my-index/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "qnuv"
      }
    }
  }
}
I get a match for the above document.
When I run this query (let's call it match_with_fuzziness):
GET /my-index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "qnuv",
        "fuzziness": "AUTO"
      }
    }
  }
}
I don't get a match. However, if I run the following query:
GET /my-index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "nuvq",
        "fuzziness": "AUTO"
      }
    }
  }
}
I get a match again. I don't understand why the match_with_fuzziness query doesn't return any matches.
EDIT: I analyzed the queries with the Kibana Profiler. According to the profiler, match_with_fuzziness is a SynonymQuery, Synonym(name:qnu name:qnuv), while fuzzy_query is a BoostQuery, (name:nuv)^0.6666666.
Answer
This is a very similar problem to the one explained in your other question.
The problem is that you haven't specified a search_analyzer. At search time, qnuv and nuvq are therefore also analyzed by my_analyzer and edge-ngrammed, which explains the matches you're getting.
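To see which terms are involved, the analysis chain can be approximated in a few lines of Python. This is a hypothetical re-implementation for illustration only, not Elasticsearch code. The authoritative check is the `_analyze` API, e.g. `GET /my-index/_analyze` with `"analyzer": "my_analyzer"` and `"text": "qnuv"`. The sketch assumes plain ASCII input, ignores offsets, and returns (position, term) pairs, because edge_ngram emits every prefix of a word at that word's position.

```python
# Hypothetical approximation of "my_analyzer" (illustration only):
# whitespace tokenizer -> lowercase filter -> edge_ngram (min_gram=3, max_gram=10).
def my_analyzer(text, min_gram=3, max_gram=10):
    tokens = []
    for position, word in enumerate(text.split()):
        word = word.lower()
        # Keep prefixes of length min_gram..max_gram; words shorter than
        # min_gram yield nothing. All prefixes share the word's position.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append((position, word[:n]))
    return tokens

# Index time: the document's name field.
print(my_analyzer("Nuvus Gro Corp"))
# [(0, 'nuv'), (0, 'nuvu'), (0, 'nuvus'), (1, 'gro'), (2, 'cor'), (2, 'corp')]

# Search time, with no search_analyzer: the query text goes through the same chain.
print(my_analyzer("qnuv"))  # [(0, 'qnu'), (0, 'qnuv')]
print(my_analyzer("nuvq"))  # [(0, 'nuv'), (0, 'nuvq')]
```

The two tokens for qnuv are the same pair the profiler reports as Synonym(name:qnu name:qnuv).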
Look at the first query. fuzzy is a term-level query, so qnuv (the search term) is not analyzed at all. It matches nuv (the first indexed edge-ngrammed token) with an edit distance of 1 (i.e. the leading q is "tolerated"). That is what the fuzzy query allows by default: "fuzziness": "AUTO" permits one edit for a term of 3 to 5 characters.
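The matching rule of the fuzzy query can be sketched as follows. The sketch assumes the defaults: fuzziness AUTO (equivalent to AUTO:3,6) and transpositions enabled. The edit-distance function is a textbook optimal string alignment (restricted Damerau-Levenshtein) implementation. It stands in for Lucene's automaton-based matching and is not a copy of it. The indexed terms are the ones my_analyzer produces for "Nuvus Gro Corp".

```python
def osa_distance(a, b):
    """Edit distance where an adjacent transposition counts as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def auto_fuzziness(term):
    """AUTO (AUTO:3,6): exact match below 3 chars, 1 edit for 3-5 chars, 2 edits from 6."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2

# Terms stored for "Nuvus Gro Corp" by my_analyzer.
indexed_terms = ["nuv", "nuvu", "nuvus", "gro", "cor", "corp"]

query = "qnuv"                     # a fuzzy query does not analyze its value
max_edits = auto_fuzziness(query)  # 4 characters -> 1 edit allowed
matches = [t for t in indexed_terms if osa_distance(query, t) <= max_edits]
print(matches)  # ['nuv']
```

Only nuv is within one edit (drop the leading q). nuvu is two edits away, so it is not expanded. This agrees with the profiler output, which mentions name:nuv only.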
In the third query, nuv (the first edge-ngrammed token of the search term nuvq) exactly matches nuv (the first edge-ngrammed token in the index).
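The third query can be viewed as a plain set intersection. Both sides are treated as sets of terms; this simplification ignores positions and scoring. The `edge_grams` helper is a hypothetical stand-in for my_analyzer: whitespace split, lowercase, edge n-grams of length 3 to 10.

```python
def edge_grams(text, min_gram=3, max_gram=10):
    """Set of terms my_analyzer would emit for text (approximation, positions ignored)."""
    return {w.lower()[:n] for w in text.split()
            for n in range(min_gram, min(max_gram, len(w)) + 1)}

indexed_terms = edge_grams("Nuvus Gro Corp")  # nuv, nuvu, nuvus, gro, cor, corp
query_terms = edge_grams("nuvq")               # nuv, nuvq

# No fuzzy expansion is needed: the shared prefix term is an exact hit.
hits = sorted(query_terms & indexed_terms)
print(hits)  # ['nuv']
```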
The second query is a special case. Here is how the Elasticsearch documentation describes the fuzziness parameter in the context of match queries:
Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position. Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.
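The part about multiple tokens at the same position is what applies to your case. The search term qnuv is analyzed by my_analyzer, which produces the two tokens qnu and qnuv at the same position. The match query therefore turns into a synonym query, which does not support fuzzy matching. Neither qnu nor qnuv exists in the index, so nothing matches.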
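The rule quoted above can be modelled roughly in Python. The analyzed tokens are grouped by position. A position carrying more than one token becomes a synonym clause, which matches terms exactly. A position with a single token would be eligible for fuzzy expansion. This is a simplified, hypothetical model of the match query's behaviour; the function names are made up. It is meant only to reproduce the profiler's Synonym(name:qnu name:qnuv) and show why no document is found.

```python
from collections import defaultdict

def my_analyzer(text, min_gram=3, max_gram=10):
    """Approximation of my_analyzer returning (position, term) pairs."""
    tokens = []
    for pos, word in enumerate(text.split()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append((pos, word.lower()[:n]))
    return tokens

def build_clauses(text):
    """Hypothetical model: stacked tokens -> exact 'synonym' clause, single token -> 'fuzzy'."""
    by_pos = defaultdict(list)
    for pos, term in my_analyzer(text):
        by_pos[pos].append(term)
    return [("synonym", terms) if len(terms) > 1 else ("fuzzy", terms)
            for _, terms in sorted(by_pos.items())]

indexed_terms = {term for _, term in my_analyzer("Nuvus Gro Corp")}

clauses = build_clauses("qnuv")
print(clauses)  # [('synonym', ['qnu', 'qnuv'])]

# A synonym clause only looks up its terms exactly, and neither is indexed.
kind, terms = clauses[0]
print(kind, [t for t in terms if t in indexed_terms])  # synonym []
```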
You need to change your mapping to this one instead and it will work the way you expect, i.e. all three queries will return your document:
"mappings": {
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "my_analyzer",
      "search_analyzer": "standard"   <---- add this line
    }
  }
}
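The effect of the new mapping can be checked with a rough sketch in Python. With "search_analyzer": "standard", each query string reaches the match query as a single term, so fuzziness is applied to it again. Two simplifications are assumptions of this sketch. First, `standard_like` (a hypothetical helper) stands in for the standard analyzer by lowercasing and splitting on non-alphanumeric characters; that is adequate for these ASCII inputs but is not the real Unicode segmentation. Second, the edit distance is a textbook optimal string alignment implementation, not Lucene's.

```python
import re

def index_terms(text, min_gram=3, max_gram=10):
    """Terms my_analyzer would index (approximation: whitespace, lowercase, edge n-grams)."""
    return {w.lower()[:n] for w in text.split()
            for n in range(min_gram, min(max_gram, len(w)) + 1)}

def standard_like(text):
    """Rough stand-in for the standard analyzer (assumption, ASCII only)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def osa_distance(a, b):
    """Edit distance counting an adjacent transposition as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def auto_fuzziness(term):
    """AUTO (AUTO:3,6): 0 edits below 3 chars, 1 edit for 3-5 chars, 2 edits from 6."""
    return 0 if len(term) < 3 else 1 if len(term) < 6 else 2

indexed = index_terms("Nuvus Gro Corp")
results = {}
for query in ["qnuv", "nuvq"]:
    # Each query term is now a single token, so fuzzy expansion applies to it.
    results[query] = sorted(
        t for term in standard_like(query) for t in indexed
        if osa_distance(term, t) <= auto_fuzziness(term)
    )
    print(query, results[query])
# qnuv ['nuv']
# nuvq ['nuv', 'nuvu']
```

The first query (fuzzy) never analyzes its value, so the search_analyzer change does not affect it; it keeps matching through nuv.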