"match" query along with "should" clause giving more than required match results in Elasticsearch

Problem description

I have written the following Lucene query in Elasticsearch to fetch the documents whose Id field has one of the listed values:

GET requirements_v3/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            { "match": { "Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b" } },
            { "match": { "Id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69" } },
            { "match": { "Id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa" } },
            { "match": { "Id": "0aa1db52-c0fb-4bf6-9223-00edccc32703" } },
            { "match": { "Id": "8c399993-f273-4ee0-a1ab-3a85c6848113" } },
            { "match": { "Id": "4461eb37-487e-4899-a7be-914640fab0e0" } },
            { "match": { "Id": "07052261-b904-4bfc-a6fd-3acd28114c6a" } },
            { "match": { "Id": "95816ff0-9eae-4196-99fc-86c6f43395fd" } },
            { "match": { "Id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a" } },
            { "match": { "Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f" } }
          ]
        }
      }
    }
  }
}

The response to the above query is:

 {
  "took": 14,
  "timed_out": false,
  "_shards": {
  "total": 5,
  "successful": 5,
  "skipped": 0,
"failed": 0
},
"hits": {
"total": 18,
"max_score": 0,
"hits": [
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
    "_score": 0,
    "_source": {
      "Id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "4461eb37-487e-4899-a7be-914640fab0e0",
    "_score": 0,
    "_source": {
      "Id": "4461eb37-487e-4899-a7be-914640fab0e0",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
    "_score": 0,
    "_source": {
      "Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
      "Name": "Create Configurator"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "d75d9a7c-e145-487e-922f-102c16d0026f",
    "_score": 0,
    "_source": {
      "Id": "d75d9a7c-e145-487e-922f-102c16d0026f",
      "Name": "Create Configurator"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
    "_score": 0,
    "_source": {
      "Id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
      "Name": "Detail Page - Build"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
    "_score": 0,
    "_source": {
      "Id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
    "_score": 0,
    "_source": {
      "Id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
      "Name": "HUC"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
    "_score": 0,
    "_source": {
      "Id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
      "Name": "DAMS UpsertUnenrollPrice"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
    "_score": 0,
    "_source": {
      "Id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
      "Name": "Item Setup"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
    "_score": 0,
    "_source": {
      "Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
      "Name": "Installments"
    }
  }
 ]
}
}

The response reports the total hits as 18. Why does it return more than 10 items? I believed the match query should be used for exact matches, so why are more documents returned here?

P.S.: I know I can use the Ids query for this, but I want to know why this query does not return the expected response.

Update: Setting the size to 20 returns the following response:

 {
  "took": 195,
  "timed_out": false,
  "_shards": {
  "total": 5,
 "successful": 5,
 "skipped": 0,
"failed": 0
},
"hits": {
 "total": 18,
 "max_score": 0,
 "hits": [
   {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
    "_score": 0,
    "_source": {
      "Id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "4461eb37-487e-4899-a7be-914640fab0e0",
    "_score": 0,
    "_source": {
      "Id": "4461eb37-487e-4899-a7be-914640fab0e0",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
    "_score": 0,
    "_source": {
      "Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
      "Name": "Create Configurator"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "d75d9a7c-e145-487e-922f-102c16d0026f",
    "_score": 0,
    "_source": {
      "Id": "d75d9a7c-e145-487e-922f-102c16d0026f",
      "Name": "Create Configurator"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
    "_score": 0,
    "_source": {
      "Id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
      "Name": "Detail Page - Build"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
    "_score": 0,
    "_source": {
      "Id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
    "_score": 0,
    "_source": {
      "Id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
      "Name": "HUC"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
    "_score": 0,
    "_source": {
      "Id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
      "Name": "DAMS UpsertUnenrollPrice"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
    "_score": 0,
    "_source": {
      "Id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
      "Name": "Item Setup"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
    "_score": 0,
    "_source": {
      "Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
      "Name": "Installments"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "b9437079-47c4-487e-abf0-1ff076f69e0f",
    "_score": 0,
    "_source": {
      "Id": "b9437079-47c4-487e-abf0-1ff076f69e0f",
      "Name": "Detail Page - Strings "
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "0aa1db52-c0fb-4bf6-9223-00edccc32703",
    "_score": 0,
    "_source": {
      "Id": "0aa1db52-c0fb-4bf6-9223-00edccc32703",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
    "_score": 0,
    "_source": {
      "Id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
      "Name": "Create Configurator"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "fd259359-4f6d-4530-ac29-fcebe00d66a6",
    "_score": 0,
    "_source": {
      "Id": "fd259359-4f6d-4530-ac29-fcebe00d66a6",
      "Name": "Invite Platform"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
    "_score": 0,
    "_source": {
      "Id": "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
      "Name": "Training"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "8c399993-f273-4ee0-a1ab-3a85c6848113",
    "_score": 0,
    "_source": {
      "Id": "8c399993-f273-4ee0-a1ab-3a85c6848113",
      "Name": "Configure ASIN for Reporting"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
    "_score": 0,
    "_source": {
      "Id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
      "Name": "Create Extended/Limited Warranty Configuration"
    }
  },
  {
    "_index": "requirements_v3",
    "_type": "_doc",
    "_id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
    "_score": 0,
    "_source": {
      "Id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
      "Name": "Invite Platform"
    }
  }
  ]
 }
}

Recommended answer

Let's understand this using the following mapping as an example:

{
  "_doc": {
    "properties": {
      "Id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

The above mapping is created dynamically by Elasticsearch. Let us now focus on the Id field. Its type is text. By default, the analyzer for the text datatype is the standard analyzer. When this analyzer is applied to the input for this field, the input is tokenized into terms. For example, if the value you index for Id is 33f87d98-024f-4893-aa1c-8d438a98cd1f, the following tokens are generated:

33f87d98
024f
4893
aa1c
8d438a98cd1f

As you can see, the input value is split on the - character, which acts as a delimiter. This is because the standard analyzer is applied to it.
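
The splitting can be approximated with a short Python sketch. This is not the real standard analyzer, which follows Unicode text segmentation rules. It only assumes that, for hyphenated hexadecimal IDs like these, the result is the lowercased runs of letters and digits between the hyphens. The function name approx_standard_tokens is made up for illustration.

```python
import re


def approx_standard_tokens(value):
    """Rough stand-in for the standard analyzer on UUID-like input.

    Assumption: lowercasing and keeping runs of letters/digits is close
    enough for hyphen-separated hex strings; real analysis is richer.
    """
    return re.findall(r"[0-9a-z]+", value.lower())


tokens = approx_standard_tokens("33f87d98-024f-4893-aa1c-8d438a98cd1f")
print(tokens)
```

On a live cluster, the _analyze API (with the field and text parameters) shows the tokens Elasticsearch actually produces for the Id field.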

There is another sub-field under Id named keyword, whose type is keyword. For the keyword type, the input is indexed as is, without any modification.

Now let's understand why more documents are matched and the result count is higher than expected. In your query you used a match query on the Id field, as below:

{
  "match": {
    "Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
  }
}

By default, the match query uses the same analyzer that is applied to the field in the mapping. So the same analyzer is applied again to the Id value in the query, and the input is split into tokens in the same way as above. The default operator applied between the tokens of a match query's input string is OR, so your query actually becomes:

b8bf49a4 OR 960b OR 4fa8 OR 8c5f OR a3fce4b4d07b

Therefore, if any of the above tokens matches any of the indexed terms stored in the Id field, the document is considered a match.
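
This can be checked against the responses in the question. The sketch below is a rough simulation, assuming that tokens are simply the hyphen-separated parts of each Id. It takes the ten queried IDs and the eight unexpected documents from the size-20 response, and lists which token each unexpected document shares with the query. The helper and variable names are illustrative only.

```python
import re

QUERIED_IDS = [
    "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
    "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
    "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
    "0aa1db52-c0fb-4bf6-9223-00edccc32703",
    "8c399993-f273-4ee0-a1ab-3a85c6848113",
    "4461eb37-487e-4899-a7be-914640fab0e0",
    "07052261-b904-4bfc-a6fd-3acd28114c6a",
    "95816ff0-9eae-4196-99fc-86c6f43395fd",
    "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
    "33f87d98-024f-4893-aa1c-8d438a98cd1f",
]

# Documents in the size-20 response that were not asked for.
UNEXPECTED_IDS = [
    "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
    "d75d9a7c-e145-487e-922f-102c16d0026f",
    "007eadb7-adda-487e-b7fe-6f6b5648de2e",
    "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
    "c1b367f2-a57a-487e-994c-84470e0f9db4",
    "b9437079-47c4-487e-abf0-1ff076f69e0f",
    "fd259359-4f6d-4530-ac29-fcebe00d66a6",
    "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
]


def tokens(value):
    """Assumed tokenization: lowercase runs of letters and digits."""
    return set(re.findall(r"[0-9a-z]+", value.lower()))


# Ten match clauses under "should" act like one OR over all their tokens.
query_tokens = set().union(*(tokens(i) for i in QUERIED_IDS))

# For each unexpected document, the tokens it has in common with the query.
shared = {doc_id: tokens(doc_id) & query_tokens for doc_id in UNEXPECTED_IDS}

for doc_id, common in shared.items():
    print(doc_id, "matches on", sorted(common))
```

Each unexpected document shares one token (487e, 4f6d, 4bfc, or b904) with one of the queried IDs, which accounts for the 18 hits: 10 expected documents plus 8 partial matches.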

Solution for the above mapping:

Use the keyword field instead. So the query becomes:

{
  "match": {
    "Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
  }
}
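
To illustrate why this helps, here is a small sketch contrasting the two fields. It assumes the text field holds the hyphen-separated tokens listed earlier, while the keyword sub-field holds the whole value as a single term. The function names are hypothetical and this is a simplified model of matching, not Elasticsearch's implementation.

```python
import re


def text_terms(value):
    """Assumed terms for the analyzed text field: hyphen-separated parts."""
    return set(re.findall(r"[0-9a-z]+", value.lower()))


def keyword_terms(value):
    """Terms for the keyword sub-field: the untouched value as one term."""
    return {value}


def matches(index_terms, query_terms):
    """OR semantics: any shared term counts as a match."""
    return bool(index_terms & query_terms)


queried = "4461eb37-487e-4899-a7be-914640fab0e0"
other = "d75d9a7c-e145-487e-922f-102c16d0026f"  # shares only "487e"

on_text = matches(text_terms(other), text_terms(queried))
on_keyword = matches(keyword_terms(other), keyword_terms(queried))
print("Id match:", on_text, "| Id.keyword match:", on_keyword)
```

The ignore_above limit of 256 in the mapping does not get in the way here, since these IDs are 36 characters long.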

For more on how match works, see the Elasticsearch documentation on the match query.

Also, as mentioned by @Curious_MInd in his answer, it is better to use a terms query than multiple match queries inside should.
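
A terms-based version of the original request might look like the following sketch, written as a Python dict so it can be serialized with json.dumps and sent with any HTTP client. It assumes the dynamic mapping shown above, so it targets the Id.keyword sub-field. The helper name build_terms_request is hypothetical, and only three of the ten IDs are listed to keep it short.

```python
import json


def build_terms_request(ids, size=10):
    """Build a search body that filters on exact Id values.

    Assumption: "Id.keyword" exists, as in the dynamic mapping above.
    """
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": {
                    "terms": {"Id.keyword": list(ids)}
                }
            }
        },
    }


ids = [
    "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
    "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
    "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
]
body = build_terms_request(ids)
print(json.dumps(body, indent=2))  # body for GET requirements_v3/_search
```

The terms query matches a document when the field contains any of the listed values exactly, so a single clause replaces the ten match clauses.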
