Elasticsearch aggregation with min_doc_count=0 returns buckets that are not related to the query results or hits


Problem description

Here is my query:

  {"from":0,"size":100,"query":{"bool":{"filter":[{"terms":{"folderId.keyword":["ff98505e-cdff-43aa-8b05-197bc3f3265e"],"boost":1}},{"terms":{"objectType.keyword":["file"],"boost":1}},{"term":{"tenantId":{"value":"34202","boost":1}}}],"adjust_pure_negative":true,"boost":1}},"aggs":{"_byformat":{"terms":{"field":"format.keyword","min_doc_count":0,"size":200}}}}

Result with min_doc_count = 0:

  {"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":3,"max_score":0.0,"hits":[{"_index":"plnesdv1-34202-1","_type":"_doc","_id":"6adbda83-53ad-457f-a2ab-d5b04c643005","_score":0.0,"_source":{"format":"vnd.openxmlformats-officedocument.spreadsheetml.sheet","externallyShared":"N","description":null,"dateModified":null,"type":"application","folderId":"ff98505e-cdff-43aa-8b05-197bc3f3265e","tags":[],"objectType":"file","dateCreated":null,"name":"New XLSX File","tenantId":"34202","modifiedBy":"rdt001","id":"6adbda83-53ad-457f-a2ab-d5b04c643005","status":"active","expirationDate":null}},{"_index":"plnesdv1-34202-1","_type":"_doc","_id":"b1000a15-2d80-41f4-a5df-ba5c27f8e9c6","_score":0.0,"_source":{"format":"vnd.ms-excel","externallyShared":"N","description":null,"dateModified":null,"type":"application","folderId":"ff98505e-cdff-43aa-8b05-197bc3f3265e","tags":[],"objectType":"file","dateCreated":null,"name":"New XLS File","tenantId":"34202","modifiedBy":"rdt001","id":"b1000a15-2d80-41f4-a5df-ba5c27f8e9c6","status":"active","expirationDate":null}},{"_index":"plnesdv1-34202-1","_type":"_doc","_id":"630e9f49-3368-408d-a091-03f253127004","_score":0.0,"_source":{"format":"msword","externallyShared":"N","description":null,"dateModified":null,"type":"application","folderId":"ff98505e-cdff-43aa-8b05-197bc3f3265e","tags":[],"objectType":"file","dateCreated":null,"name":"New DOC File","tenantId":"34202","modifiedBy":"rdt001","id":"630e9f49-3368-408d-a091-03f253127004","status":"active","expirationDate":null}}]},"aggregations":{"_byformat":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"msword","doc_count":1},{"key":"vnd.ms-excel","doc_count":1},{"key":"vnd.openxmlformats-officedocument.spreadsheetml.sheet","doc_count":1},{"key":"bmp","doc_count":0},{"key":"gif","doc_count":0},{"key":"html","doc_count":0}]}}}

Result with min_doc_count = 1:

  {"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":3,"max_score":0.0,"hits":[{"_index":"plnesdv1-34202-1","_type":"_doc","_id":"6adbda83-53ad-457f-a2ab-d5b04c643005","_score":0.0,"_source":{"format":"vnd.openxmlformats-officedocument.spreadsheetml.sheet","externallyShared":"N","description":null,"dateModified":null,"type":"application","folderId":"ff98505e-cdff-43aa-8b05-197bc3f3265e","tags":[],"objectType":"file","dateCreated":null,"name":"New XLSX File","tenantId":"34202","modifiedBy":"rdt001","id":"6adbda83-53ad-457f-a2ab-d5b04c643005","status":"active","expirationDate":null}},{"_index":"plnesdv1-34202-1","_type":"_doc","_id":"b1000a15-2d80-41f4-a5df-ba5c27f8e9c6","_score":0.0,"_source":{"format":"vnd.ms-excel","externallyShared":"N","description":null,"dateModified":null,"type":"application","folderId":"ff98505e-cdff-43aa-8b05-197bc3f3265e","tags":[],"objectType":"file","dateCreated":null,"name":"New XLS File","tenantId":"34202","modifiedBy":"rdt001","id":"b1000a15-2d80-41f4-a5df-ba5c27f8e9c6","status":"active","expirationDate":null}},{"_index":"plnesdv1-34202-1","_type":"_doc","_id":"630e9f49-3368-408d-a091-03f253127004","_score":0.0,"_source":{"format":"msword","externallyShared":"N","description":null,"dateModified":null,"type":"application","folderId":"ff98505e-cdff-43aa-8b05-197bc3f3265e","tags":[],"objectType":"file","dateCreated":null,"name":"New DOC File","tenantId":"34202","modifiedBy":"rdt001","id":"630e9f49-3368-408d-a091-03f253127004","status":"active","expirationDate":null}}]},"aggregations":{"_byformat":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"msword","doc_count":1},{"key":"vnd.ms-excel","doc_count":1},{"key":"vnd.openxmlformats-officedocument.spreadsheetml.sheet","doc_count":1}]}}}

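With min_doc_count = 1 the aggregation is correct and returns only the buckets related to the hits.

Can anyone tell me why the aggregation returns all the buckets when min_doc_count = 0 is set? I went through the Elasticsearch documentation, and it states that this behavior is by design. Is there any other way to get aggregation buckets only for the matched documents, including those with a zero count?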

Solution


First of all, you need to understand what buckets with zero counts mean.

Below is an excerpt from the Terms Aggregation documentation:

Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit. However, some of the returned terms which have a document count of zero might only belong to deleted documents or documents from other types, so there is no warranty that a match_all query would find a positive document count for those terms.

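So the zero-count buckets are terms that exist in the index for this field but do not occur in any document matching your query. As the excerpt says, some of them may belong only to deleted documents.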

Note that the document counts are calculated only over the documents that match the query.
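
As a rough mental model only (plain Python, not Elasticsearch code), the behaviour described in the excerpt can be sketched as follows. The `terms_agg` helper, the folder names "A" and "B" and the sample documents are all made up for illustration. The point is that the candidate keys come from every term stored for the field, while the counts come only from the documents that match the query.

```python
from collections import Counter

def terms_agg(index_docs, matches, field, min_doc_count=1):
    """Toy model of a terms aggregation; a hypothetical helper, not an ES API."""
    # Counts are computed only over the documents that match the query.
    counts = Counter(doc[field] for doc in index_docs if matches(doc))
    # Candidate keys come from every term known for the field in the index.
    all_terms = {doc[field] for doc in index_docs}
    buckets = [{"key": term, "doc_count": counts.get(term, 0)} for term in all_terms]
    # Drop buckets below the threshold; with 0 nothing is dropped.
    buckets = [b for b in buckets if b["doc_count"] >= min_doc_count]
    # Order by count descending, then by key ascending.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))

# Invented sample data: two documents in the queried folder, two elsewhere.
docs = [
    {"folderId": "A", "format": "msword"},
    {"folderId": "A", "format": "vnd.ms-excel"},
    {"folderId": "B", "format": "bmp"},
    {"folderId": "B", "format": "gif"},
]

def in_folder_a(doc):
    return doc["folderId"] == "A"

with_zero = terms_agg(docs, in_folder_a, "format", min_doc_count=0)
without_zero = terms_agg(docs, in_folder_a, "format", min_doc_count=1)
```

In this sketch `with_zero` contains all four formats, two of them with a count of 0, while `without_zero` contains only the two formats found in folder "A".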

However, keep in mind that ES keeps merging the segments of an index behind the scenes, and that is when deleted documents are physically removed. The zero-count results may therefore not be consistent over time. Eventually, once the merging process has completed (and provided no further docs are deleted from that point onwards), you may not get any terms with a count of 0 at all.
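
The effect of merging can be pictured with a small toy model. This is an illustrative simplification in plain Python, not how Lucene or Elasticsearch is implemented; `ToySegment` and `zero_count_terms` are hypothetical names. A delete only marks the document, so its term stays visible until a merge rewrites the segment without it.

```python
class ToySegment:
    """Hypothetical stand-in for an index segment; not an Elasticsearch class."""

    def __init__(self, docs):
        self.docs = dict(docs)   # doc id -> value of the "format" field
        self.deleted = set()     # ids marked as deleted but not yet purged

    def delete(self, doc_id):
        # A delete only marks the document; its term stays in the segment.
        self.deleted.add(doc_id)

    def terms(self):
        # The term list still includes terms of deleted documents.
        return set(self.docs.values())

    def live_count(self, term):
        # Only documents that are not marked as deleted are counted.
        return sum(
            1 for doc_id, value in self.docs.items()
            if value == term and doc_id not in self.deleted
        )

    def merge(self):
        # Merging writes a new segment that contains live documents only.
        live = {i: v for i, v in self.docs.items() if i not in self.deleted}
        return ToySegment(live)

def zero_count_terms(segment):
    """Terms that would show up as buckets with doc_count 0."""
    return sorted(t for t in segment.terms() if segment.live_count(t) == 0)

segment = ToySegment({1: "msword", 2: "bmp"})
segment.delete(2)
before_merge = zero_count_terms(segment)
after_merge = zero_count_terms(segment.merge())
```

Here `before_merge` still lists "bmp", and `after_merge` is empty because the merged segment no longer knows the term.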

So in a way you can tell your business leads that these are counts for deleted docs, and put the above argument to them. If they say they need the counts of docs/terms for deleted docs, that is like looking for documents/terms that do not exist in the index, which does not make sense.

As for why these terms still show up, that is probably due to the segment merging process that happens in ES, and it is by design.

So no, you cannot apply a query/filter to deleted documents (take a step back and imagine that), and hence you cannot control the data related to docs that are not available in the first place.

Aggregation Query:

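You can use the aggregation below, which should give you what you asked for in the comment. myaggs_count_zero collects the terms with a count of 0 and myaggs_count_not_zero the terms with a normal count. min_doc_count is set to 0 on the first terms aggregation because the default of 1 would never produce an empty bucket, and gap_policy is set to insert_zeros so that empty buckets are treated as 0 by the selector:

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "myaggs_count_zero": {
      "terms": {
        "field": "format.keyword",
        "min_doc_count": 0
      },
      "aggs": {
        "document_counts": {
          "value_count": {
            "field": "format.keyword"
          }
        },
        "by_account_filtered": {
          "bucket_selector": {
            "buckets_path": {
              "totalDocs": "document_counts"
            },
            "gap_policy": "insert_zeros",
            "script": "params.totalDocs == 0"
          }
        }
      }
    },
    "myaggs_count_not_zero": {
      "terms": {
        "field": "format.keyword",
        "min_doc_count": 1
      }
    }
  }
}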

Note that I've used the Terms, Value Count and Bucket Selector aggregations.
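
If the goal is only to separate the two groups, another option is to do the split on the client side after a single terms aggregation with min_doc_count set to 0. The sketch below assumes the search response has already been parsed into a Python dict shaped like the response in the question; `split_buckets` is a hypothetical helper, not part of any Elasticsearch client.

```python
def split_buckets(response, agg_name):
    """Split the buckets of a terms aggregation into matched and zero-count ones."""
    buckets = response["aggregations"][agg_name]["buckets"]
    matched = [b for b in buckets if b["doc_count"] > 0]
    empty = [b for b in buckets if b["doc_count"] == 0]
    return matched, empty

# Aggregation part of the min_doc_count = 0 response from the question.
response = {
    "aggregations": {
        "_byformat": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "msword", "doc_count": 1},
                {"key": "vnd.ms-excel", "doc_count": 1},
                {"key": "vnd.openxmlformats-officedocument.spreadsheetml.sheet", "doc_count": 1},
                {"key": "bmp", "doc_count": 0},
                {"key": "gif", "doc_count": 0},
                {"key": "html", "doc_count": 0},
            ],
        }
    }
}

matched, empty = split_buckets(response, "_byformat")
matched_keys = [b["key"] for b in matched]
empty_keys = [b["key"] for b in empty]
```

This keeps the request to a single aggregation, at the cost of transferring every zero-count bucket up to the configured size.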

This may not be what you are looking for but I hope that helps!
