"Filter then Aggregation" or just "Filter Aggregation"?

Question

I have been working with Elasticsearch (ES) recently and found that I can get almost the same result in two ways, but I have no clear idea of the DIFFERENCE between them.

"Filter then Aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      }
    }
  },
  "aggs": {
    "ca_weathers": {
      "terms": { "field": "DestWeather" }
    }
  }
}

"Filter Aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "ca": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      },
      "aggs": {
        "_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}

My questions

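  1. Why are there two similar features? I assume I am missing something, so what is the difference? (Please ignore the result format, that is not what I am asking about ;p)
  2. Which one is better if I want to filter out the unrelated/unmatched documents and then aggregate over a large number of documents?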

Recommended answer

This answer comes from @Val's comment, quoted here for reference:

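In option A, the aggregation will be run on ALL documents. In option B, the documents are first filtered and the aggregation will be run only on the selected documents. Say you have 10M documents and the filter selects only 100; it's pretty evident that option B will always be faster.

The comment does not label the two requests. Going by its description, option B appears to be the first request (the query-level filter), which narrows the document set before any aggregation runs, and option A the second request (the filter aggregation), which is evaluated against every document in the query scope, here the whole index.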
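To make the distinction concrete, here is a sketch of a single request that uses both mechanisms together. It is an illustration, not part of the original answer: the aggregation names (all_ca_weathers, delayed_ca, delayed_weathers) are made up, and it assumes the sample flights index has a boolean FlightDelay field, as the Kibana sample data set normally does.

```
# Illustrative sketch only. Assumes a boolean "FlightDelay" field exists
# in kibana_sample_data_flights; aggregation names are hypothetical.
# The top-level query limits every aggregation to DestCountry = CA.
# The "delayed_ca" filter aggregation narrows further, but only for its
# own sub-aggregation; "all_ca_weathers" still sees all CA flights.
POST kibana_sample_data_flights/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "DestCountry": "CA" } }
      ]
    }
  },
  "aggs": {
    "all_ca_weathers": {
      "terms": { "field": "DestWeather" }
    },
    "delayed_ca": {
      "filter": { "term": { "FlightDelay": true } },
      "aggs": {
        "delayed_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}
```

Read this way, the two features are not redundant. The query decides which documents any aggregation gets to see, while a filter aggregation carves out a subset for one branch of the aggregation tree and leaves sibling aggregations untouched. When a single filter should apply to everything, as in the question, putting it in the query is the more direct choice.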
