ElasticSearch count multiple fields grouped by

Problem description

I have documents like

{"domain":"US", "zipcode":"11111", "eventType":"click", "id":"1", "time":100}

{"domain":"US", "zipcode":"22222", "eventType":"sell", "id":"2", "time":200}

{"domain":"US", "zipcode":"22222", "eventType":"click", "id":"3","time":150}

{"domain":"US", "zipcode":"11111", "eventType":"sell", "id":"4","time":350}

{"domain":"US", "zipcode":"33333", "eventType":"sell", "id":"5","time":225}

{"domain":"EU", "zipcode":"44444", "eventType":"click", "id":"5","time":120}

I want to filter these documents by eventType=sell and time between 125 and 400, group them by domain and then by zipcode, and count the documents in each bucket. So my output would look like this (the first and last docs would be ignored by the filters):

US, 11111, 1

US, 22222, 1

US, 33333, 1

In SQL, this would be straightforward, but I have not been able to get it to work in ElasticSearch. Could someone please help me out here?

How do I write an ElasticSearch query to accomplish the above?

Solution

This query seems to do what you want:

POST /test_index/_search
{
   "size": 0,
   "query": {
      "filtered": {
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "eventType": "sell"
                     }
                  },
                  {
                     "range": {
                        "time": {
                           "gte": 125,
                           "lte": 400
                        }
                     }
                  }
               ]
            }
         }
      }
   },
   "aggs": {
      "zipcode_terms": {
         "terms": {
            "field": "zipcode"
         }
      }
   }
}

returning

{
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "zipcode_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "11111",
               "doc_count": 1
            },
            {
               "key": "22222",
               "doc_count": 1
            },
            {
               "key": "33333",
               "doc_count": 1
            }
         ]
      }
   }
}

(Note that there is only 1 "sell" at "22222", not 2; the other document with that zipcode is a "click".)
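The counts can be sanity-checked without a cluster. The following is a minimal Python sketch, not an Elasticsearch call: it applies the same two filters (eventType equal to "sell", time between 125 and 400 inclusive) to the six sample documents from the question and counts per (domain, zipcode) pair. The helper name count_by_domain_and_zipcode is made up for illustration.

```python
from collections import Counter

# The six sample documents from the question.
docs = [
    {"domain": "US", "zipcode": "11111", "eventType": "click", "id": "1", "time": 100},
    {"domain": "US", "zipcode": "22222", "eventType": "sell", "id": "2", "time": 200},
    {"domain": "US", "zipcode": "22222", "eventType": "click", "id": "3", "time": 150},
    {"domain": "US", "zipcode": "11111", "eventType": "sell", "id": "4", "time": 350},
    {"domain": "US", "zipcode": "33333", "eventType": "sell", "id": "5", "time": 225},
    {"domain": "EU", "zipcode": "44444", "eventType": "click", "id": "5", "time": 120},
]


def count_by_domain_and_zipcode(documents, event_type="sell", gte=125, lte=400):
    """Keep documents matching the event type and the inclusive time range,
    then count them per (domain, zipcode) pair."""
    return Counter(
        (d["domain"], d["zipcode"])
        for d in documents
        if d["eventType"] == event_type and gte <= d["time"] <= lte
    )


counts = count_by_domain_and_zipcode(docs)
for (domain, zipcode), n in sorted(counts.items()):
    print(f"{domain}, {zipcode}, {n}")
```

Only documents 2, 4 and 5 pass both filters, which is why each zipcode bucket ends up with a count of 1.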

Here is some code I used to test it:

http://sense.qbox.io/gist/1c4cb591ab72a6f3ae681df30fe023ddfca4225b

You might want to take a look at terms aggregations, the bool filter, and range filters.

EDIT: I just realized I left out the domain part, but it should be straightforward to add a bucket aggregation on "domain" as well, with the zipcode aggregation nested inside it, if you need to.
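One way the domain grouping might look is sketched below in Python, which only builds the request body as a dict and prints it as JSON; nothing is sent to a cluster. Several things here are assumptions rather than part of the original answer: the aggregation name domain_terms, the helper names build_search_body and flatten_buckets, and the use of a bool query with a filter clause in place of the filtered query (filtered was deprecated in Elasticsearch 2.0 and removed in 5.0). The sketch also assumes that domain and zipcode are mapped as not-analyzed (keyword) fields, since a terms aggregation buckets on the indexed tokens.

```python
import json


def build_search_body(event_type="sell", gte=125, lte=400):
    """Build a search body: filter first, then bucket by domain,
    and inside each domain bucket, bucket again by zipcode."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"eventType": event_type}},
                    {"range": {"time": {"gte": gte, "lte": lte}}},
                ]
            }
        },
        "aggs": {
            "domain_terms": {  # hypothetical name for the outer aggregation
                "terms": {"field": "domain"},
                "aggs": {
                    "zipcode_terms": {"terms": {"field": "zipcode"}}
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn a nested aggregation response into (domain, zipcode, count) rows."""
    rows = []
    for domain_bucket in response["aggregations"]["domain_terms"]["buckets"]:
        for zip_bucket in domain_bucket["zipcode_terms"]["buckets"]:
            rows.append(
                (domain_bucket["key"], zip_bucket["key"], zip_bucket["doc_count"])
            )
    return rows


print(json.dumps(build_search_body(), indent=2))
```

The printed JSON can be pasted as the body of a POST to /test_index/_search. Each bucket of the outer aggregation carries its own zipcode_terms buckets, so flatten_buckets reads them back as rows in the "US, 11111, 1" shape the question asks for.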
