ElasticSearch: group by document field and count occurrences


Question

My ElasticSearch 6.5.2 index looks like this:

  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cCYuHW4BvwH6Y3jL87ul",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cSYuHW4BvwH6Y3jL_Lvt",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "eCb6O24BvwH6Y3jLP7tM",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "industry"
    }
  }

I would like a query that returns this result:

"result": [
  {
    "querySearched" : "telecom",
    "number" : 2
  },
  {
    "querySearched" : "industry",
    "number" : 1
  }
]

I just want to group by the value of querySearched and get the number of occurrences of each value, limited to the ten highest counts. I tried aggregations, but the buckets come back empty. Thanks!

Recommended answer

Set up your mapping:

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

Your query should look like this:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}

You should add fielddata:true to enable aggregations on a text type field. See the Elasticsearch documentation on fielddata for more on that.

    "size": 10 => return only the 10 buckets with the highest counts
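The terms aggregation returns its counts as buckets under aggregations.result, each with a key and a doc_count, rather than in the exact shape asked for in the question. A minimal Python sketch of the reshaping step follows. The helper name buckets_to_result and the sample_response dict are illustrative assumptions (the sample is hand-written to match the three documents in the question, not captured from a real cluster).

```python
from typing import Any, Dict, List


def buckets_to_result(response: Dict[str, Any], agg_name: str = "result") -> List[Dict[str, Any]]:
    """Reshape terms-aggregation buckets into the list format from the question.

    Hypothetical helper: each bucket's "key" becomes "querySearched" and its
    "doc_count" becomes "number". A response without buckets gives an empty list.
    """
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [{"querySearched": b["key"], "number": b["doc_count"]} for b in buckets]


# Hand-written sample shaped like a terms-aggregation response (assumption,
# not real cluster output). "size": 0 in the request leaves "hits" empty.
sample_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "result": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "telecom", "doc_count": 2},
                {"key": "industry", "doc_count": 1},
            ],
        }
    },
}

print(buckets_to_result(sample_response))
```

Buckets are ordered by doc_count descending by default, so the reshaped list keeps the "biggest first" order without extra sorting.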

After a short discussion with @Kamal, I feel obligated to let you know that enabling fielddata:true can consume a lot of heap space.

From the link I shared:

Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.

Another alternative (a more efficient one):

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Then run the aggregation query:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched.keyword",
        "size": 10
      }
    }
  }
}
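Besides heap usage, the two mappings can produce different buckets: a text field is analyzed, so a terms aggregation over its fielddata counts individual tokens, while the keyword sub-field counts each whole value. The Python sketch below only simulates that difference and does not talk to Elasticsearch. The function top_terms is hypothetical, the fourth sample value is invented for illustration, and lowercasing plus whitespace splitting is a rough stand-in for the standard analyzer, which does more than that.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_terms(values: Iterable[str], size: int = 10, analyzed: bool = False) -> List[Tuple[str, int]]:
    """Simulate terms-aggregation document counts over a list of field values.

    analyzed=False mimics a keyword field: the whole value is one term.
    analyzed=True mimics a text field with fielddata: the value is split into
    lowercase tokens (assumption: a crude approximation of the standard analyzer).
    """
    counts: Counter = Counter()
    for value in values:
        if analyzed:
            # doc_count counts documents, so each token counts once per value
            counts.update(set(value.lower().split()))
        else:
            counts[value] += 1
    return counts.most_common(size)


# The three values from the question plus one invented multi-word search
searches = ["telecom", "telecom", "industry", "Telecom industry"]

print("keyword:", top_terms(searches))
print("text + fielddata:", top_terms(searches, analyzed=True))
```

With single-word, lowercase values like those in the question, both approaches give the same counts. They diverge once a searched phrase contains several words or mixed case, in which case the keyword sub-field is the one that groups by the exact phrase.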

Both solutions work, but you should take the heap usage of fielddata into consideration when choosing between them.

Hope it helps.
