ElasticSearch + Kibana - Unique count using pre-computed hashes

Question

Update: added

I want to perform a unique count on my ElasticSearch cluster, which contains about 50 million records.

I tried the following approaches:

First approach, mentioned in this section:


Pre-computing hashes is usually only useful on very large and/or high-cardinality fields as it saves CPU and memory.
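Why hashing matters here: Kibana's Unique Count metric is backed by the Elasticsearch cardinality aggregation, which estimates distinct counts with HyperLogLog++ and needs a hash of every value it sees. A murmur3 sub-field moves that hashing from query time to index time. The sketch below is a plain HyperLogLog (no HLL++ bias correction or sparse mode), and it uses blake2b from the standard library as a stand-in for murmur3; both are assumptions for illustration, not what Elasticsearch runs internally.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """64-bit hash of a value. Stand-in for murmur3 (assumption: blake2b is NOT what Elasticsearch uses)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Minimal HyperLogLog over pre-computed 64-bit hashes (no HLL++ refinements)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add_hash(self, h: int) -> None:
        # The top p bits select a register; the remaining bits supply the rank.
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit within the remaining 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)
        return raw


# "Index time": hash each value once and keep only the 64-bit hash.
values = [f"user-{i % 1000}" for i in range(5000)]  # 1000 distinct values
stored_hashes = [hash64(v) for v in values]

# "Query time": the count only touches the stored hashes, never the original strings.
hll = HyperLogLog()
for h in stored_hashes:
    hll.add_hash(h)
print(round(hll.estimate()))
```

The saving the documentation refers to is the `hash64` call: with a pre-computed hash field it is paid once per document at indexing, instead of on every unique-count query over a large, high-cardinality string field.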



Second approach



Mentioned here:


Unless you configure Elasticsearch to use doc_values as the field data format, the use of aggregations and facets is very demanding on heap space.



My property mapping

"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3"
    }
  }
}



The problem

When I use a unique count on my_prop.hash in Kibana, I receive the following error:

Data too large, data for [my_prop.hash] would be larger than limit

ElasticSearch has a 2 GB heap. The same error occurs even for a single index with 4 million records.


  1. Am I missing something in my configuration?

  2. Should I increase my machine's resources? That does not seem like a scalable solution.



ElasticSearch queries



Generated by Kibana:
http://pastebin.com/hf1yNLhE

http://pastebin.com/BFTYUsVg
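The contents of the pastebin links are not reproduced here. For orientation, Kibana's Unique Count metric is sent to Elasticsearch as a cardinality aggregation, so the generated queries contain a body of roughly the shape sketched below. The aggregation name unique_count and the helper unique_count_body are hypothetical (Kibana generates its own numeric aggregation ids), and this is not a copy of the linked queries.

```python
import json
from typing import Optional


def unique_count_body(field: str, precision_threshold: Optional[int] = None) -> dict:
    """Search body that asks only for an approximate distinct count of `field`."""
    cardinality: dict = {"field": field}
    if precision_threshold is not None:
        # Optional trade-off: higher thresholds give more accurate counts
        # at the cost of more memory per aggregation.
        cardinality["precision_threshold"] = precision_threshold
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {"unique_count": {"cardinality": cardinality}},
    }


print(json.dumps(unique_count_body("my_prop.hash"), indent=2))
```

Pointing the aggregation at my_prop.hash rather than my_prop is what makes Elasticsearch use the pre-computed murmur3 values instead of hashing the strings at query time.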

Recommended answer

That error means you don't have enough memory (more specifically, fielddata memory) to hold all the values of hash, so you need to move them off the heap and onto disk, which means using doc_values.

Since you are already using doc_values for my_prop, I suggest doing the same for my_prop.hash (and no, sub-fields do not inherit the settings of the main field):

"hash": { "type": "murmur3", "index": "no", "doc_values": true }
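To make the non-inheritance point concrete, here is a small sketch, assuming the ES 1.x-style mapping from the question, that represents the mapping as a Python dict and lists every field or sub-field that does not enable doc_values. The helper fields_without_doc_values is hypothetical, not an Elasticsearch API; it only walks the dict.

```python
import json
from typing import List

# Mapping from the question (ES 1.x-style syntax): the murmur3 sub-field
# has no doc_values setting of its own.
original_mapping = {
    "properties": {
        "my_prop": {
            "index": "not_analyzed",
            "fielddata": {"format": "doc_values"},
            "doc_values": True,
            "type": "string",
            "fields": {"hash": {"type": "murmur3"}},
        }
    }
}

# Same mapping with the suggested sub-field settings applied.
fixed_mapping = json.loads(json.dumps(original_mapping))  # deep copy
fixed_mapping["properties"]["my_prop"]["fields"]["hash"] = {
    "type": "murmur3",
    "index": "no",
    "doc_values": True,
}


def fields_without_doc_values(properties: dict, prefix: str = "") -> List[str]:
    """Return dotted paths of fields and sub-fields that do not set doc_values: true."""
    missing = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("doc_values") is not True:
            missing.append(path)
        # Sub-fields are checked on their own: they do not inherit the parent's settings.
        missing.extend(fields_without_doc_values(spec.get("fields", {}), path + "."))
    return missing


print(fields_without_doc_values(original_mapping["properties"]))  # → ['my_prop.hash']
print(fields_without_doc_values(fixed_mapping["properties"]))     # → []
print(json.dumps(fixed_mapping, indent=2))
```

Because doc_values are written when documents are indexed, data that was indexed before the mapping change will likely need to be reindexed before the unique count can benefit from it.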