ElasticSearch + Kibana - Unique count using pre-computed hashes
Question

Update: added

I want to perform a unique count on my Elasticsearch cluster. The cluster contains about 50 million records.

I tried the following approaches:
The first method, as mentioned in this section:
Pre-computing hashes is usually only useful on very large and/or high-cardinality fields as it saves CPU and memory.
Second method
As mentioned in the section here:
Unless you configure Elasticsearch to use doc_values as the field data format, the use of aggregations and facets is very demanding on heap space.
My property mapping
"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3"
    }
  }
}
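For context, the "unique count" metric in Kibana is backed by a cardinality aggregation. A sketch of what such a request body looks like when pointed at the hash sub-field (the aggregation name `unique_my_prop` is illustrative, not something Kibana necessarily generates):

```json
{
  "size": 0,
  "aggs": {
    "unique_my_prop": {
      "cardinality": {
        "field": "my_prop.hash"
      }
    }
  }
}
```

Because `my_prop.hash` is of type `murmur3`, the cardinality aggregation can use the pre-computed hashes directly instead of hashing each string value at query time.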
The problem

When I use unique count on my_prop.hash in Kibana, I receive the following error:
Data too large, data for [my_prop.hash] would be larger than limit
Elasticsearch has a 2g heap size. The above also fails for a single index with 4 million records.
- Am I missing something in the configuration?
- Should I scale up the machine? That doesn't seem like a scalable solution.
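As background on the error itself: the "Data too large ... would be larger than limit" message comes from the fielddata circuit breaker, which by default trips when fielddata would exceed 60% of the heap. The threshold can be changed (shown below for illustration), though raising it only postpones the problem rather than solving it:

```yaml
# elasticsearch.yml - fielddata circuit breaker threshold
# (default is 60% of the JVM heap; 75% here is just an example value)
indices.breaker.fielddata.limit: 75%
```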
Elasticsearch query

Generated by Kibana:
http://pastebin.com/hf1yNLhE
Answer
That error says you don't have enough memory (more specifically, memory for fielddata) to store all the values of hash, so you need to take them out of the heap and put them on disk, which means using doc_values.
Since you are already using doc_values for my_prop, I suggest doing the same for my_prop.hash (and, no, the settings from the main field are not inherited by the sub-fields): "hash": { "type": "murmur3", "index": "no", "doc_values": true }.
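Putting that suggestion together, the full mapping for my_prop would then look like this (a sketch; note that changing the mapping of an existing field requires reindexing):

```json
"my_prop": {
  "type": "string",
  "index": "not_analyzed",
  "doc_values": true,
  "fielddata": {
    "format": "doc_values"
  },
  "fields": {
    "hash": {
      "type": "murmur3",
      "index": "no",
      "doc_values": true
    }
  }
}
```

With "doc_values": true on the sub-field, the pre-computed hashes are read from disk-backed doc values during the cardinality aggregation instead of being loaded into the heap as fielddata, which is what was tripping the circuit breaker.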