Elasticsearch / Kibana field data too large


Problem description


I have a small ELK cluster that is in testing. The kibana web interface is extremely slow and throws a lot of errors.

Kafka => 8.2
Logstash => 1.5rc3 (latest)
Elasticsearch => 1.4.4 (latest)
Kibana => 4.0.2 (latest)

The elasticsearch nodes have 10GB of ram each on Ubuntu 14.04. I'm pulling in between 5GB and 20GB of data per day.

Running even a simple query, with only 15 minutes of data in the kibana web interface takes several minutes, and often throws errors.

[FIELDDATA] Data too large, data for [timeStamp] would be larger than limit of [3751437926/3.4gb]

These shard-failure errors appear only in kibana. According to all the other plugins (head, kopf), the elasticsearch shards are perfectly fine, and the cluster is green.

I've checked with the google group, IRC and looked at stack overflow. It seems the only solution is to increase the ram. I've increased the ram on my nodes twice. While that seems to fix it for a day or two, the problem quickly returns. Other solutions such as cleaning the cache have no long term improvements.

curl -XPUT 'http://elastic.example.com:9200/_cache/clear?filter=true'
curl -XPOST 'http://elastic.example.com:9200/_cache/clear' -d '{ "fielddata": "true" }'

According to the KOPF plugin, the amount of heap space routinely approaches 75% on a completely idle cluster. (I'm the only one in the company using it). 3 Nodes with 10GB of ram should be more than enough for the amount of data that I have.

I have also tried adjusting the breakers as suggested by this blog.

curl -XPUT 'http://elastic.example.com:9200/_cluster/settings' -d '{ "persistent" : { "indices.breaker.fielddata.limit" : "70%" } }'
curl -XPUT 'http://elastic.example.com:9200/_cluster/settings' -d '{ "persistent" : { "indices.fielddata.cache.size" : "60%" } }'

How can I prevent these errors , and fix the extreme slowness in kibana?

https://github.com/elastic/kibana/issues/3221
elasticsearch getting too many results, need help filtering query
http://elasticsearch-users.115913.n3.nabble.com/Data-too-large-error-td4060962.html

Update

I have about 30 days of indexes from logstash. 2x Replication so that is 10 shards per day.

Update2

I've increased the ram of each node to 16GB, (48GB total) and I've also upgraded to 1.5.2.

This appears to fix the issue for a day or two, however the problem returns.

Update3

This blog article from an elastic employee has good tips explaining what can cause these issues.

Solution

You're indexing a lot of data (if you're adding/creating 5 to 20GB a day) and your nodes are quite low on memory. You won't see any problems on the indexing front but fetching data on a single or multiple indexes will cause problems. Keep in mind that Kibana runs queries in the background and the message you're getting is basically saying something along the lines of "I can't get that data for you because I need to put more data in memory than I have available in order to run these queries."

There are two things that are relatively simple to do and should solve your problems:

  • Upgrade to Elasticsearch 1.5.2 (major performance improvements)
  • When you're short on memory, you really need to use doc_values in all of your mappings as this will reduce the heap size drastically
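Since logstash creates a fresh index every day, the doc_values change only helps if every new index picks it up automatically. One way to arrange that is an index template with a dynamic template; the sketch below is an illustration, not part of the original answer, and the host, template name, and `logstash-*` pattern are assumptions to adjust for your cluster:

```shell
# Hypothetical index template: any index created after this matching
# logstash-* will map string fields as not_analyzed with doc_values
# enabled, so fielddata for them lives on disk instead of the heap.
curl -XPUT 'http://elastic.example.com:9200/_template/logstash_doc_values' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_as_doc_values": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": true
            }
          }
        }
      ]
    }
  }
}'
```

Note that templates only apply at index-creation time, so indices that already exist are untouched; that is exactly why the reindexing step discussed next is needed for old data.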

The key lies in doc_values though. You need to modify your mapping(s) to set this property to true. Crude example:

[...],
"properties": {
    "age": {
      "type": "integer",
      "doc_values": true
    },
    "zipcode": {
      "type": "integer",
      "doc_values": true
    },
    "nationality": {
      "type": "string",
      "index": "not_analyzed",
      "doc_values": true
    },
    [...]

Updating your mapping(s) will make future indexes take this into account but you'll need to reindex existing ones entirely for doc_values to apply on existing indexes. (See scan/scroll and this blog post for more tips.)

Replicas help scale but will run into the same problems if you don't reduce the heap size of each node. As for the number of shards you currently have, it may not be necessary nor optimal but I don't think it's the root cause of your problems.
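Given that this is a single-user test cluster running with 2x replication, one concrete way to relieve per-node heap pressure while testing is to drop to a single replica. This is a sketch under that assumption (host and index pattern are placeholders, and `number_of_replicas` is dynamically updatable on live indices):

```shell
# Hypothetical: reduce replication on all existing logstash-* indices.
# Going from 2 replicas to 1 (3 copies -> 2 copies of each shard) cuts
# the total data the cluster holds by roughly a third.
curl -XPUT 'http://elastic.example.com:9200/logstash-*/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```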

Keep in mind that the suggestions mentioned above are to allow Kibana to run the queries and show you data. Speed will rely greatly on the date ranges you set, on the machines you have (CPU, SSD, etc), and on the memory available on each node.
