Elasticsearch 1.5.2 deployment issue


Problem description


I have an ES 1.5.2 cluster with the following specs:

  • 3 nodes with RAM: 32GB, CPU cores: 8 each
  • 282 total indices
  • 2,564 total shards
  • 799,505,935 total docs
  • 767.84GB total data
  • ES_HEAP_SIZE=16g

The problem is when I use Kibana to query something (very simple queries): a single query works fine, but if I keep querying, Elasticsearch gets slower and slower and eventually gets stuck, because JVM heap usage (as reported by Marvel) climbs to 87-95%. The same happens when I try to load some Kibana dashboards, and the only way out is to restart the service on all the nodes.

(This also happens on ES 2.2.0, 1 node, with Kibana 4.)

What is wrong? What am I missing? Am I supposed to query less?
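
For reference, a minimal way to watch per-node heap usage while reproducing this (a sketch; localhost:9200 is an assumption, point it at one of your nodes):

  # Poll per-node heap usage every 5 seconds via the _cat/nodes API
  while true; do
    curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'
    sleep 5
  done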

EDIT:

I should mention that I have a lot of empty indices (0 documents), but their shards still count toward the total. This is because I set a ttl of 4w on the documents, and the resulting empty indices are deleted with Curator.

Also, we have not disabled doc_values in either the 1.5.2 or the 2.2.0 cluster. The exact specs are as follows (1.5.2):

  • 3 nodes with RAM: 32GB, CPU cores: 8 each
  • 282 total indices = 227 empty + 31 marvel + 1 kibana + 23 data
  • 2,564 total shards = (1,135 empty + 31 marvel + 1 kibana + 115 data) primaries × 2 (1 replica each)
  • 799,505,935 total docs
  • 767.84GB total data
  • ES_HEAP_SIZE=16g

curl _cat/fielddata?v result:

1.5.2:

 total os.cpu.usage primaries.indexing.index_total total.fielddata.memory_size_in_bytes jvm.mem.heap_used_percent jvm.gc.collectors.young.collection_time_in_millis primaries.docs.count device.imei fs.total.available_in_bytes os.load_average.1m index.raw @timestamp node.ip_port.raw fs.total.disk_io_op node.name jvm.mem.heap_used_in_bytes jvm.gc.collectors.old.collection_time_in_millis total.merges.total_size_in_bytes jvm.gc.collectors.young.collection_count jvm.gc.collectors.old.collection_count total.search.query_total 
 2.1gb        1.2mb                          3.5mb                                3.4mb                     1.1mb                                                0b                3.5mb       2.1gb                       1.9mb              1.8mb     3.6mb      3.6mb            1.7mb               1.9mb     1.7mb                      1.6mb                                           1.5mb                            3.5mb                                    1.5mb                                  1.5mb                    3.2mb 
 1.9gb        1.2mb                          3.4mb                                3.3mb                     1.1mb                                             1.5mb                3.5mb       1.9gb                       1.9mb              1.8mb     3.5mb      3.6mb            1.7mb               1.9mb     1.7mb                      1.5mb                                           1.5mb                            3.4mb                                       0b                                  1.5mb                    3.2mb 
   2gb           0b                             0b                                   0b                        0b                                                0b                   0b         2gb                          0b                 0b        0b         0b               0b                  0b        0b                         0b                                              0b                               0b                                       0b                                     0b                       0b 

2.2.0:

  total index_stats.index node.id node_stats.node_id buildNum endTime location.timestamp userActivity.time startTime   time shard.state shard.node indoorOutdoor.time shard.index dataThroughput.downloadSpeed 
176.2mb                0b      0b                 0b     232b 213.5kb            518.8kb           479.7kb    45.5mb 80.1mb       1.4kb       920b            348.7kb       2.5kb                       49.1mb 

Solution

  • delete the empty indices (see the first sketch after this list)
  • for the 1.5 cluster, the major consumer of your heap is fielddata - around 9.5GB on each node, plus 1.2GB for the filter cache and around 1.7GB for segment files' metadata
    • even if you have that snippet in your template that makes strings not_analyzed, in 1.5 this doesn't automatically mean ES will use doc_values; you need to enable them explicitly (see the template sketch after this list)
    • if you enable doc_values now in the 1.5.x cluster, the change only takes effect for newly created indices. For the old indices you need to reindex the data, or, if you have time-based indices (created daily, weekly, etc.), you can simply wait for new indices to be created and the old ones to be deleted.
    • until doc_values become predominant in the 1.5 cluster's indices, what @Val suggested in the comments is the only option: limit the fielddata cache size, add more nodes to your cluster (and implicitly more memory), or increase the RAM on your nodes. Or clear the fielddata cache manually from time to time ;-) (see the cache sketch after this list)
  • not entirely related to the memory issue, but try to avoid using ttl. If you don't need some data anymore, simply delete the index; don't rely on ttl, which is much more costly than deleting the index. Using ttl can cause issues at search time and affect the overall performance of the cluster, because it deletes documents from the indices, which means a lot of updates and a lot of merging in those indices. Since you probably have time-based indices (meaning yesterday's data doesn't really change), ttl brings unnecessary operations to data that should otherwise be static (and that could otherwise be optimized).
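
A minimal sketch for the first point, finding and deleting the empty indices (localhost:9200 and the index name are assumptions):

  # List every index with its document count via the _cat API
  curl -s 'localhost:9200/_cat/indices?h=index,docs.count'

  # Delete an empty index by name (this index name is hypothetical)
  curl -XDELETE 'localhost:9200/empty-index-2016.01.01'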
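
For the doc_values point, a sketch of an index template in ES 1.x mapping syntax that stores strings as not_analyzed with doc_values enabled. The template name and the "*" pattern are assumptions, and it only affects indices created after it is installed:

  curl -XPUT 'localhost:9200/_template/strings_as_doc_values' -d '
  {
    "template": "*",
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": true
              }
            }
          }
        ]
      }
    }
  }'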
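
And for keeping fielddata in check until doc_values take over, a sketch of the stop-gap measures mentioned above (the 40% bound is an assumption, tune it to your heap):

  # In elasticsearch.yml on each node: cap the fielddata cache so it
  # evicts old entries instead of filling the heap (40% is an assumption)
  indices.fielddata.cache.size: 40%

  # Or clear the fielddata cache manually across all indices
  curl -XPOST 'localhost:9200/_cache/clear?fielddata=true'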
