Elasticsearch 7.x circuit breaker - data too large - troubleshoot


Problem description

The problem:
Since upgrading from ES 5.4 to ES 7.2, I have been getting "data too large" errors when sending concurrent bulk requests (and/or search requests) from my multi-threaded Java application (using the elasticsearch-rest-high-level-client-7.2.0.jar Java client) to an ES cluster of 2-4 nodes.

My ES configuration:

Elasticsearch version: 7.2

custom configuration in elasticsearch.yml:   
    thread_pool.search.queue_size: 20000  
    thread_pool.write.queue_size: 500

I use only the default 7.x circuit-breaker values, such as:  
    indices.breaker.total.limit = 95%  
    indices.breaker.total.use_real_memory = true  
    network.breaker.inflight_requests.limit = 100%  
    network.breaker.inflight_requests.overhead = 2  

Error in elasticsearch.log:

    {
      "error": {
        "root_cause": [
          {
            "type": "circuit_breaking_exception",
            "reason": "[parent] Data too large, data for [<http_request>] would be [3144831050/2.9gb], which is larger than the limit of [3060164198/2.8gb], real usage: [3144829848/2.9gb], new bytes reserved: [1202/1.1kb]",
            "bytes_wanted": 3144831050,
            "bytes_limit": 3060164198,
            "durability": "PERMANENT"
          }
        ],
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [3144831050/2.9gb], which is larger than the limit of [3060164198/2.8gb], real usage: [3144829848/2.9gb], new bytes reserved: [1202/1.1kb]",
        "bytes_wanted": 3144831050,
        "bytes_limit": 3060164198,
        "durability": "PERMANENT"
      },
      "status": 429
    }
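
The numbers in this error can be checked by hand. The following is a minimal sketch, assuming the parent limit is the default 95% of the maximum heap; the class and method names are illustrative and not part of Elasticsearch. Under that assumption, the limit of 3060164198 bytes corresponds to a 3 GiB heap (an inference from the log, not something stated above), and the rejected total is simply the real heap usage plus the 1202 bytes the request tried to reserve.

```java
final class BreakerMath {
    /** Parent breaker limit for a given max heap and limit fraction (0.95 is assumed here). */
    static long parentLimitBytes(long maxHeapBytes, double limitFraction) {
        return (long) (maxHeapBytes * limitFraction);
    }

    /** True when current real usage plus the new reservation exceeds the limit. */
    static boolean wouldTrip(long realUsageBytes, long newBytesReserved, long limitBytes) {
        return realUsageBytes + newBytesReserved > limitBytes;
    }

    public static void main(String[] args) {
        long heap = 3L * 1024 * 1024 * 1024;          // 3 GiB, inferred from bytes_limit
        long limit = parentLimitBytes(heap, 0.95);    // 3060164198, matches bytes_limit
        long realUsage = 3144829848L;                 // "real usage" from the log
        long reserved = 1202L;                        // "new bytes reserved" from the log

        System.out.println("limit        = " + limit);
        System.out.println("bytes wanted = " + (realUsage + reserved)); // 3144831050
        System.out.println("trips        = " + wouldTrip(realUsage, reserved, limit));
    }
}
```

Note that the request itself is tiny (about 1.1kb). The breaker trips because the heap was already above the limit before the request arrived, not because the request is large.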

Thoughts:
I'm having a hard time pinpointing the source of the issue.
When using ES cluster nodes with <=8gb heap size (on a <=16gb VM), the problem becomes very visible, so one obvious solution is to increase the memory of the nodes.
But I feel that increasing the memory only hides the issue.

Questions:
I would like to understand which scenarios could lead to this error,
and what action I can take to handle it properly
(change circuit-breaker values, change the elasticsearch.yml configuration, change/limit my ES requests).

Recommended answer

So I've spent some time researching how exactly ES implemented the new circuit breaker mechanism, and tried to understand why we suddenly started getting those errors.

  1. The circuit breaker mechanism has existed since the very first versions.
  2. We started experiencing issues with it when moving from version 5.4 to 7.2.
  3. The 7.x line introduced a new way of calculating circuit breaking: circuit breaking based on real memory usage (why and how: https://www.elastic.co/blog/improving-node-resiliency-with-the-real-memory-circuit-breaker, code: https://github.com/elastic/elasticsearch/pull/31767).
  4. In our internal upgrade of ES to version 7.2, we changed the JDK from 8 to 11.
  5. Also as part of our internal upgrade, we changed the default jvm.options configuration, replacing the officially recommended CMS GC with G1GC, which Elasticsearch had only recently started to support.
  6. Considering all of the above, I found this bug, fixed in version 7.4, regarding the use of the circuit breaker together with G1GC: https://github.com/elastic/elasticsearch/pull/46169
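
Points 3 to 6 fit together because a breaker based on real memory usage looks at how full the heap currently is, and a heap that has not been collected recently still counts garbage as used. The sketch below illustrates that idea with the standard `java.lang.management` API. It is not Elasticsearch's implementation; `HeapSnapshot`, `usedFraction`, and the 95% threshold in `main` are illustrative choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapSnapshot {
    /** Current heap usage as reported by the JVM; used bytes include not-yet-collected garbage. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Fraction of the heap in use, e.g. 0.25 for a heap that is one quarter full. */
    static double usedFraction(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    public static void main(String[] args) {
        // Allocate short-lived arrays: they become garbage immediately,
        // but still count as "used" until the collector reclaims them.
        long sink = 0;
        for (int i = 0; i < 64; i++) {
            byte[] garbage = new byte[1024 * 1024];
            garbage[0] = (byte) i;
            sink += garbage[0];
        }

        MemoryUsage usage = heap();
        // getMax() may be -1 when undefined; fall back to the committed size.
        long max = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
        double fraction = usedFraction(usage.getUsed(), max);

        System.out.printf("used=%d max=%d fraction=%.3f (sink=%d)%n",
                usage.getUsed(), max, fraction, sink);
        System.out.println("over a hypothetical 95% limit: " + (fraction > 0.95));
    }
}
```

If the collector starts its cycles late, the reported usage can sit close to the limit even when most of it is reclaimable, which is why the choice and tuning of the GC matters for this breaker.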

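How to fix:

  1. Change the configuration back to CMS GC.
  2. Or take the fix. The fix for the bug is just a configuration change that can easily be applied and tested in your deployment.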
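
For option 2, the change in the linked pull request appears to be two extra G1 flags in `jvm.options`, which start concurrent collection earlier and keep more heap in reserve. Treat the exact lines and values below as an assumption, and verify them against the PR and the `jvm.options` file shipped with 7.4 or later before applying them:

```
## G1GC configuration (JDK 10 and later)
10-:-XX:+UseG1GC
10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30
```

The CMS flags should not be active at the same time as these, and the nodes need a restart for the change to take effect.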
