Elasticsearch drops too many requests -- would a buffer improve things?

Question

We have a cluster of workers that send indexing requests to a 4-node Elasticsearch cluster. The documents are indexed as they are generated, and since the workers have a high degree of concurrency, Elasticsearch is having trouble handling all the requests. To give some numbers, the workers process up to 3,200 tasks at the same time, and each task usually generates about 13 indexing requests. This generates an instantaneous rate that is between 60 and 250 indexing requests per second.

From the start, Elasticsearch had problems and requests were timing out or returning 429. To get around this, we increased the timeout on our workers to 200 seconds and increased the write thread pool queue size on our nodes to 700.

That's not a satisfactory long-term solution though, and I was looking for alternatives. I have noticed that when I copied an index within the same cluster with elasticdump, the write thread pool was almost empty and I attributed that to the fact that elasticdump batches indexing requests and (probably) uses the bulk API to communicate with Elasticsearch.

That gave me the idea that I could write a buffer that receives requests from the workers, batches them in groups of 200-300 requests, and then sends a single bulk request per group to Elasticsearch.

Does such a thing already exist, and does it sound like a good idea?

Recommended answer

First of all, it's important to understand what happens behind the scenes when you send an index request to Elasticsearch, so that you can troubleshoot the issue and find the root cause.

Elasticsearch has several thread pools, and indexing requests (both single and bulk) are handled by the write thread pool. Check this against your Elasticsearch version, because Elastic keeps changing the thread pools (earlier versions had separate thread pools for single and bulk requests, each with its own queue capacity).

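In the latest ES version (7.10), the queue capacity of the write thread pool is significantly larger: it was raised from 200 (the value in earlier versions) to 10000. This change implies the following:

  1. Elasticsearch now prefers to buffer more indexing requests rather than reject them.
  2. A larger queue capacity means more latency, but that is a trade-off: it reduces data loss when the client has no retry mechanism.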
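The second point is about clients that have no retry mechanism. As a minimal sketch of what such a mechanism could look like on the worker side, the snippet below retries a rejected request with exponential backoff and jitter. The `send` callable and the `TooManyRequests` exception are hypothetical placeholders for whatever HTTP client the workers use; they are not part of any Elasticsearch library.

```python
import random
import time


class TooManyRequests(Exception):
    """Raised by the (hypothetical) sender when Elasticsearch answers HTTP 429."""


def send_with_retry(send, payload, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(payload), retrying with exponential backoff and jitter on 429."""
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # give up: the caller decides what to do with the document
            # Base delays of 0.5s, 1s, 2s, ... plus up to 100% random jitter,
            # so that many workers do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

With retries in place, a 429 becomes a signal to slow down rather than a lost document, which makes a very large server-side queue less necessary.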

I am sure you have not yet moved to ES 7.9, the version in which the capacity was increased, but you can increase the size of this queue gradually and, if you have spare capacity, allocate more processors through the config change mentioned in this official example. This is a very debatable topic, and a lot of people consider it a band-aid rather than a proper fix. However, now that Elastic itself has increased the queue size, you can try it too, and it makes even more sense if your periods of increased traffic are short.

Another critical thing is to find out the root cause of why your ES nodes are queuing up more requests. The cause can be legitimate, such as growing indexing traffic that has pushed the infrastructure to its limit. If it is not legitimate, have a look at my short tips to improve one-time indexing performance and overall indexing performance. Implementing these tips should give you a better indexing rate, which reduces the pressure on the write thread pool queue.
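One way to start looking for the root cause is to watch the write thread pool on each node. The sketch below reads the `_cat/thread_pool` API and flags nodes whose write queue is long or that have already rejected requests. The base URL, the function names, and the threshold of 100 are assumptions chosen for illustration, and the cluster is assumed to be reachable without authentication.

```python
import json
import urllib.request


def fetch_write_pool(base_url="http://localhost:9200"):
    """Fetch write thread pool stats for every node as a list of dicts."""
    url = (base_url + "/_cat/thread_pool/write"
           "?format=json&h=node_name,name,active,queue,rejected")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def nodes_under_pressure(rows, queue_threshold=100):
    """Return the names of nodes with a long write queue or any rejections."""
    flagged = []
    for row in rows:
        # The _cat API may return numeric values as strings, so convert them.
        if int(row["queue"]) >= queue_threshold or int(row["rejected"]) > 0:
            flagged.append(row["node_name"])
    return flagged
```

Calling `nodes_under_pressure(fetch_write_pool())` periodically shows whether the pressure is spread evenly or concentrated on a few nodes, which helps separate a genuine capacity limit from an unbalanced cluster.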

As mentioned by @Val in the comments, if you are indexing docs one by one, then moving to the bulk index API will give you the biggest boost.
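The buffer described in the question can be sketched in a few lines. This is only an illustration under stated assumptions: `BulkBuffer` and its `send` callable are hypothetical names, `send` is expected to POST the body to the `_bulk` endpoint with the `application/x-ndjson` content type, and the batch size of 250 is taken from the 200-300 range mentioned in the question.

```python
import json


class BulkBuffer:
    """Collects documents and flushes them as one _bulk body per batch."""

    def __init__(self, send, index, batch_size=250):
        self.send = send          # hypothetical callable that POSTs the body to /_bulk
        self.index = index
        self.batch_size = batch_size
        self.docs = []

    def add(self, doc):
        """Queue one document and flush automatically when the batch is full."""
        self.docs.append(doc)
        if len(self.docs) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send all queued documents as a single bulk request body."""
        if not self.docs:
            return
        lines = []
        for doc in self.docs:
            lines.append(json.dumps({"index": {"_index": self.index}}))  # action line
            lines.append(json.dumps(doc))                                # document source
        # The bulk API expects newline-delimited JSON ending with a newline.
        self.send("\n".join(lines) + "\n")
        # Clear only after a successful send, so a failed batch stays queued.
        self.docs = []
```

A production version would also need a time-based flush so that a partial batch does not wait indefinitely, locking if several threads share one buffer, and a check of the `errors` field in the bulk response, since individual items can fail even when the request as a whole succeeds. The official clients also ship bulk helpers (for example `elasticsearch.helpers.bulk` in the Python client), which may remove the need for a hand-written buffer.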
