Kafka Producer NetworkException and Timeout Exceptions


Problem description

We are getting random NetworkExceptions and TimeoutExceptions in our production environment:

Brokers: 3
ZooKeepers: 3
Servers: 3
Kafka: 0.10.0.1
ZooKeeper: 3.4.3

We occasionally get this exception in our producer logs:

Expiring 10 record(s) for TOPIC:XXXXXX: 5608 ms has passed since batch creation plus linger time.

The number of milliseconds in these error messages keeps changing: sometimes it's ~5 seconds, other times it's up to ~13 seconds!

And very rarely we get:

NetworkException: Server disconnected before response received. 

The cluster consists of 3 brokers and 3 ZooKeepers. The producer server and the Kafka cluster are in the same network.

I am making synchronous calls. Multiple user requests call a web service to send their data, and this Kafka web service has a single Producer object that does all the sending. The producer's request timeout (request.timeout.ms) was initially 1000 ms and has since been changed to 15000 ms (15 seconds). Even after increasing the timeout, TimeoutExceptions still show up in the error logs.

What could be the reason?

Solution

Finding the root cause is a bit tricky, so I'll share my experience in the hope that someone finds it useful. In general, it can be a network issue, or too much network traffic combined with acks=all. Kafka KIP-91 includes a diagram (current at the time of writing and still applicable up to 1.1.0) that explains where the TimeoutException comes from.

Excluding network configuration issues or errors, these are the properties you can adjust, depending on your scenario, to mitigate or solve the problem:

  • buffer.memory controls the total memory available to the producer for buffering. If records are sent faster than they can be transmitted to Kafka, this buffer fills up, and additional send calls block for up to max.block.ms, after which the producer throws a TimeoutException.

  • max.block.ms already has a high value and I do not suggest increasing it further. buffer.memory has a default value of 32 MB; depending on your message size, you may want to increase it, and if necessary increase the JVM heap space as well.

  • retries defines how many times the producer tries to resend a record after an error before giving up. If you are using zero retries, you can try to mitigate the problem by increasing this value; beware that record order is no longer guaranteed unless you set max.in.flight.requests.per.connection to 1.

  • Records are sent as soon as the batch size is reached or the linger time has passed, whichever comes first. If batch.size (default 16 KB) is smaller than the maximum request size, consider using a higher value. In addition, raise linger.ms to a value such as 10, 50 or 100 to make better use of batching and compression. This reduces network flooding and improves compression if you are using it.
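A minimal sketch of how the properties discussed above might be collected for the Java producer. The keys are the standard producer configuration names; every value is illustrative only (the broker addresses are hypothetical placeholders, and the retry count, buffer size, batch size and linger time are starting points to experiment with, not recommendations). The resulting Properties object would be passed to the KafkaProducer constructor, which is left out here so the snippet compiles without the kafka-clients dependency.

```java
import java.util.Properties;

public final class ProducerTuning {

    /** Builds producer settings reflecting the tuning advice above (all values are illustrative). */
    public static Properties tunedProducerConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Wait for all in-sync replicas to acknowledge each write (the acks=all case discussed above).
        props.put("acks", "all");
        // Request timeout from the question (raised from 1000 ms to 15000 ms).
        props.put("request.timeout.ms", "15000");

        // Raise the send buffer above the 32 MB default (here 64 MB); size the JVM heap accordingly.
        props.put("buffer.memory", String.valueOf(64L * 1024 * 1024));
        // Keep max.block.ms at its 60000 ms default instead of raising it further.
        props.put("max.block.ms", "60000");

        // Retry transient failures instead of failing immediately (illustrative count).
        props.put("retries", "3");
        // Preserve ordering under retries: at most one unacknowledged request per connection.
        props.put("max.in.flight.requests.per.connection", "1");

        // Larger batches (64 KB instead of the 16 KB default) plus a short linger improve batching.
        props.put("batch.size", String.valueOf(64 * 1024));
        props.put("linger.ms", "50");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker addresses; replace with your own.
        Properties props = tunedProducerConfig("broker1:9092,broker2:9092,broker3:9092");
        // In a real application these would be handed to: new KafkaProducer<String, String>(props)
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Change one value at a time and watch whether the "Expiring N record(s)" messages become less frequent; since the right numbers depend on message size and load, these settings are only a baseline for that experiment.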
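The buffer.memory / max.block.ms interaction behaves like a bounded queue whose writer gives up after a timeout. The following standard-library analogy is not Kafka's actual accumulator code, and the class and method names are hypothetical: an ArrayBlockingQueue's capacity stands in for buffer.memory (counted in records rather than bytes, a simplification), and the offer timeout stands in for max.block.ms.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical analogy: a full buffer makes send() block, then fail with a TimeoutException. */
public final class BoundedSendBuffer {
    private final BlockingQueue<String> buffer;
    private final long maxBlockMs;

    public BoundedSendBuffer(int capacity, long maxBlockMs) {
        this.buffer = new ArrayBlockingQueue<>(capacity); // stands in for buffer.memory
        this.maxBlockMs = maxBlockMs;                     // stands in for max.block.ms
    }

    /** Enqueues a record, waiting up to maxBlockMs for free space before giving up. */
    public void send(String record) throws InterruptedException, TimeoutException {
        if (!buffer.offer(record, maxBlockMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Buffer full for " + maxBlockMs + " ms; record not accepted");
        }
    }

    /** Simulates the network thread draining one record towards the broker. */
    public String drainOne() {
        return buffer.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedSendBuffer sender = new BoundedSendBuffer(2, 100);
        try {
            sender.send("r1");
            sender.send("r2");
            sender.send("r3"); // nothing drains the buffer, so this waits ~100 ms and then fails
        } catch (TimeoutException e) {
            System.out.println("TimeoutException: " + e.getMessage());
        }
    }
}
```

The point of the analogy is that the exception is raised on the caller's thread while records are still waiting, so a larger buffer only helps if the drain side (network and brokers) can eventually keep up.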

There is no exact answer for this kind of issue, since it also depends on the implementation; in my case, experimenting with the values above helped.
