Consumer Stuck in Re-join

Question

I've read other threads and worked around the problem by using a new group ID, but I'd like to understand what could cause this.

I have a topic with 16 partitions, and I've set session.timeout.ms=30000 and max.poll.interval.ms=30000000.

I run my program and kill it with ctrl+c, so it doesn't close properly. After doing this about 16 times (I'm guessing), I get stuck in this re-join issue. session.timeout.ms is the heartbeat timeout, so after 30 seconds it should kick my consumer and my partitions should "free up", right? Or is it only listening to my max.poll.interval.ms?

I still get this error intermittently, and when it happens I have to restart all my consumers. It happens even when my consumers were running fine and no consumers were added or removed: they all start getting stuck at rejoining. Here's an error log from when I try to connect with a new consumer while the group is stuck in that state:

https://pastebin.com/AXJeSHkp

2017-06-29 17:28:16,215 DEBUG [AbstractCoordinator] - [scheduler-1] - Sending JoinGroup ((type: JoinGroupRequest, groupId=ingestion-matching-kafka-consumer-group-dev1, sessionTimeout=30000, rebalanceTimeout=43200000, memberId=, protocolType=consumer, groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@b45e5583)) to coordinator kafka04-prod01.messagehub.services.us-south.bluemix.net:9093 (id: 2147483644 rack: null)

2017-06-29 17:37:21,261 DEBUG [NetworkClient] - [scheduler-1] - Node 2147483644 disconnected.
2017-06-29 17:37:21,263 DEBUG [ConsumerNetworkClient] - [scheduler-1] - Cancelled JOIN_GROUP request {api_key=11,api_version=1,correlation_id=19,client_id=ingestion-matching-kafka-consumer-dev1} with correlation id 19 due to node 2147483644 being disconnected

Those are the first and last messages I think are relevant. Here are the relevant timeouts I've set:

session.timeout.ms=30000
max.poll.interval.ms=43200000    
request.timeout.ms=43205000 # the docs said to keep this higher than max.poll.interval.ms
enable.auto.commit=false

Should I set heartbeat.interval.ms too? Is this the interval that heartbeats are sent by the consumer to the broker automatically in some background thread (I have read the docs but for some reason I can't quite wrap my head around it)?

Recommended answer

If your client does not disconnect properly (crash or SIGINT), it will take session.timeout.ms (30 seconds in your case) for the server to kick it from the group. During this time, the server will still think the consumer is part of the group, so it will not do any reassignments. Once this delay is over, assigned partitions will be reassigned to other consumers (if any).
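The timing described above can be sketched with a toy model. This is not Kafka's coordinator code; the class ToyCoordinator and its methods are hypothetical names made up for illustration. It only shows the difference between a member that leaves cleanly (removed at once) and one that is killed (still counted as a member until session.timeout.ms has passed since its last heartbeat).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of session tracking on the coordinator side.
// Hypothetical illustration only, not Kafka's real implementation.
class ToyCoordinator {
    private final long sessionTimeoutMs;
    // memberId -> time (ms) at which the last heartbeat was received
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    ToyCoordinator(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Record a heartbeat from a member.
    void heartbeat(String memberId, long nowMs) {
        lastHeartbeatMs.put(memberId, nowMs);
    }

    // Clean shutdown: the member announces it is leaving and is dropped at once.
    void leave(String memberId) {
        lastHeartbeatMs.remove(memberId);
    }

    // A member counts as alive until sessionTimeoutMs has elapsed
    // since its last heartbeat.
    boolean isMember(String memberId, long nowMs) {
        Long last = lastHeartbeatMs.get(memberId);
        return last != null && nowMs - last < sessionTimeoutMs;
    }

    public static void main(String[] args) {
        ToyCoordinator c = new ToyCoordinator(30_000); // session.timeout.ms=30000
        c.heartbeat("killed", 0);   // consumer that gets ctrl+c'd right after t=0
        c.heartbeat("closed", 0);   // consumer that shuts down cleanly
        c.leave("closed");

        System.out.println(c.isMember("killed", 29_999)); // true: still holds its partitions
        System.out.println(c.isMember("killed", 30_000)); // false: session expired
        System.out.println(c.isMember("closed", 1));      // false: left immediately
    }
}
```

In the real Java client, closing the consumer normally is what lets it leave the group without waiting for the session to expire, so handling ctrl+c with a shutdown path that closes the consumer is the usual way to avoid the 30-second wait.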

This of course does not happen if you use a new group ID. While it's tempting to use a new group every time when developing (as you don't have to wait), you lose any offsets committed by the previous group, and this might not represent the state your app will be in while running in production.

Regarding max.poll.interval.ms, it's the maximum delay allowed between two calls to poll() in your consumer logic. I don't think this setting is relevant to this question.
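To put the timeouts from the question side by side, here is a minimal sketch that loads them into a java.util.Properties object and checks two rules of thumb. The class TimeoutSettings and its helper methods are hypothetical names. heartbeat.interval.ms is not set in the question, so the sketch falls back to 3000 ms, the documented default for the Java consumer; the documentation says it must be lower than session.timeout.ms and is typically set to no more than a third of it. The second check only restates the asker's own comment about request.timeout.ms. As a side observation, the JoinGroup log line in the question reports rebalanceTimeout=43200000, the same value as max.poll.interval.ms.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of the Kafka client.
class TimeoutSettings {
    // The settings exactly as listed in the question.
    static Properties fromQuestion() {
        Properties p = new Properties();
        p.setProperty("session.timeout.ms", "30000");
        p.setProperty("max.poll.interval.ms", "43200000");
        p.setProperty("request.timeout.ms", "43205000");
        p.setProperty("enable.auto.commit", "false");
        return p;
    }

    // Read a required millisecond setting.
    static long ms(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("missing setting: " + key);
        }
        return Long.parseLong(v.trim());
    }

    // heartbeat.interval.ms should be at most a third of session.timeout.ms.
    // Falls back to 3000 ms (documented Java consumer default) when unset.
    static boolean heartbeatWithinGuideline(Properties p) {
        long heartbeat = Long.parseLong(p.getProperty("heartbeat.interval.ms", "3000").trim());
        return heartbeat * 3 <= ms(p, "session.timeout.ms");
    }

    // The asker's rule: request.timeout.ms above max.poll.interval.ms.
    static boolean requestTimeoutAbovePollInterval(Properties p) {
        return ms(p, "request.timeout.ms") > ms(p, "max.poll.interval.ms");
    }

    public static void main(String[] args) {
        Properties p = fromQuestion();
        System.out.println(heartbeatWithinGuideline(p));        // true
        System.out.println(requestTimeoutAbovePollInterval(p)); // true
    }
}
```

In Java client versions that have max.poll.interval.ms, heartbeats are sent from a background thread, so under this reading the question's settings already satisfy the guideline without setting heartbeat.interval.ms explicitly.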
