Killing node with __consumer_offsets leads to no message consumption at consumers


Problem Description

I have a 3-node (node0, node1, node2) Kafka cluster (broker0, broker1, broker2) with replication factor 2, and ZooKeeper (using the zookeeper packaged with the Kafka tar) running on a different node (node 4).

I started broker 0 after starting ZooKeeper, and then the remaining nodes. The broker 0 logs show that it is reading __consumer_offsets, and the offsets appear to be stored on broker 0. Below are sample logs:

Kafka version: kafka_2.10-0.10.2.0

    [2017-06-30 10:50:47,381] INFO [GroupCoordinator 0]: Loading group metadata for console-consumer-85124 with generation 2 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 10:50:47,382] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-41 in 23 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,382] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-44 (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,387] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-44 in 5 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,387] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-47 (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,398] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-47 in 11 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,398] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-1 (kafka.coordinator.GroupMetadataManager)

I can also see GroupCoordinator messages in the same broker 0 logs:

    [2017-06-30 14:35:22,874] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-34472 with old generation 1 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:22,877] INFO [GroupCoordinator 0]: Group console-consumer-34472 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:25,946] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-6612 with old generation 1 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:25,946] INFO [GroupCoordinator 0]: Group console-consumer-6612 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:38,326] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-30165 with old generation 1 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:38,326] INFO [GroupCoordinator 0]: Group console-consumer-30165 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:43:15,656] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 3 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 14:53:15,653] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

While testing fault tolerance for the cluster with kafka-console-consumer.sh and kafka-console-producer.sh, I see that on killing broker 1 or broker 2, the consumers still receive new messages coming from the producer, and the rebalance happens correctly.
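A minimal sketch of the kind of console commands used for such a test (the host names and the use of --bootstrap-server are placeholders/assumptions, not taken from the question):

    # Hypothetical producer: publishes lines typed on stdin to test-topic.
    kafka-console-producer.sh --broker-list node0:9092,node1:9092,node2:9092 --topic test-topic

    # Hypothetical consumer: reads test-topic with the new (Java) consumer,
    # which is what emits the ConsumerCoordinator WARN messages shown further below.
    kafka-console-consumer.sh --bootstrap-server node0:9092 --topic test-topic --from-beginning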

However, killing broker 0 leads to no consumption of new or old messages at any of the consumers. Below is the state of the topic before and after broker 0 is killed.

Before

Topic:test-topic    PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: test-topic   Partition: 0    Leader: 2   Replicas: 2,0   Isr: 0,2
    Topic: test-topic   Partition: 1    Leader: 0   Replicas: 0,1   Isr: 0,1
    Topic: test-topic   Partition: 2    Leader: 1   Replicas: 1,2   Isr: 1,2

After

Topic:test-topic    PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: test-topic   Partition: 0    Leader: 2   Replicas: 2,0   Isr: 2
    Topic: test-topic   Partition: 1    Leader: 1   Replicas: 0,1   Isr: 1
    Topic: test-topic   Partition: 2    Leader: 1   Replicas: 1,2   Isr: 1,2

The following WARN messages are seen in the consumer logs after broker 0 is killed:

[2017-06-30 14:19:17,155] WARN Auto-commit of offsets {test-topic-2=OffsetAndMetadata{offset=4, metadata=''}, test-topic-0=OffsetAndMetadata{offset=5, metadata=''}, test-topic-1=OffsetAndMetadata{offset=4, metadata=''}} failed for group console-consumer-34472: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-30 14:19:10,542] WARN Auto-commit of offsets {test-topic-2=OffsetAndMetadata{offset=4, metadata=''}, test-topic-0=OffsetAndMetadata{offset=5, metadata=''}, test-topic-1=OffsetAndMetadata{offset=4, metadata=''}} failed for group console-consumer-30165: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

Broker properties (the remaining properties are left at their defaults):

broker.id=0
delete.topic.enable=true

auto.create.topics.enable=false
listeners=PLAINTEXT://XXX:9092
advertised.listeners=PLAINTEXT://XXX:9092
log.dirs=/tmp/kafka-logs-test1
num.partitions=3
zookeeper.connect=XXX:2181

Producer properties (the remaining properties are left at their defaults):

bootstrap.servers=XXX,XXX,XXX
compression.type=snappy

Consumer properties (the remaining properties are left at their defaults):

zookeeper.connect=XXX:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group

As far as I understand, if the node holding/acting as the GroupCoordinator and __consumer_offsets dies, then the consumers are unable to resume normal operation despite new leaders being elected for the partitions.

I have seen something similar in another post, which suggests restarting the dead broker node. However, in a production environment there would still be a delay in message consumption until broker 0 is restarted, despite having more nodes available.

Q1: How can the above situation be mitigated?

Q2: Is there a way to move the GroupCoordinator and __consumer_offsets to another node?

Any suggestions/help are appreciated.

Answer

Check the replication factor of the __consumer_offsets topic. If it's not 3, that's your problem.

Run the following command: kafka-topics --zookeeper localhost:2181 --describe --topic __consumer_offsets, and check whether the first line of the output says "ReplicationFactor:1" or "ReplicationFactor:3".
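For illustration, a single-replica offsets topic would show up roughly like this (hypothetical output; only the first two of the 50 partitions are shown, and note that every replica sits on broker 0):

    Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
        Topic: __consumer_offsets   Partition: 0    Leader: 0   Replicas: 0 Isr: 0
        Topic: __consumer_offsets   Partition: 1    Leader: 0   Replicas: 0 Isr: 0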

This is a common problem when doing trials: you first set up a single node, and this topic gets created with a replication factor of 1. Later, when you expand to 3 nodes, you forget to change the topic-level settings on this existing topic, so even though the topics you are producing to and consuming from are fault tolerant, the offsets topic is still stuck on broker 0 only.
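If it does turn out to be ReplicationFactor:1, one way to fix it without recreating the topic is a partition reassignment. Below is a minimal sketch using kafka-reassign-partitions.sh, assuming broker ids 0, 1 and 2 and ZooKeeper at localhost:2181; a complete reassignment file needs an entry for every __consumer_offsets partition (50 by default), not just the two shown here.

    # Sketch: spread __consumer_offsets across all three brokers (ids 0, 1, 2 assumed).
    # A complete file must list every partition of the topic (0-49 by default);
    # only two entries are shown to keep the example short.
    cat > increase-offsets-rf.json <<'EOF'
    {
      "version": 1,
      "partitions": [
        {"topic": "__consumer_offsets", "partition": 0, "replicas": [0, 1, 2]},
        {"topic": "__consumer_offsets", "partition": 1, "replicas": [1, 2, 0]}
      ]
    }
    EOF

    # Start the reassignment ...
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
        --reassignment-json-file increase-offsets-rf.json --execute

    # ... and re-run with --verify until every partition reports completion.
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
        --reassignment-json-file increase-offsets-rf.json --verify

Once the reassignment completes and another broker becomes leader for the relevant offsets partition, the group coordinator for a given consumer group can fail over to that broker, which also addresses Q2. For new clusters, setting offsets.topic.replication.factor=3 in server.properties before the offsets topic is first created should avoid the problem in the first place.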
