Kafka new producer is not able to update metadata after one of the brokers is down

Problem description

I have a Kafka environment with 2 brokers and 1 ZooKeeper.

While I am trying to produce messages to Kafka, if I stop broker 1 (the leader), the client stops producing messages and gives me the error below, even though broker 2 has been elected as the new leader for the topic and its partitions.

org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.

After 10 minutes had passed, since broker 2 is the new leader, I expected the producer to send data to broker 2, but it kept failing with the exception above. lastRefreshMs and lastSuccessfullRefreshMs are still the same, although metadataExpireMs is 300000 for the producer.

I am using the new Kafka producer implementation on the producer side.

It seems that when the producer is initiated, it binds to one broker, and if that broker goes down it does not even try to connect to the other brokers in the cluster.

But my expectation is that if a broker goes down, the producer should fetch metadata from the other available brokers and send the data to them.

Btw, my topic has 4 partitions and a replication factor of 2. Giving this info in case it is relevant.

Configuration parameters:

request.timeout.ms=30000
retry.backoff.ms=100
buffer.memory=33554432
ssl.truststore.password=null
batch.size=16384
ssl.keymanager.algorithm=SunX509
receive.buffer.bytes=32768
ssl.cipher.suites=null
ssl.key.password=null
sasl.kerberos.ticket.renew.jitter=0.05
ssl.provider=null
sasl.kerberos.service.name=null
max.in.flight.requests.per.connection=5
sasl.kerberos.ticket.renew.window.factor=0.8
bootstrap.servers=[10.201.83.166:9500, 10.201.83.167:9500]
client.id=rest-interface
max.request.size=1048576
acks=1
linger.ms=0
sasl.kerberos.kinit.cmd=/usr/bin/kinit
ssl.enabled.protocols=[TLSv1.2, TLSv1.1, TLSv1]
metadata.fetch.timeout.ms=60000
ssl.endpoint.identification.algorithm=null
ssl.keystore.location=null
value.serializer=class org.apache.kafka.common.serialization.ByteArraySerializer
ssl.truststore.location=null
ssl.keystore.password=null
key.serializer=class org.apache.kafka.common.serialization.ByteArraySerializer
block.on.buffer.full=false
metrics.sample.window.ms=30000
metadata.max.age.ms=300000
security.protocol=PLAINTEXT
ssl.protocol=TLS
sasl.kerberos.min.time.before.relogin=60000
timeout.ms=30000
connections.max.idle.ms=540000
ssl.trustmanager.algorithm=PKIX
metric.reporters=[]
compression.type=none
ssl.truststore.type=JKS
max.block.ms=60000
retries=0
send.buffer.bytes=131072
partitioner.class=class org.apache.kafka.clients.producer.internals.DefaultPartitioner
reconnect.backoff.ms=50
metrics.num.samples=2
ssl.keystore.type=JKS
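For context, a minimal sketch of how a producer with the settings above might be constructed. The class name and topic name are hypothetical, and only the settings relevant to this problem are reproduced:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {

    static KafkaProducer<byte[], byte[]> createProducer() {
        Properties props = new Properties();
        // Both brokers are listed, so bootstrapping has a fallback if one is down.
        props.put("bootstrap.servers", "10.201.83.166:9500,10.201.83.167:9500");
        props.put("acks", "1");
        props.put("retries", "0");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // The metadataExpireMs mentioned above: refresh metadata at most every 5 minutes.
        props.put("metadata.max.age.ms", "300000");
        // The 60000 ms in the TimeoutException is this metadata fetch timeout.
        props.put("metadata.fetch.timeout.ms", "60000");
        return new KafkaProducer<byte[], byte[]>(props);
    }

    public static void main(String[] args) {
        KafkaProducer<byte[], byte[]> producer = createProducer();
        // "test-topic" is a placeholder; the real topic has 4 partitions, replication factor 2.
        producer.send(new ProducerRecord<byte[], byte[]>("test-topic", "hello".getBytes()));
        producer.close();
    }
}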

Use case:

1- Start BR1 and BR2, produce data (leader is BR1)

2- Stop BR2, produce data (fine)

3- Stop BR1 (which means there is no active broker in the cluster at this point), then start BR2 and produce data (fails, although the leader is BR2)

4- Start BR1, produce data (leader is still BR2, but data is produced fine)

5- Stop BR2 (now BR1 is the leader)

6- Stop BR1 (BR1 is still the leader)

7- Start BR1, produce data (messages are produced fine again)
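To make the failure in step 3 visible, each send can be given a callback; a sketch, reusing the hypothetical createProducer() and topic name from above:

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class FailoverProbe {
    public static void main(String[] args) {
        KafkaProducer<byte[], byte[]> producer = ProducerSketch.createProducer();
        producer.send(new ProducerRecord<byte[], byte[]>("test-topic", "ping".getBytes()),
                new Callback() {
                    @Override
                    public void onCompletion(RecordMetadata metadata, Exception e) {
                        if (e != null) {
                            // In step 3 this reports:
                            // TimeoutException: Failed to update metadata after 60000 ms.
                            System.err.println("send failed: " + e);
                        } else {
                            System.out.println("sent to partition " + metadata.partition()
                                    + " at offset " + metadata.offset());
                        }
                    }
                });
        producer.close();
    }
}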

The producer sent the last successful data to BR1, and then all brokers went down; the producer now expects BR1 to come up again, even though BR2 is up and is the new leader. Is this the expected behaviour?

Answer

After spending hours on this, I figured out how Kafka behaves in my situation. Maybe it is a bug, or maybe it has to be done this way for reasons that lie under the hood, but if I were writing such an implementation I would not do it this way :)

When all brokers go down, if you can bring only one broker back up, it must be the broker that went down last in order to produce messages successfully.

Let's say you have 5 brokers: BR1, BR2, BR3, BR4 and BR5. If everything goes down and the broker that died last is BR3 (which was the last leader), then even if you start all of the brokers BR1, BR2, BR4 and BR5, it makes no difference unless you start BR3.
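One way to see this from the client side is to ask the producer for its current view of the metadata; a sketch, again assuming the hypothetical createProducer() and topic name from the question:

import java.util.List;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.PartitionInfo;

public class LeaderProbe {
    public static void main(String[] args) {
        KafkaProducer<byte[], byte[]> producer = ProducerSketch.createProducer();
        // partitionsFor() forces a metadata fetch; while no usable broker is up,
        // it can fail with the same "Failed to update metadata" TimeoutException.
        List<PartitionInfo> partitions = producer.partitionsFor("test-topic");
        for (PartitionInfo p : partitions) {
            // leader() is the broker the producer currently believes owns the partition.
            System.out.println("partition " + p.partition() + " -> leader " + p.leader());
        }
        producer.close();
    }
}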
