Kafka rack-id and min in-sync replicas


Question


Kafka has introduced rack-id to provide redundancy capabilities if a whole rack fails. There is a min in-sync replica setting to specify the minimum number of replicas that need to be in-sync before a producer receives an ack (-1 / all config). There is an unclean leader election setting to specify whether a leader can be elected when it is not in-sync.

So, given the following:

  • Two racks: rack 1 and rack 2.
  • Replication factor of 4.
  • Min in-sync replicas = 2.
  • Producer acks = -1 (all).
  • Unclean leader election = false.


Aiming to have at-least-once message delivery, redundancy of nodes, and tolerance to a rack failure.
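Under those assumptions, the setup above might be expressed as Kafka configuration roughly as follows (a sketch; the rack names and topic name are illustrative, the config keys are standard Kafka settings):

```properties
# Each broker's server.properties declares the rack it lives in:
broker.rack=rack-1    # brokers in rack 1
broker.rack=rack-2    # brokers in rack 2

# Topic created with replication factor 4 and a min ISR of 2:
#   kafka-topics.sh --create --topic events --partitions 1 \
#     --replication-factor 4 --config min.insync.replicas=2

# Broker- or topic-level: forbid unclean leader election
unclean.leader.election.enable=false

# Producer side: require acks from all in-sync replicas
acks=all
```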


Is it possible that there is a moment where the two in-sync replicas both come from rack 1, so the producer receives an ack, and at that point rack 1 crashes (before any replicas from rack 2 are in sync)? This would mean that rack 2 contains only unclean replicas, and no producer would be able to add messages to the partition, essentially grinding it to a halt. The replicas would be unclean, so no new leader could be elected in any case.


Is my analysis correct, or is there something under the hood to ensure that the replicas forming the min in-sync set have to be from different racks?
Since replicas on the same rack would have lower latency, the above scenario seems reasonably likely.

The scenario is illustrated in the following diagram:

Answer


To be technically correct, you should fix some of the question's wording: it is not possible to have out-of-sync replicas "available". Also, the min in-sync replicas setting specifies the minimum number of replicas that need to be in sync for the partition to remain available for writes. When a producer specifies acks=-1 (all), it will still wait for acks from all replicas that are in sync at that moment, independent of the min in-sync replicas setting. So if you publish while 4 replicas are in sync, you will not get an ack unless all 4 replicas commit the message (even if min in-sync replicas is configured as 2).

It is still possible to construct a scenario similar to yours that highlights the same trade-off: first let the 2 replicas in rack 2 fall out of sync, then publish while the only 2 ISRs are in rack 1, and then take rack 1 down. In that case the partition would be unavailable for reads or writes.

So the easiest fix for this problem would be to increase min in-sync replicas to 3. Another, less fault-tolerant, fix would be to reduce the replication factor to 3.
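The answer's key point, that acks=all waits on the full current ISR while min.insync.replicas only gates availability, can be sketched as a toy model in Python (this is illustrative only, not Kafka's actual implementation; the class and method names are made up):

```python
# Toy model of a single partition's ISR and ack behavior (illustrative only).

class Partition:
    def __init__(self, replicas, min_insync):
        self.isr = set(replicas)        # all replicas start in the ISR
        self.min_insync = min_insync

    def shrink_isr(self, replica):
        """A replica falls behind (or dies) and leaves the ISR."""
        self.isr.discard(replica)

    def produce(self, acks_committed_by):
        """acks=all: the write succeeds only if the partition is available
        (len(ISR) >= min.insync.replicas) AND every replica currently in
        the ISR has committed the message -- not just min.insync of them."""
        if len(self.isr) < self.min_insync:
            return "NOT_ENOUGH_REPLICAS"   # partition unavailable for writes
        if not self.isr <= set(acks_committed_by):
            return "WAITING"               # still waiting on an ISR member
        return "ACK"

# Replication factor 4 across two racks, min.insync.replicas = 2.
p = Partition(["r1a", "r1b", "r2a", "r2b"], min_insync=2)

# With all 4 replicas in sync, acks from only 2 of them are NOT enough:
print(p.produce(acks_committed_by=["r1a", "r1b"]))   # WAITING

# The rack-2 replicas fall out of sync first...
p.shrink_isr("r2a"); p.shrink_isr("r2b")
# ...now a write is acked by the two rack-1 replicas alone:
print(p.produce(acks_committed_by=["r1a", "r1b"]))   # ACK

# Rack 1 then dies: ISR drops below min.insync.replicas, writes rejected.
p.shrink_isr("r1a"); p.shrink_isr("r1b")
print(p.produce(acks_committed_by=[]))               # NOT_ENOUGH_REPLICAS
```

The middle step is exactly the trade-off in the answer: the ack was legitimate (both ISR members committed), yet the committed data lived only in rack 1, so min.insync.replicas=3 is what prevents the ack from being satisfiable by a single rack.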

