Kafka rack-id and min in-sync replicas

Problem Description

Kafka has introduced rack-id to provide redundancy in case a whole rack fails. There is a min in-sync replicas setting to specify the minimum number of replicas that need to be in sync before a producer receives an ack (with the -1 / all config). There is an unclean leader election setting to specify whether a leader can be elected when it is not in sync.
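
For concreteness, here is a minimal producer sketch (using the standard Kafka Java client; the broker address and topic name are made up for illustration) showing how these settings map onto actual configuration keys: the producer-side acks, the topic-level min.insync.replicas, and the broker-level broker.rack and unclean.leader.election.enable:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllProducer {
    public static void main(String[] args) throws Exception {
        // Broker side (server.properties, one entry per broker) -- shown here only as comments:
        //   broker.rack=rack-1                       rack id used for rack-aware replica placement
        //   unclean.leader.election.enable=false     never elect an out-of-sync replica as leader
        // Topic side:
        //   min.insync.replicas=2                    partition must keep >= 2 ISRs to accept acks=all writes

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical address
        props.put(ProducerConfig.ACKS_CONFIG, "all");                       // "all" is equivalent to -1
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The future completes successfully only after the leader has collected the
            // acknowledgements required by the acks setting.
            producer.send(new ProducerRecord<>("events", "key", "value")).get();
        }
    }
}
```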

So, given the following scenario:

  • Two racks. Racks 1 and 2.
  • Replication count of 4.
  • Min in-sync replicas = 2.
  • Producer ack = -1 (all).
  • Unclean leader election = false.

Aiming to have at-least-once message delivery, redundancy of nodes, and tolerance of a rack failure.
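
As a sketch of what this setup could look like when created with the Java AdminClient (broker addresses and the topic name events are assumptions; the rack assignment itself comes from each broker's broker.rack setting):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateScenarioTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical cluster: two brokers in rack 1 and two in rack 2,
        // each started with its own broker.rack value.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "rack1-broker1:9092,rack2-broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition, replication factor 4; with broker.rack set, Kafka spreads
            // the four replicas across both racks.
            NewTopic topic = new NewTopic("events", 1, (short) 4)
                    .configs(Map.of(
                            "min.insync.replicas", "2",
                            "unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```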

Is it possible that there is a moment where the two in-sync replicas both come from rack 1, so the producer receives an ack, and at that point rack 1 crashes (before any replicas from rack 2 are in sync)? This would mean that rack 2 only contains unclean replicas, and no producer would be able to add messages to the partition, essentially grinding it to a halt. The replicas would be unclean, so no new leader could be elected in any case.

Is my analysis correct, or is there something under the hood to ensure that the replicas forming the min in-sync set have to be from different racks?
Since replicas on the same rack would have lower latency, the above scenario seems reasonably likely.

The scenario is illustrated in the diagram below:

Recommended Answer

To be technically correct, you should fix some of the question's wording: it is not possible to have out-of-sync replicas "available". Also, the min in-sync replicas setting specifies the minimum number of replicas that need to be in sync for the partition to remain available for writes. When a producer specifies acks (-1 / all config), it will still wait for acks from all replicas that are in sync at that moment (independent of the setting for min in-sync replicas). So if you publish when 4 replicas are in sync, you will not get an ack unless all 4 replicas commit the message (even if min in-sync replicas is configured as 2).

It is still possible to construct a scenario similar to your question that highlights the same tradeoff: have the 2 replicas in rack 2 fall out of sync first, publish while the only 2 ISRs are in rack 1, and then take rack 1 down. In that case the partition would be unavailable for reads or writes. So the easiest fix to this problem would be to increase min in-sync replicas to 3. Another, less fault-tolerant, fix would be to reduce the replication factor to 3.
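
A sketch of the suggested fix, raising min.insync.replicas to 3 on the existing topic via the Java AdminClient (the topic name and broker address are placeholders; incrementalAlterConfigs needs Kafka 2.3 or newer):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseMinInsyncReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
            // With min.insync.replicas=3 and a replication factor of 4 spread over two racks,
            // at least one in-sync replica must sit in the second rack before an acks=all
            // write can be acknowledged.
            AlterConfigOp raiseMinIsr = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "3"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> changes =
                    Map.of(topic, Collections.singletonList(raiseMinIsr));
            admin.incrementalAlterConfigs(changes).all().get();
        }
    }
}
```

The same topic-level change can also be made with the kafka-configs.sh command-line tool shipped with Kafka.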
