How does Kafka guarantee zero downtime and zero data loss?

Question

I have read about Kafka, but I don't understand how it achieves zero downtime and zero data loss.

Recommended answer

I'll answer your question by explaining how Kafka works in general and how it deals with failures.

Each topic is a particular stream of data (similar to a table in a database). A topic is split into partitions (as many as you like), and each message within a partition gets an incremental ID, known as its offset, as shown below.

Partition 0:

+---+---+---+-----+
| 0 | 1 | 2 | ... |
+---+---+---+-----+

Partition 1:

+---+---+---+---+----+
| 0 | 1 | 2 | 3 | .. |
+---+---+---+---+----+
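
The offset numbering in the diagrams above can be sketched as a toy in-memory model. This is only an illustration, not Kafka's implementation: the `Partition` and `Topic` classes and the `send` method are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical append-only log: a message's offset is its position."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        offset = len(self.messages)  # next incremental ID in this partition
        self.messages.append(message)
        return offset


@dataclass
class Topic:
    """Hypothetical topic holding a fixed number of partitions."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]

    def send(self, partition: int, message: str) -> int:
        # Offsets are tracked per partition, not per topic.
        return self.partitions[partition].append(message)


topic = Topic("topic-1", num_partitions=2)
offsets_p0 = [topic.send(0, m) for m in ("a", "b", "c")]
offsets_p1 = [topic.send(1, m) for m in ("w", "x", "y", "z")]
print(offsets_p0)  # [0, 1, 2]
print(offsets_p1)  # [0, 1, 2, 3]
```

Each partition starts counting from 0 independently, so an offset identifies a message only together with its partition.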

A Kafka cluster is composed of multiple brokers. Each broker is identified by an ID and can hold certain topic partitions.

Example with 2 topics (with 3 and 2 partitions, respectively):

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 2       |
|   Partition 1     |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 2    |
|                   |
|                   |
|     Topic 2       |
|   Partition 0     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Note that the data is distributed across brokers (and Broker 3 doesn't hold any data for Topic 2).
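
To make the layout concrete, the three broker diagrams can be written down as a plain mapping and queried. This is a minimal sketch: the `assignment` dictionary mirrors the diagrams, while the helper names `brokers_hosting` and `partitions_on` are made up for illustration and are not part of any Kafka API.

```python
# (topic, partition) -> broker ID, copied from the broker diagrams above.
assignment = {
    ("Topic 1", 0): 1,
    ("Topic 2", 1): 1,
    ("Topic 1", 2): 2,
    ("Topic 2", 0): 2,
    ("Topic 1", 1): 3,
}


def brokers_hosting(topic: str) -> list:
    """Hypothetical helper: broker IDs holding at least one partition of topic."""
    return sorted({broker for (t, _), broker in assignment.items() if t == topic})


def partitions_on(broker: int) -> list:
    """Hypothetical helper: (topic, partition) pairs stored on a broker."""
    return sorted(tp for tp, b in assignment.items() if b == broker)


print(brokers_hosting("Topic 1"))  # [1, 2, 3]
print(brokers_hosting("Topic 2"))  # [1, 2]
print(partitions_on(3))            # [('Topic 1', 1)]
```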

Topics should have a replication factor greater than 1 (usually 2 or 3), so that when a broker goes down, another broker can serve the topic's data. For instance, assume a topic with 2 partitions and a replication factor of 2, as shown below:

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 1       |
|   Partition 1     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Now assume that Broker 2 has failed. Brokers 1 and 3 can still serve the data for Topic 1. A replication factor of 3 is generally a good idea, since it allows one broker to be taken down for maintenance while still tolerating the unexpected failure of another.
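
The failure scenario can be checked with a small simulation. This is a sketch under simplifying assumptions: a partition counts as available if any broker holding a replica is still alive, which ignores details such as whether that replica is fully caught up. The function name `available_partitions` and the `rf3` layout are hypothetical.

```python
# partition -> set of broker IDs holding a replica (replication factor 2),
# copied from the diagrams for Topic 1 above.
rf2 = {0: {1, 2}, 1: {2, 3}}

# Assumed layout for comparison: the same topic with replication factor 3.
rf3 = {0: {1, 2, 3}, 1: {1, 2, 3}}


def available_partitions(replicas: dict, failed: set) -> list:
    """Partitions that still have at least one replica on a live broker."""
    return sorted(p for p, brokers in replicas.items() if brokers - failed)


print(available_partitions(rf2, failed={2}))     # [0, 1]
print(available_partitions(rf2, failed={1, 2}))  # [1]
print(available_partitions(rf3, failed={1, 2}))  # [0, 1]
```

With a replication factor of 2, losing Broker 2 alone is survivable, but losing Brokers 1 and 2 together makes partition 0 unavailable. With a replication factor of 3, the same double failure still leaves every partition with one live replica.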

That's the general idea behind how Kafka offers strong durability and fault-tolerance guarantees.

Note about leaders: at any time, only one broker can be the leader of a given partition, and only that leader receives and serves data for that partition. The other brokers holding the partition just synchronize the data from the leader (in-sync replicas). Also note that when the replication factor is 1, the leader cannot be moved elsewhere when its broker fails. In general, when all replicas of a partition fail or go offline, the partition's leader is automatically set to -1.
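
The leader behavior described above can be sketched as follows. This is a deliberately simplified model, not Kafka's actual election logic: it assumes the leader is simply the first replica in the partition's replica list whose broker is alive, and it ignores in-sync status. The names `elect_leader` and `NO_LEADER` are hypothetical.

```python
NO_LEADER = -1  # value reported when no replica of the partition is online


def elect_leader(replica_brokers: list, live_brokers: set) -> int:
    """Return the first live broker in the replica list, or -1 if none is live."""
    for broker in replica_brokers:
        if broker in live_brokers:
            return broker
    return NO_LEADER


# Replication factor 2: replicas on Brokers 2 and 1.
print(elect_leader([2, 1], live_brokers={1, 2, 3}))  # 2
print(elect_leader([2, 1], live_brokers={1, 3}))     # 1 (Broker 2 failed)
print(elect_leader([2, 1], live_brokers={3}))        # -1 (all replicas offline)

# Replication factor 1: no other replica can take over.
print(elect_leader([2], live_brokers={1, 3}))        # -1
```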
