Difference between idempotence and exactly-once in Kafka Stream


Question

I was going through the documentation, and what I understood is that we can achieve an exactly-once transaction by enabling idempotence=true:

idempotence: The idempotent producer enables exactly-once delivery for a producer against a single topic. Basically, each single message send has stronger guarantees and will not be duplicated in case there's an error.

So if we already have idempotence, why do we need the separate exactly-once property in Kafka Streams? What exactly is the difference between idempotence and exactly-once?

Why is the exactly-once property not available in the normal Kafka Producer?

Answer

In a distributed environment, failure is a very common scenario that can happen at any time. In a Kafka environment, the broker can crash, the network can fail, processing can fail, publishing a message can fail, consuming a message can fail, and so on. These different scenarios introduce different kinds of data loss and duplication.

Failure scenarios

A (Ack Failed): The producer published a message successfully with retries > 1 but could not receive the acknowledgement due to a failure. In that case the producer will retry the same message, which may introduce a duplicate.

B (Producer process failed in batch messages): The producer was sending a batch of messages and failed with only a few published successfully. Once the producer restarts, it will republish all messages from the batch again, which will introduce duplicates in Kafka.

C (Fire & Forget Failed): The producer published a message with retries=0 (fire and forget). In case of failure, the producer will not be aware of it and will send the next message, causing message loss.

D (Consumer failed in batch message): A consumer receives a batch of messages from Kafka and commits its offsets manually (enable.auto.commit=false). If the consumer fails before committing to Kafka, next time the consumer will consume the same records again, which reproduces duplicates on the consumer side.
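Scenario D can be sketched with a toy, in-memory simulation (plain Python, not the Kafka API): a consumer that crashes before committing its offset reprocesses the same records after restart.

```python
# Toy sketch of failure scenario D: side effects happen before the manual
# offset commit, so a crash between the two causes duplicate processing.

log = ["m0", "m1", "m2", "m3"]     # records in one partition
committed_offset = 0               # last committed offset for the group
processed = []                     # downstream side effects

def poll_and_process(crash_before_commit):
    global committed_offset
    batch = log[committed_offset:committed_offset + 2]
    processed.extend(batch)        # side effect happens first
    if crash_before_commit:
        return                     # consumer dies: offset never committed
    committed_offset += len(batch) # manual commit (enable.auto.commit=false)

poll_and_process(crash_before_commit=True)   # processes m0, m1, then crashes
poll_and_process(crash_before_commit=False)  # restart: re-reads m0, m1

print(processed)  # ['m0', 'm1', 'm0', 'm1'] -> duplicates on consumer side
```

The duplicate appears because the read position only advances when the commit succeeds, while the processing already happened.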

Exactly-once semantics

In this case, even if a producer tries to resend a message, the message will be published and consumed by the consumer exactly once.

To achieve exactly-once semantics in Kafka, the following 3 properties are used:

  1. enable.idempotence = true (addresses A, B and C)
  2. MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION = 5 (the producer always has at most this many in-flight requests per connection)
  3. isolation.level = read_committed (addresses D)
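As a minimal sketch, the three properties above can be written as plain config dictionaries (the style accepted by, e.g., the confluent-kafka Python client); the bootstrap address and group id are assumptions, and no broker is contacted here:

```python
# Illustrative producer/consumer configs for exactly-once; only shows
# which keys are involved, it does not connect to a broker.

producer_config = {
    "bootstrap.servers": "localhost:9092",        # assumed address
    "enable.idempotence": True,                   # addresses A, B and C
    "max.in.flight.requests.per.connection": 5,
    "acks": "all",                                # required with idempotence
}

consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",                  # assumed group id
    "enable.auto.commit": False,                  # manual offset commits
    "isolation.level": "read_committed",          # addresses D
}

print(producer_config["enable.idempotence"],
      consumer_config["isolation.level"])
```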

Enable idempotence (enable.idempotence = true)

Idempotent delivery enables the producer to write a message to Kafka exactly once to a particular partition of a topic during the lifetime of a single producer, without data loss, and with order preserved per partition.

"Note that enabling idempotence requires MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION to be less than or equal to 5, RETRIES_CONFIG to be greater than 0 and ACKS_CONFIG to be 'all'. If these values are not explicitly set by the user, suitable values will be chosen. If incompatible values are set, a ConfigException will be thrown."

To achieve idempotence, Kafka uses a unique id, called the producer id (PID), together with a sequence number while producing messages. The producer keeps incrementing the sequence number on each message published, mapped to its unique PID. The broker always compares the current sequence number with the previous one: if the new one is not exactly +1 greater than the previous one, it rejects the message, which avoids duplication, while a gap of more than one signals lost messages.

In a failure scenario, the broker compares the sequence number with the previous one, and if the sequence has not increased by exactly +1, it rejects the message.
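The broker-side check can be sketched with a toy class (this is not Kafka code; it only illustrates the PID + sequence-number logic described above):

```python
# Toy model of the broker's per-(PID, partition) sequence check:
# retries (seq too low) are dropped as duplicates, and a gap
# (seq too high) is rejected because it implies lost messages.

class ToyBroker:
    def __init__(self):
        self.last_seq = {}  # (pid, partition) -> last accepted sequence
        self.log = []

    def append(self, pid, partition, seq, msg):
        expected = self.last_seq.get((pid, partition), -1) + 1
        if seq < expected:
            return "duplicate"         # retry of an already-written message
        if seq > expected:
            return "out_of_sequence"   # gap: earlier messages were lost
        self.last_seq[(pid, partition)] = seq
        self.log.append(msg)
        return "ok"

broker = ToyBroker()
print(broker.append(pid=7, partition=0, seq=0, msg="a"))  # ok
print(broker.append(pid=7, partition=0, seq=0, msg="a"))  # duplicate
print(broker.append(pid=7, partition=0, seq=2, msg="c"))  # out_of_sequence
print(broker.append(pid=7, partition=0, seq=1, msg="b"))  # ok
```

Note how the retried message (seq=0 again) is silently deduplicated rather than written twice, which is what addresses failure scenarios A and B.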

Transactions (isolation.level)

Transactions give us the ability to atomically update data in multiple topic partitions. All the records included in a transaction will be successfully saved, or none of them will be. Transactions also allow you to commit your consumer offsets in the same transaction along with the data you have processed, thereby allowing end-to-end exactly-once semantics.

The producer doesn't wait to write each message to Kafka; instead it uses beginTransaction, commitTransaction and abortTransaction (in case of failure), while the consumer uses isolation.level set to either read_committed or read_uncommitted:

  • read_committed: the consumer will always read committed data only.
  • read_uncommitted: read all messages in offset order without waiting for transactions to be committed.

If a consumer with isolation.level=read_committed reaches a control message for a transaction that has not completed, it will not deliver any more messages from this partition until the producer commits or aborts the transaction, or a transaction timeout occurs. The transaction timeout is determined by the producer using the configuration transaction.timeout.ms (default 1 minute).
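The begin/commit/abort flow can be sketched with a toy in-memory model (plain Python, not a Kafka client): "commit" atomically publishes the buffered records together with the consumer offset, while "abort" discards both.

```python
# Toy model of a transactional producer: records and the consumer offset
# become visible together on commit, or not at all on abort.

class ToyTransactionalProducer:
    def __init__(self, topic_log, offsets):
        self.topic_log = topic_log   # committed, consumer-visible records
        self.offsets = offsets       # committed consumer offsets
        self.begin_transaction()

    def begin_transaction(self):
        self.buffer, self.pending_offset = [], None

    def send(self, msg):
        self.buffer.append(msg)      # staged, invisible to read_committed

    def send_offsets(self, offset):
        self.pending_offset = offset # offsets join the same transaction

    def commit_transaction(self):
        self.topic_log.extend(self.buffer)       # all records appear...
        if self.pending_offset is not None:
            self.offsets["group"] = self.pending_offset  # ...with the offset
        self.begin_transaction()

    def abort_transaction(self):
        self.begin_transaction()     # none of the records appear

log, offsets = [], {}
p = ToyTransactionalProducer(log, offsets)

p.send("out-1"); p.send_offsets(1)
p.commit_transaction()               # out-1 and offset 1 commit as a unit

p.send("out-2"); p.send_offsets(2)
p.abort_transaction()                # neither out-2 nor offset 2 is visible

print(log, offsets)  # ['out-1'] {'group': 1}
```

Because the offset commit rides in the same transaction as the output records, a crash can never leave the output written but the input marked unconsumed (or vice versa).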

Exactly-once in Producer & Consumer

In the normal condition, we have a separate producer and consumer. The producer has to be idempotent and at the same time manage transactions, so that the consumer can use isolation.level=read_committed to make the whole process an atomic operation. This guarantees that the producer will always stay in sync with the source system. Even if the producer crashes or a transaction is aborted, it is always consistent and publishes a message, or a batch of messages, as a unit, once.

Likewise, the consumer will receive a message, or a batch of messages, as a unit, once.

With exactly-once semantics, the producer together with the consumer appears as an atomic operation that operates as one unit: a message is either published and consumed exactly once, or aborted.

Exactly-once in Kafka Streams

Kafka Streams consumes messages from topic A, processes them, publishes messages to topic B, and once published uses commit (which mostly runs under the covers) to flush all state-store data to disk.

Exactly-once in Kafka Streams is a read-process-write pattern that guarantees these operations will be treated as one atomic operation. Since Kafka Streams caters for producer, consumer and transactions all together, it comes with the special parameter processing.guarantee, which can be exactly_once or at_least_once, making life easier by not having to handle all the parameters separately.

Kafka Streams atomically updates consumer offsets, local state stores, state-store changelog topics and production to output topics all together. If any one of these steps fails, all of the changes are rolled back.

processing.guarantee = exactly_once automatically provides the parameters below; you don't need to set them explicitly:

  1. isolation.level = read_committed
  2. enable.idempotence = true
  3. MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION = 5
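This bundling can be sketched as a small helper (illustrative Python, not the Java StreamsConfig API; the dict keys mirror Kafka config names):

```python
# Sketch of what choosing processing.guarantee = "exactly_once" implies:
# the three underlying configs are set for you, while "at_least_once"
# leaves them at their defaults.

def effective_configs(processing_guarantee):
    base = {"processing.guarantee": processing_guarantee}
    if processing_guarantee == "exactly_once":
        base.update({
            "isolation.level": "read_committed",
            "enable.idempotence": True,
            "max.in.flight.requests.per.connection": 5,
        })
    return base

print(effective_configs("exactly_once"))
print(effective_configs("at_least_once"))  # none of the three are forced
```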

