Kafka: isolation level implications


Problem description

I have a use case where I need 100% reliability, idempotency (no duplicate messages), and order preservation in my Kafka partitions. I'm trying to set up a proof of concept using the transactional API to achieve this. There is a setting called 'isolation.level' that I'm struggling to understand.

This article describes the difference between the two options:

There are now two new isolation levels in the Kafka consumer:

read_committed: Read both messages that are not part of a transaction and messages that are part of one, the latter only after the transaction is committed. A read_committed consumer uses an end offset of the partition instead of client-side buffering. This offset is that of the first message in the partition belonging to an open transaction, and is also known as the "Last Stable Offset" (LSO). A read_committed consumer will only read up to the LSO and filters out any transactional messages that have been aborted.

read_uncommitted: Read all messages in offset order without waiting for transactions to be committed. This option is similar to the current semantics of a Kafka consumer.

The performance implication here is obvious, but I'm honestly struggling to read between the lines and understand the functional implications and risks of each choice. It seems like read_committed is 'safer', but I want to understand why.

Recommended answer

First, the isolation.level setting only has an impact on the consumer if the topics it consumes from contain records written by a transactional producer.

If so, and it is set to read_uncommitted, the consumer simply reads everything, including records from aborted transactions. That is the default.

When set to read_committed, the consumer can only read records from committed transactions (in addition to records that are not part of any transaction). It also means that, to preserve ordering, while a transaction is in flight the consumer cannot consume records that are part of that transaction. Basically, the broker only allows the consumer to read up to the Last Stable Offset (LSO). When the transaction is committed (or aborted), the broker updates the LSO and the consumer receives the new records.
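To make the LSO behaviour concrete, here is a toy, in-memory model of a single partition. It is only a sketch of the semantics described above and does not talk to a broker; the Entry type, the function names, and the sample records are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    offset: int
    value: Optional[str]          # None for a commit/abort marker
    txn: Optional[str] = None     # None for a non-transactional record
    marker: Optional[str] = None  # "commit" or "abort" for a control marker


def last_stable_offset(log: List[Entry]) -> int:
    """First offset of the earliest still-open transaction, else the log end."""
    first_open = {}  # txn id -> offset of its first record, while still open
    for e in log:
        if e.marker is not None:
            first_open.pop(e.txn, None)           # transaction has finished
        elif e.txn is not None:
            first_open.setdefault(e.txn, e.offset)
    log_end = log[-1].offset + 1 if log else 0
    return min(first_open.values(), default=log_end)


def visible(log: List[Entry], isolation_level: str) -> List[str]:
    """Values a consumer with the given isolation level would be handed."""
    if isolation_level == "read_uncommitted":
        # Everything in offset order; only the control markers are hidden.
        return [e.value for e in log if e.marker is None]
    if isolation_level != "read_committed":
        raise ValueError(f"unknown isolation level: {isolation_level}")

    # Work out which transactional records ended up aborted.
    aborted = set()
    pending = {}  # txn id -> offsets written so far in the open transaction
    for e in log:
        if e.marker is not None:
            offsets = pending.pop(e.txn, [])
            if e.marker == "abort":
                aborted.update(offsets)
        elif e.txn is not None:
            pending.setdefault(e.txn, []).append(e.offset)

    lso = last_stable_offset(log)
    return [
        e.value
        for e in log
        if e.marker is None and e.offset < lso and e.offset not in aborted
    ]


# Transaction T1 has committed, T2 is still in flight.
log = [
    Entry(0, "a"),
    Entry(1, "t1-x", txn="T1"),
    Entry(2, "b"),
    Entry(3, "t2-y", txn="T2"),
    Entry(4, None, txn="T1", marker="commit"),
    Entry(5, "c"),
]
open_view = visible(log, "read_committed")      # stops at the LSO (offset 3)
dirty_view = visible(log, "read_uncommitted")   # includes the open "t2-y"

# Once T2 aborts, the LSO moves to the end of the log.
finished = log + [Entry(6, None, txn="T2", marker="abort")]
final_view = visible(finished, "read_committed")  # "t2-y" is filtered out
```

Note that in this model the non-transactional record "c" is also held back while T2 is open, because it sits past the LSO; that is what keeps offset order intact for a read_committed consumer.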

If you cannot tolerate duplicates or records from aborted transactions, you should use read_committed. As you hinted, this adds a small delay to consumption, since records only become visible once their transaction is committed. The impact mostly depends on the size of your transactions, i.e. how often you commit.
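As a rough illustration of the settings involved, the sketch below collects them as plain dictionaries keyed by Kafka's property names. The property names are Kafka's own; the broker address, transactional id, and group id are placeholders, and consumer_config is a hypothetical helper. How such a dictionary is passed to a client depends on the library you use.

```python
# Producer side: setting transactional.id is what makes a producer transactional.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "transactional.id": "orders-writer-1",  # hypothetical id, stable per producer
    "enable.idempotence": True,             # transactions require idempotence
    "acks": "all",                          # idempotence requires acks=all
}

ISOLATION_LEVELS = ("read_committed", "read_uncommitted")


def consumer_config(isolation_level: str = "read_committed") -> dict:
    """Build consumer settings with an explicit isolation level."""
    if isolation_level not in ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {isolation_level}")
    return {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "group.id": "orders-reader",            # hypothetical consumer group
        "isolation.level": isolation_level,
    }


strict = consumer_config()                    # hides open and aborted transactions
loose = consumer_config("read_uncommitted")   # sees every record
```

Setting isolation.level explicitly, rather than relying on a client's default, makes the choice visible in the configuration.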
