Kafka Streams: Any guarantees on ordering of saves to state stores when using at_least_once?


Problem Description

We have a Kafka Streams Java topology built with the Processor API. In the topology, we have a single processor that saves to multiple state stores.
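For context, a minimal sketch of how such a topology might be wired - all names here (topics, stores, and the MyProcessor class sketched further below) are made up for illustration:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.state.Stores;

public class TwoStoreTopology {
    static Topology build() {
        Topology topology = new Topology();
        topology.addSource("Source", "input-topic");
        // MyProcessor (sketched below) writes to both stores.
        topology.addProcessor("Proc", MyProcessor::new, "Source");
        // Each store gets its own changelog topic (logging is on by default).
        topology.addStateStore(
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("store-a"),
                        Serdes.String(), Serdes.String()),
                "Proc");
        topology.addStateStore(
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("store-b"),
                        Serdes.String(), Serdes.String()),
                "Proc");
        return topology;
    }
}
```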

As we use at_least_once, we would expect to see some inconsistencies between the state stores - e.g. an incoming record results in writes to both state stores A and B, but a crash between the two saves results in only the write to store A reaching the Kafka changelog topic.
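To make that crash window concrete, here is a hypothetical processor doing the two consecutive writes (older Processor API, String serdes assumed):

```java
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class MyProcessor extends AbstractProcessor<String, String> {
    private KeyValueStore<String, String> storeA;
    private KeyValueStore<String, String> storeB;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        storeA = (KeyValueStore<String, String>) context.getStateStore("store-a");
        storeB = (KeyValueStore<String, String>) context.getStateStore("store-b");
    }

    @Override
    public void process(String key, String value) {
        storeA.put(key, value); // may reach changelog A...
        // <- a crash here is the inconsistency described above
        storeB.put(key, value); // ...while this write is lost
    }
}
```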

  1. Are we guaranteed that the order in which we save will also be the order in which the writes to the state stores happen? E.g. if we first save to store A and then to store B, we can of course have a situation where the writes to both changelogs succeeded, and a situation where only the write to changelog A was completed - but can we also end up in a situation where only the write to changelog B was completed?

  2. What situations will result in replays? A crash, of course - but what about rebalances, a new broker partition leader, or when we get an "Offset commit failed" error (the request timed out)?

  3. A while ago, we tried using exactly_once, which resulted in a lot of error messages that didn't make sense to us. Would exactly_once give us atomic writes across multiple state stores?
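For reference, the processing guarantee is a single Streams config; a sketch, with hypothetical application id and broker address (newer Kafka releases also offer exactly_once_v2):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Default is "at_least_once"; this one setting switches on transactions.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
```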

Answer

Ad 3. According to the original design document on exactly-once support in Kafka Streams, I think that with exactly_once you get atomic writes across multiple state stores:

When stream.commit() is called, the following steps are executed in order:

  1. Flush the local state stores (KTable caches) to make sure all changelog records are sent downstream.
  2. Call producer.sendOffsetsToTransactions(offsets) to commit the current recorded consumer positions within the transaction. Note that although the thread's consumer can be shared among multiple tasks, and hence multiple producers, a task's assigned partitions are always exclusive, so it is safe to commit only the offsets of this task's assigned partitions.
  3. Call producer.commitTransaction() to commit the current transaction. As a result, the task state represented as the above triplet is committed atomically.
  4. Call producer.beginTransaction() again to start the next transaction.
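This is not Kafka Streams' actual internal code, but the same sequence expressed against the plain transactional producer API makes the atomicity point concrete (method and parameter names are illustrative):

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.TopicPartition;

public class CommitCycle {
    // One commit cycle: 'offsets' are the task's current input positions,
    // 'groupId' its consumer group.
    static void commit(Producer<byte[], byte[]> producer,
                       Map<TopicPartition, OffsetAndMetadata> offsets,
                       String groupId) {
        // Step 1 happens inside Streams: flush the stores so all changelog
        // records are part of the transaction about to be committed.
        producer.sendOffsetsToTransaction(offsets, groupId); // step 2
        producer.commitTransaction();  // step 3: changelog writes, outputs
                                       // and offsets commit atomically
        producer.beginTransaction();   // step 4: open the next transaction
    }
}
```

Since the changelog records for all of a task's stores ride in that one transaction, a commit contains either both writes (A and B) or neither - which is the atomicity across stores asked about in question 3.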
