Kafka Streams: Any guarantees on ordering of saves to state stores when using at_least_once?

Question

We have a Kafka Streams Java topology built with the Processor API.

In the topology, we have a single processor, that saves to multiple state stores.
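
For illustration, the wiring for such a topology might look like the following sketch (the topic, processor, and store names here are placeholders of ours, not from the original question; MyProcessor is sketched further below):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class TopologySketch {
    public static Topology build() {
        // Two persistent key-value stores; with logging enabled (the default),
        // each store gets its own changelog topic in Kafka.
        StoreBuilder<KeyValueStore<String, String>> storeA = Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("store-A"), Serdes.String(), Serdes.String());
        StoreBuilder<KeyValueStore<String, String>> storeB = Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("store-B"), Serdes.String(), Serdes.String());

        Topology topology = new Topology();
        topology.addSource("source", "input-topic");
        // A single processor that both stores are attached to.
        topology.addProcessor("my-processor", MyProcessor::new, "source");
        topology.addStateStore(storeA, "my-processor");
        topology.addStateStore(storeB, "my-processor");
        return topology;
    }
}
```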

As we use at_least_once, we would expect to see some inconsistencies between the state stores - e.g. an incoming record results in writes to both state store A and B, but a crash between the saves results in only the save to store A getting written to the Kafka change log topic.
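
A minimal sketch of such a processor (using the newer processor.api interfaces and the placeholder names from above) makes the crash window concrete:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

public class MyProcessor implements Processor<String, String, Void, Void> {
    private KeyValueStore<String, String> storeA;
    private KeyValueStore<String, String> storeB;

    @Override
    public void init(ProcessorContext<Void, Void> context) {
        storeA = context.getStateStore("store-A");
        storeB = context.getStateStore("store-B");
    }

    @Override
    public void process(Record<String, String> record) {
        storeA.put(record.key(), record.value());
        // Under at_least_once, a crash here (before the input offset is committed)
        // can leave store A's update already written to its changelog topic while
        // store B's update is lost; on restart the record is re-processed, so B
        // catches up but A sees the same update twice.
        storeB.put(record.key(), record.value());
    }
}
```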

  1. Are we guaranteed that the order in which we save will also be the order in which the writes to the state stores happen? E.g. if we first save to store A and then to store B, we can of course have a situation where the writes to both change logs succeeded, and a situation where only the write to change log A was completed - but can we also end up in a situation where only the write to change log B was completed?

  2. What situations will result in replays? A crash, of course - but what about rebalances, a new broker partition leader, or when we get an "Offset commit failed" error (the request timed out)?

  3. A while ago, we tried using exactly_once, which resulted in a lot of error messages that didn't make sense to us. Would exactly_once give us atomic writes across multiple state stores?
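
For context, the processing guarantee is selected via a single Streams config. A minimal sketch, assuming Kafka 2.8+ where EXACTLY_ONCE_V2 is available (older versions use the since-deprecated EXACTLY_ONCE constant; the application id and bootstrap servers are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ConfigSketch {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // The default is at_least_once; with exactly_once(_v2), changelog writes,
        // output records and the offset commit are wrapped in one transaction per commit.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```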

Answer

Ad 3. According to the original design document on exactly-once support in Kafka Streams, I think with exactly_once you get atomic writes across multiple state stores:

When stream.commit() is called, the following steps are executed in order:

  1. Flush the local state stores (KTable caches) to make sure all changelog records are sent downstream.
  2. Call producer.sendOffsetsToTransaction(offsets) to commit the consumer's current record positions within the transaction. Note that although the thread's consumer can be shared among multiple tasks, and hence multiple producers, the task's assigned partitions are always exclusive, so it is safe to commit the offsets of this task's assigned partitions.
  3. Call producer.commitTransaction() to commit the current transaction. As a result, the task state represented by the triplet above is committed atomically.
  4. Call producer.beginTransaction() again to start the next transaction.
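
Kafka Streams drives these steps internally, but purely to illustrate the quoted sequence, a rough plain-producer equivalent might look like this (the transactional id, topic names, offset, and group id are all invented for the example):

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class EosCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "my-app-0_0"); // Streams derives one per task
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // Changelog records for both stores go into the same transaction...
            producer.send(new ProducerRecord<>("store-A-changelog", "k", "vA"));
            producer.send(new ProducerRecord<>("store-B-changelog", "k", "vB"));
            // ...together with the consumed input offset (group id = application.id).
            producer.sendOffsetsToTransaction(
                    Map.of(new TopicPartition("input-topic", 0), new OffsetAndMetadata(42L)),
                    "my-app");
            // Either everything above becomes visible, or none of it does.
            producer.commitTransaction();
        }
    }
}
```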
