Understanding transactions in a Processor implementation in Kafka Streams


Question

While using the Processor API of Kafka Streams, I use something like this:

context.forward(key, value);
context.commit();

Actually, what I'm doing here is forwarding state from a state store to a sink every minute (using context.schedule() in the init() method). What I don't understand here is:

The [key,value] pair I forward and then commit() is taken from a state store. It is aggregated, according to my specific logic, from many non-sequential input [key,value] pairs. Each such output [key,value] pair is an aggregation of a few unordered [key,value] pairs from the input (Kafka topic). So I don't understand how the Kafka cluster and the Kafka Streams library can know the correlation between the original input [key,value] pairs and the eventual output [key,value] that is sent out. How can this be wrapped in a transaction (fail-safe) if Kafka doesn't know the connection between the input pairs and the output pair? And what is actually being committed when I do context.commit()?
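
For illustration, a minimal sketch of the kind of processor I mean might look like this (the store name "agg-store", the record types, and the summing logic are just placeholders for my real aggregation, and it assumes the classic Processor API of Kafka Streams 2.x or later):

import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class AggregatingProcessor implements Processor<String, Long> {

    private ProcessorContext context;
    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        this.context = context;
        // "agg-store" is a placeholder store name; the store must be attached to the topology separately
        this.store = (KeyValueStore<String, Long>) context.getStateStore("agg-store");

        // Every minute, forward the current aggregates downstream and request a commit
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, Long> it = store.all()) {
                while (it.hasNext()) {
                    final KeyValue<String, Long> entry = it.next();
                    context.forward(entry.key, entry.value);
                }
            }
            context.commit(); // only a request; Streams decides when the commit actually happens
        });
    }

    @Override
    public void process(final String key, final Long value) {
        // placeholder aggregation: sum values per key across many, possibly unordered, input records
        final Long current = store.get(key);
        store.put(key, current == null ? value : current + value);
    }

    @Override
    public void close() { }
}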

Thanks!

Answer

Explaining all of this in detail goes beyond what I can write here in an answer.

Basically, when a transaction is committed, the current input topic offsets and all writes to Kafka topics are committed atomically. This implies that all pending writes are flushed before the commit completes.
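
As a sketch, and assuming a reasonably recent Kafka Streams version, this transactional behavior is enabled via the processing-guarantee setting; the application id and broker address below are placeholders:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {
    public static Properties streamsProperties() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        // Enables transactional processing: input offsets and output records are committed atomically.
        // On Kafka Streams versions before 3.0, use StreamsConfig.EXACTLY_ONCE instead.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        // Commits (and thus transactions) happen roughly at this interval;
        // context.commit() only asks for an earlier commit, it does not commit immediately.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 60 * 1000L);
        return props;
    }
}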

Transactions don't need to know about your actual business logic. They just "synchronize" the progress tracking on the input topics with the writes to the output topics.

I would recommend reading the corresponding blog posts and watching the talks about exactly-once in Kafka to get more details.

Btw: there is a question about manual commits in the Streams API that you should also consider: How to commit manually with Kafka Stream?
