Why does my Kafka Streams topology not replay/reprocess correctly?

Question

I have a topology that looks like this:

// The users table is materialized from the USERS topic.
KTable<ByteString, User> users = topology.table(USERS);

// Join requests create new users, which are written back to USERS.
// Note: to() returns void, so its result cannot be assigned to a KStream.
topology.stream(JOIN_REQUESTS)
    .mapValues(entityTopologyProcessor::userNew)
    .to(USERS);

// Settings confirm requests are joined against the users table,
// and the updated users are written back to USERS.
topology.stream(SETTINGS_CONFIRM_REQUESTS)
    .join(users, entityTopologyProcessor::userSettingsConfirm)
    .to(USERS);

// Settings update requests follow the same pattern.
topology.stream(SETTINGS_UPDATE_REQUESTS)
    .join(users, entityTopologyProcessor::userSettingsUpdate)
    .to(USERS);

At runtime this topology works fine. Users are created with join requests. They confirm their settings with settings confirm requests. They update their settings with settings update requests.

However, reprocessing this topology does not produce the original results. Specifically, the settings update joiner does not see the user that resulted from the settings confirm joiner, even though, in terms of timestamps, many seconds elapse from the time the user is created, to the time the user is confirmed, to the time the user updates their settings.

I'm at a loss. I've tried turning off caching/logging on the user table. No idea what to do to make this reprocess properly.
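For reference, a minimal sketch of one way to turn off caching and changelogging on the users table, assuming a Kafka Streams version (2.0+) where topology is a StreamsBuilder; the store name "users-store" is hypothetical, and serdes for ByteString/User are assumed to be configured elsewhere:

import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder topology = new StreamsBuilder();

// Materialize the users table with caching and changelogging disabled.
// "users-store" is an illustrative store name, not from the original code.
KTable<ByteString, User> users = topology.table(
    USERS,
    Materialized.<ByteString, User, KeyValueStore<Bytes, byte[]>>as("users-store")
        .withCachingDisabled()
        .withLoggingDisabled());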

Answer

A KStream-KTable join is not 100% deterministic (and might never become 100% deterministic). We are aware of the problem and are discussing solutions to at least mitigate the issue.

One problem is that when a consumer fetches from the brokers, we cannot easily control for which topics and/or partitions the broker returns data. Depending on the order in which we receive data from the broker, the result might differ slightly.
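In later Kafka Streams releases (2.1+, via KIP-353), the max.task.idle.ms setting can reduce, though not eliminate, this fetch-order effect: it makes a task wait for data on all of its input partitions so that records can be picked in timestamp order. A minimal sketch; the application id and bootstrap server are placeholder values:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-settings-app"); // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// Wait up to 500 ms for data on all input partitions before processing,
// so the record with the lowest timestamp is picked rather than
// whatever the broker happened to return first.
props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 500L);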

One related JIRA issue: https://issues.apache.org/jira/browse/KAFKA-3514

This blog post might help, too: https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/
