Why does my Kafka Streams topology not replay/reprocess correctly?


Question

I have a topology that looks like this:

KTable<ByteString, User> users = topology.table(USERS);

// Join requests create new users, which are written back to the users topic.
// (Note: KStream.to() returns void, so its result cannot be assigned to a variable.)
topology.stream(JOIN_REQUESTS)
    .mapValues(entityTopologyProcessor::userNew)
    .to(USERS);

// Confirm/update requests are joined against the current user record,
// and the merged result is written back to the users topic.
topology.stream(SETTINGS_CONFIRM_REQUESTS)
    .join(users, entityTopologyProcessor::userSettingsConfirm)
    .to(USERS);

topology.stream(SETTINGS_UPDATE_REQUESTS)
    .join(users, entityTopologyProcessor::userSettingsUpdate)
    .to(USERS);

At runtime this topology works fine. Users are created with join requests. They confirm their settings with settings confirm requests. They update their settings with settings update requests.
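Each joiner above is a ValueJoiner that combines an incoming request (the stream side) with the current user record (the table side). A minimal sketch of one joiner, assuming hypothetical SettingsConfirmRequest and User types with builder-style accessors (the real entityTopologyProcessor is not shown in the question):

// Hypothetical joiner: merges a confirm request into the existing user.
// SettingsConfirmRequest and User are assumed types, not from the question.
public User userSettingsConfirm(SettingsConfirmRequest request, User user) {
    return user.toBuilder()
        .setSettings(request.getSettings())
        .setConfirmed(true)
        .build();
}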

However, reprocessing this topology does not produce the original results. Specifically, the settings update joiner does not see the user that resulted from the settings confirm joiner, even though, in terms of timestamps, many seconds elapse from the time the user is created, to the time the user is confirmed, to the time the user updates their settings.

I'm at a loss. I've tried turning off caching/logging on the user table. I have no idea what to do to make this reprocess properly.
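For reference, disabling caching and the changelog on the table is done through Materialized when building it. A minimal sketch of what that looks like, assuming a hypothetical store name "users-store" and default serdes configured for ByteString/User:

import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder topology = new StreamsBuilder();

// Materialize the users table without the record cache and without a
// changelog topic, so every update is forwarded downstream immediately.
KTable<ByteString, User> users = topology.table(
    USERS,
    Materialized.<ByteString, User, KeyValueStore<Bytes, byte[]>>as("users-store")
        .withCachingDisabled()
        .withLoggingDisabled());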

Answer

A KStream-KTable join is not 100% deterministic (and might never become 100% deterministic). We are aware of the problem and are discussing solutions to at least mitigate the issue.

One problem is that when a consumer fetches from the brokers, we cannot easily control for which topics and/or partitions the broker returns data. Depending on the order in which we receive data from the broker, the result might differ slightly.
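Later Kafka Streams releases (2.1+, via KIP-353, which postdates this answer) expose max.task.idle.ms to mitigate exactly this: a task can wait for data on all of its input partitions before choosing the record with the lowest timestamp. A minimal sketch of that configuration, with example application id and broker address:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-topology");      // example id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // example broker
// Let a task idle up to 5s when some of its input partitions are empty,
// so records across topics are processed in timestamp order more reliably.
props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 5000L);

This improves timestamp synchronization across the join's input topics, but it still does not make the join fully deterministic.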

A related issue: https://issues.apache.org/jira/browse/KAFKA-3514

This blog post might help, too: https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/
