How to detect which side of a KTable join has triggered an update?


Question

When you join two KTables in Kafka, the output KTable is updated every time either of the two input KTables is updated.

Imagine you are joining Customers with a list of Orders that you have reduced appropriately. Imagine further that you consume the result of this join to produce special offers and proposals for the end customer:

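  • You may want to send him a special offer because he has changed his address and is now in a region where you sell product XYZ.
  • You may want to send him a special offer because the total of his orders has passed 1000 dollars.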

To implement this, you would need to know, every time the join "emits" a new record on the stream, which side of the join caused that record. What is the appropriate solution for this use case?

Recommended answer

I think there are two ways to do this:

  1. Use a .transform() directly after the join that keeps the current join result in a store. When you receive an update, you can compare the new result with the old result and thus determine whether the customer data or the order data changed. This is a memory-intensive solution, though.
  2. Use a transform() before the join (one for each input), and augment your records with timestamp or offset information. The join should preserve this information. In the result, an order offset/timestamp that is larger than the customer's then tells you that the order was updated and triggered this result. This solution is less memory-intensive, but it might not be 100% exact, depending on your input data (using offsets might not work at all, and timestamps could also be fuzzy depending on how frequently your data is updated).
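
The decision logic behind both approaches can be sketched in plain Java, without any Kafka Streams wiring. Everything named here (JoinSideDetector, Joined, Side, byDiff, byTimestamp) is hypothetical and invented for illustration. The sketch assumes the join result carries the customer's address, the reduced order total, and one timestamp per side that was attached before the join.

```java
import java.util.Objects;

/** Hypothetical helper; none of these names come from the Kafka Streams API. */
final class JoinSideDetector {

    enum Side { CUSTOMER, ORDERS, BOTH, NONE }

    /** Assumed shape of a join result: one value and one timestamp per side. */
    static final class Joined {
        final String customerAddress;
        final Long orderTotal;
        final long customerTs; // timestamp attached to the customer record before the join
        final long ordersTs;   // timestamp attached to the orders record before the join

        Joined(String customerAddress, Long orderTotal, long customerTs, long ordersTs) {
            this.customerAddress = customerAddress;
            this.orderTotal = orderTotal;
            this.customerTs = customerTs;
            this.ordersTs = ordersTs;
        }
    }

    /**
     * Approach 1: compare the new join result with the previous one for the same key.
     * In a real topology, `previous` would be read from a state store.
     */
    static Side byDiff(Joined previous, Joined current) {
        if (previous == null) {
            return Side.BOTH; // first result for this key: nothing to compare against
        }
        boolean customerChanged = !Objects.equals(previous.customerAddress, current.customerAddress);
        boolean ordersChanged = !Objects.equals(previous.orderTotal, current.orderTotal);
        if (customerChanged && ordersChanged) return Side.BOTH;
        if (customerChanged) return Side.CUSTOMER;
        if (ordersChanged) return Side.ORDERS;
        return Side.NONE; // e.g. an update that rewrote an identical value
    }

    /**
     * Approach 2: the side with the larger timestamp is assumed to be the trigger.
     * Equal timestamps are ambiguous, which is the "fuzzy" case noted above.
     */
    static Side byTimestamp(Joined current) {
        if (current.ordersTs > current.customerTs) return Side.ORDERS;
        if (current.customerTs > current.ordersTs) return Side.CUSTOMER;
        return Side.BOTH;
    }
}
```

With a previous result of ("Berlin", 900, 10, 20) and a new result of ("Berlin", 1200, 10, 30), both methods attribute the update to the orders side. Note that byDiff needs the stored previous result per key (hence the memory cost), while byTimestamp only needs the two timestamps carried inside the record itself.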
