How to detect which side of a KTable join has triggered an update?


Question

When you join two KTables in Kafka, the output KTable is updated every time either of the two input KTables is updated.

Imagine you are joining Customers with a list of Orders that you have reduced appropriately. Imagine further that you consume the result of this join to produce special offers and proposals for the end customer:

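  • You may want to send him a special offer because he has changed his address and now lives in an area where you want to sell product XYZ.
  • You may want to send him a special offer because his total order amount has exceeded $1,000.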

To implement this, you would need to know, every time the join "emits" a new record on the stream, which side of the join caused that record. What is the appropriate solution for this use case?

Recommended answer

I think there are two ways to do this:

  1. Use a .transform() directly after the join that keeps the current join result in a state store. When an update arrives, you can compare the new result with the stored one and thus determine whether the customer data or the order data changed. This is a memory-intensive solution, though.
  2. Use a transform() before the join (for each input) and augment your records with timestamp or offset information. The join should preserve this information. In the result, a larger offset/timestamp on the order than on the customer then tells you that the order was updated and triggered this result. This solution is less memory-intensive, but might not be 100% exact, depending on your input data (offsets might not work at all, and timestamps can be fuzzy depending on how frequently your data is updated).
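
The decision logic behind both options can be sketched in plain Java. This is only an illustration of the comparisons, not Kafka Streams code: the class `JoinSideDetector`, the `Joined` value type, the `Side` enum and both method names are hypothetical, and the `HashMap` merely stands in for the state store that a real transformer would use. It also assumes that each side of the join result carries the timestamp of its last update, as option 2 describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: models the two detection strategies without any Kafka dependency.
class JoinSideDetector {

    /** Which input of the join caused the latest join result. */
    enum Side { CUSTOMER, ORDER, BOTH, NONE, UNKNOWN }

    /** Assumed shape of a join result: one value per side plus the timestamp of its last update. */
    static final class Joined {
        final String customer;
        final long customerTs;
        final String orders;
        final long ordersTs;

        Joined(String customer, long customerTs, String orders, long ordersTs) {
            this.customer = customer;
            this.customerTs = customerTs;
            this.orders = orders;
            this.ordersTs = ordersTs;
        }
    }

    // Option 1: remember the previous join result per key (stand-in for a state store).
    private final Map<String, Joined> previous = new HashMap<>();

    /** Compares the new join result with the stored one and reports which part changed. */
    Side detectByDiff(String key, Joined current) {
        Joined old = previous.put(key, current);
        if (old == null) {
            return Side.UNKNOWN; // first result for this key, nothing to compare against
        }
        boolean customerChanged = !Objects.equals(old.customer, current.customer);
        boolean ordersChanged = !Objects.equals(old.orders, current.orders);
        if (customerChanged && ordersChanged) return Side.BOTH;
        if (customerChanged) return Side.CUSTOMER;
        if (ordersChanged) return Side.ORDER;
        return Side.NONE;
    }

    /** Option 2: the side with the larger timestamp is taken to be the trigger. */
    static Side detectByTimestamp(Joined current) {
        if (current.ordersTs > current.customerTs) return Side.ORDER;
        if (current.customerTs > current.ordersTs) return Side.CUSTOMER;
        return Side.UNKNOWN; // equal timestamps: the "fuzzy" case mentioned above
    }

    public static void main(String[] args) {
        JoinSideDetector detector = new JoinSideDetector();
        Joined first = new Joined("Alice, Berlin", 100L, "total=900", 90L);
        Joined second = new Joined("Alice, Berlin", 100L, "total=1100", 120L);

        System.out.println(detector.detectByDiff("c1", first));  // UNKNOWN
        System.out.println(detector.detectByDiff("c1", second)); // ORDER
        System.out.println(detectByTimestamp(second));           // ORDER
    }
}
```

The diff variant needs one stored join result per key, which is where the memory cost of option 1 comes from. The timestamp variant keeps no state, but cannot decide when both sides carry the same timestamp.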
