Idiomatic way to join on "secondary" keys
Question
If we have a stream that looks like this:
Person {
…
OrganizationID
}
that we want to join with another stream:
Organization {
ID
…
}
to create a composite record like so:
Person {
…
Organization {
ID
…
}
}
What is the most idiomatic and efficient way to do this in the Apache Beam programming model?
NB: I have seen side inputs recommended as a solution to similar problems, but they are not applicable here, since the effect we are after is that every change to either Person or Organization should yield a new augmented Person record.
Recommended answer
The answer is that your example is not supported by Apache Beam, because retraction is missing from the Apache Beam implementation.
===================================================
Original answer:
You might want to check the Join library [1] in Apache Beam.
Joins in the Beam model require extra thought about the windowing strategy of your streams. It sounds like your streams do not need windowing, so assume both are in the global window. If you put both streams in the global window, keep the default trigger, and join them the way Beam's Join library does, the join will never emit a result, because the watermark never passes the end of an endless window. If you instead set a repeated data-driven trigger (one that fires once it has seen enough elements), it is not clear how previously emitted results would be refined for the join, because Beam lacks support for retraction.
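To make the refinement problem concrete, here is a minimal sketch in plain Python. It is not Beam code and uses no Beam API: the class `ToyJoin`, its methods, and the record fields `id`, `name`, and `organization_id` are all hypothetical names chosen for illustration. It models a join that emits a new augmented Person whenever either side changes, and shows that earlier results stay downstream because nothing retracts them.

```python
class ToyJoin:
    """Toy in-memory model of the desired join. Hypothetical, not Beam API."""

    def __init__(self):
        self.persons = {}   # person id -> latest Person record
        self.orgs = {}      # organization id -> latest Organization record
        self.emitted = []   # every record sent downstream, in order

    def on_person(self, person):
        # A new or changed Person yields one augmented record.
        self.persons[person["id"]] = person
        self._emit(person)

    def on_organization(self, org):
        # A new or changed Organization yields one augmented record
        # for every Person that currently references it.
        self.orgs[org["id"]] = org
        for person in self.persons.values():
            if person["organization_id"] == org["id"]:
                self._emit(person)

    def _emit(self, person):
        org = self.orgs.get(person["organization_id"])
        if org is not None:  # inner join: wait until the Organization is known
            self.emitted.append({**person, "organization": org})


join = ToyJoin()
join.on_person({"id": "p1", "name": "Ada", "organization_id": "o1"})
join.on_organization({"id": "o1", "name": "Acme"})       # first augmented record
join.on_organization({"id": "o1", "name": "Acme Corp"})  # second augmented record

for record in join.emitted:
    print(record)
```

After the second Organization update, `join.emitted` holds two records for the same Person. A consumer sees both and has to decide for itself that the later one supersedes the earlier one, which is the job a retraction would otherwise do.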