Apache Beam:固定窗口触发器 [英] Apache Beam: Trigger for Fixed Window

查看:76
本文介绍了Apache Beam:固定窗口触发器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据 文档之后,如果您未明确指定触发器,则会得到如下所述的行为:

According to following documentation, it is stated that if you don't explicitly specify a trigger you get behavior described below:

如果未指定,则默认行为是在 水印通过窗口的尽头,然后每次再次触发 时间是迟到的数据.

If unspecified, the default behavior is to trigger first when the watermark passes the end of the window, and then trigger again every time there is late arriving data.

此行为是否也适用于FixedWindow?例如,您假定固定窗口应具有默认触发条件,该默认触发条件是在水印通过窗口结尾后重复触发,并丢弃所有最新数据,除非明确处理了最新数据.另外,在源代码的哪里可以看到示例FixedWindow对象的触发器的定义?

Is this behavior true for FixedWindow as well? For example you would assume fixed window should have a default trigger of repeatedly firing after watermark passes end of window, and discard all late data unless late data is explicitly handled. Also where in the source code can I see definition of trigger for, example FixedWindow object?

推荐答案

开始的最佳文档是触发器窗口(并点击链接).特别是,它说,即使每次延迟数据到达都会触发默认触发器,但在默认配置中,它仍然仅触发一次,丢弃延迟数据:

The best doc to start with is the guide for triggers, and windows (and following the links from there). In particular, it says that, even though the default trigger fires every time late data arrives, in default configuration it still effectively only triggers once, discarding the late data:

如果同时使用默认窗口配置和 默认触发器,默认触发器仅发出一次,而后期数据 被丢弃.这是因为默认的窗口配置具有 允许的延迟值为0.有关详细信息,请参见处理延迟数据"部分. 有关修改此行为的信息.

if you are using both the default windowing configuration and the default trigger, the default trigger emits exactly once, and late data is discarded. This is because the default windowing configuration has an allowed lateness value of 0. See the Handling Late Data section for information about modifying this behavior.

详细信息

Beam中的窗口化概念通常包括几件事情,包括分配窗口,处理触发器,处理最新数据和其他一些东西.但是,这些东西是单独分配和处理的.从这里开始,它很快变得令人困惑.

Windowing concept in Beam in general encompasses few things, including assigning windows, handling triggers, handling late data and few other things. However these things are assigned and handled separately. It gets confusing quickly from here.

如何将元素分配给窗口由WindowFn

How the elements are assigned to a window is handled by a WindowFn, see here. For example FixedWindows: link. It is basically the only thing that happens there (almost). Assigning a window is a special case of grouping the elements based on the event timestamps (kinda). You can think of the logic being similar to manually assigning custom keys to elements based on the timestamps, and then applying GroupByKey.

触发是一个相关但独立的概念.触发器只是(大致)谓词,以指示到目前为止何时允许运行程序发出窗口中累积的数据( https://s.apache.org/beam-triggers

Triggering is a related but separate concept. Triggers are (roughly) just predicates to indicate when the runner is allowed to emit the data accumulated in the window so far (source). I think this is the closest thing to the original design doc for triggers: https://s.apache.org/beam-triggers

延迟是配置的另一个相关部分,该部分也有些独立(链接).即使触发器可能允许跑步者永远发出所有最新数据,也可以将管道设置为不允许任何最新数据(这是默认行为),或者仅允许在有限的时间内发布最新数据.这将导致上述默认触发行为.是的,这令人困惑.如果可能的话,请避免使用任何复杂的触发和延迟,否则可能无法按预期工作.

Lateness is another related part of the configuration which is also somewhat separate (link). Even though a trigger might allow the runner to emit all the late data forever, the pipeline can be set to not allow any late data (which is the default behavior), or only allow late data for some limited time. This leads to the default trigger behavior described above. Yes, this is confusing. Avoid using any complex triggering and lateness if you can, it likely won't work as you expect it to.

因此,窗口类仅处理分组逻辑,即哪种元素具有相同的分组键.这些类不在乎您何时要发出累积的结果.这取决于您的业务逻辑,例如您可能要处理新到达的元素,或者可能要丢弃它们,它不是窗口的一部分.这意味着FixedWindows或其他窗口没有特殊的触发器,您可以在任何窗口中使用任何触发器(即使在逻辑上某些特定的触发器在某些窗口的上下文中没有意义).

So the window classes only handle the grouping logic, i.e. what kind of elements have the same grouping key. These classes don't care about when you will want to emit the accumulated results. This depends on your business logic, e.g. you might want to handle newly arrived elements, or you might want to discard them, it's not part of the window. This means there's no special triggers for FixedWindows or other windows, you can use any trigger with any window (even if logically some specific trigger doesn't make sense in context of some window).

Default trigger is just that, something that is just set by default. You should assign your own trigger if it doesn't suit your needs. And it likely won't, except for some basic use cases.

更新

An example of how to use FixedWindows with triggers.

这篇关于Apache Beam:固定窗口触发器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆