Apache Beam窗口:考虑较晚的数据,但仅发出一个窗格 [英] Apache beam windowing: consider late data but emit only one pane
问题描述
我想在水印到达窗口末尾x分钟后发出一个窗格.这让我确保可以处理一些较晚的数据,但仍只发出一个窗格.我目前正在使用Java.
I would like to emit a single pane when the watermark reaches x minutes past the end of the window. This let's me ensure I handle some late data, but still only emit one pane. I am currently working in java.
目前,我找不到适合该问题的解决方案.当水印到达窗口的末端时,我可以发出一个窗格,但是所有后期数据都将被丢弃.我可以在窗口末尾发出窗格,然后在收到较晚的数据时再次发出,但是在这种情况下,我不会发出单个窗格.
At the moment I can't find proper solutions to this problem. I could emit a single pane when the watermark reaches the end of the window, but then any late data is dropped. I could emit the pane at the end of the window and then again when I receive late data, however in this case I am not emitting a single pane.
我目前有与此类似的代码:
I currently have code similar to this:
.triggering(
// This is going to emit the pane, but I don't want emit the pane yet!
AfterWatermark.pastEndOfWindow()
// This is going to emit panes each time I receive late data, however
// I would like to only emit one pane at the end of the allowedLateness
).withAllowedLateness(allowedLateness).accumulatingFiredPanes())
万一仍然令人困惑,我只想在水印通过allowedLateness
时只发出一个窗格.
In case there is still confusion, I would like to only emit a single pane when the watermark passes the allowedLateness
.
推荐答案
Thanks Guillem, in the end I used your answer to find this very useful link with lots of apache beam examples. From this I came up with the following solution:
// We first specify to never emit any panes
.triggering(Never.ever())
// We then specify to fire always when closing the window. This will emit a
// single final pane at the end of allowedLateness
.withAllowedLateness(allowedLateness, Window.ClosingBehavior.FIRE_ALWAYS)
.discardingFiredPanes())
这篇关于Apache Beam窗口:考虑较晚的数据,但仅发出一个窗格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!