JdbcIO.Write.withResults 和 Wait.on 带有无界 PCollection 和 FixedWindow [英] JdbcIO.Write.withResults and Wait.on with an unbounded PCollection with FixedWindow

查看:17
本文介绍了JdbcIO.Write.withResults 和 Wait.on 带有无界 PCollection 和 FixedWindow的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试实现一个类似于 这个问题,但与 BEAM 中提到的情况不同-6732,我的源是 Pub/Sub 订阅,而不是使用 Wait.on 写入另一个表,我试图用它来确定写入何时完成,生成消息并路由到 Pub/Sub 主题.

I am trying to implement a pipeline that is similar to the one outlined in this question, but unlike the situation mentioned in BEAM-6732, my source is a Pub/Sub subscription, and instead of using the Wait.on to write to another table, I am trying to use it to determine when the writes are complete, generate a message and route to a Pub/Sub topic.

我尝试使用默认窗口,但根据 Wait.on 的文档,它不适用于无界集合,尝试手动定义固定窗口,允许延迟较低,但是似乎也不起作用,请找到下面使用的窗口.JDBCIO.write 后面的步骤好像一直卡住了,也就是wait 步骤没有输出.

I tried using the default window, but based on the documentation for Wait.on, it does not work for unbounded collections, tried manually defining a fixed window, with a lower allowed lateness, but that also does not seem to work, please find the window used below. The steps after the JDBCIO.write seems to be always stuck, i.e there is no output from the wait step.

Window.into(FixedWindows.of(Duration.standardSeconds(10)))
    .triggering(
        Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1))
                .orFinally(AfterWatermark.pastEndOfWindow())
        )
    ).withAllowedLateness(Duration.standardMinutes(2)).discardingFiredPanes();

寻求有关可能出错的建议,以及对 Pub/Sub 源使用低 allowedLateness 会产生什么影响,这并不能保证排序.

Looking for advise on what could be wrong, also what the impact would be of using a low allowedLateness for a Pub/Sub source, which does not guarantee ordering.

推荐答案

Wait 似乎可以用于无界源,因为它可以应用于 Windows.在 等待 SDK 文档 我们发现:

It seems that Wait can be used for unbounded sources since it can be applied to on Windows. In Wait SDK documentation we find:

"返回内容与输入相同的 PCollection,但延迟产生窗口 W 中的输出元素,直到信号的窗口 W 关闭(信号的水印通过 W.end +信号.allowedLateness)"

"returns a PCollection with contents identical to the input, but delays producing elements of the output in window W until the signal's window W closes (signal's watermark passes W.end + signal.allowedLateness)"

我们来看看解释在Y准备好后将T应用到X";->X.apply(Wait.on(Y)).apply(T).如果我理解正确,我们可以根据您的用例调整以下内容:

Let's look at the explanation "apply T to X after Y is ready" -> X.apply(Wait.on(Y)).apply(T). If I understand correctly, we can adapt the following to your use case:

  • T = 生成消息
  • Y = JdbcIO.Write.withResults
  • X = 发送到 PubSub

虽然您的代码片段中可能需要 FixedWindow,但我认为不需要触发,因为 Wait 已经在考虑窗口的结束(信号的水印通过 W.end + signal.allowedLateness).此外,如果 allowedLateness 值较低,则 Window 将在 Window 结束后尽快关闭.

While FixedWindow might be required in your code snippet, I think triggering won't be needed since Wait is already considering the end of the Window (signal's watermark passes W.end + signal.allowedLateness). Additionally, with a low value for allowedLateness, the Window will be closed sooner, as soon as the Window ends.

这篇关于JdbcIO.Write.withResults 和 Wait.on 带有无界 PCollection 和 FixedWindow的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆