GroupByKey 转换的早期结果 [英] Early results from GroupByKey transform

查看:22
本文介绍了GroupByKey 转换的早期结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我怎样才能让 GroupByKey 触发早期结果,而不是等待所有数据到达(在我的情况下是很长的时间).我试图将我的输入 PCollection 拆分为具有早期触发器的窗口,但它只是不起作用.在给出结果之前,它仍然等待所有数据到达.

How can I get GroupByKey to trigger early results, rather than wait for all the data to arrive (which in my case is a pretty long time).I tried to split my input PCollection into windows with an early trigger, but it just doesn`t work. It still waits for all the data to arrive before giving out the results.

PCollection<List<String>> input = ...
PCollection<KV<Integer,List<String>>> keyedInput = input.apply(ParDo.of(new AddArbitraryKey()))
keyedInput.apply(Window<KV<Integer,List<String>>>into(
          FixedWindows.of(Duration.standardSeconds(1)))
         .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
         .withAllowedLateness(Duration.ZERO).discardingFiredPanes())
 .apply(GroupByKey.<Integer,List<String>>create())
       .apply(ParDo.of(new RemoveArbitraryKey()))
       .apply(ParDo.of(new FurtherProcessing())

我这样做是为了防止融合.AddArbitraryKey 转换使用时间戳输出其元素.但是, GroupByKey 会保留所有内容,直到所有数据到达(对于所有窗口).有人可以告诉我如何让它尽早触发.谢谢你 .

I am doing this to prevent fusing . The AddArbitraryKey transform outputs its elements with Timestamp. However, GroupByKey holds up everything until all the data arrives (for all the windows) . Could someone please tell me how i can get it to trigger early. Thank You .

推荐答案

你可以安装像

Repeatedly
  .forever(AfterProcessingTime
    .pastFirstElementInPane()
    .plusDuration(Duration.standardMinutes(1))
  .orFinally(AfterWatermark.pastEndOfWindow())
  .discardingFiredPanes()

AfterWatermark.pastEndOfWindow()
  .withEarlyFirings(
    AfterProcessingTime
      .pastFirstElementInPane()
      .plusDuration(Duration.standardMinutes(1))

这篇关于GroupByKey 转换的早期结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆