Apache Flink:如何应用多个计数窗口功能? [英] Apache Flink: How to apply multiple counting window functions?

查看:92
本文介绍了Apache Flink:如何应用多个计数窗口功能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一组经过加密的数据流,需要计算不同时间段(1分钟,5分钟,1天,1周)的翻转计数.

I have a stream of data that is keyed and need to compute counts for tumbled of different time periods (1 minute, 5 minutes, 1 day, 1 week).

是否可以在单个应用程序中计算所有四个窗口计数?

Is it possible to compute all four window counts in a single application?

推荐答案

是的,这是可能的.

如果使用事件时间,则可以简单地以增加的时间间隔级联窗口.因此,您这样做:

If you are using event-time, you can simply cascade the windows with increasing time intervals. So you do:

DataStream<String> data = ...
// append a Long 1 to each record to count it.
DataStream<Tuple2<String, Long>> withOnes = data.map(new AppendOne); 

DataStream<Tuple2<String, Long>> 1minCnts = withOnes
  // key by String field
  .keyBy(0) 
  // define time window
  .timeWindow(Time.of(1, MINUTES))
  // sum ones of the Long field
  // in practice you want to use an incrementally aggregating ReduceFunction and 
  // a WindowFunction to extract the start/end timestamp of the window
  .sum(1);

// emit 1-min counts to wherever you need it
1minCnts.addSink(new YourSink());

// compute 5-min counts based on 1-min counts
DataStream<Tuple2<String, Long>> 5minCnts = 1minCnts
  // key by String field
  .keyBy(0)
  // define time window of 5 minutes
  .timeWindow(Time.of(5, MINUTES))
  // sum the 1-minute counts in the Long field
  .sum(1);

// emit 5-min counts to wherever you need it
5minCnts.addSink(new YourSink());

// continue with 1 day window and 1 week window

请注意,这是可能的,因为:

Note that this is possible, because:

  1. 总和是一个关联函数(您可以通过对部分和求和来计算总和).
  2. 翻滚窗口对齐良好,不会重叠.

关于对逐渐汇总的ReduceFunction的评论:

Regarding the comment on the incrementally aggregating ReduceFunction:

通常,您希望在窗口操作的输出中包含窗口的开始和/或结束时间戳(否则,同一键的所有结果看起来都一样).可以从WindowFunctionapply()方法的window参数访问窗口的开始和结束时间.但是,WindowFunction不会增量聚合记录,而是收集它们并在窗口末尾聚合记录.因此,使用ReduceFunction进行增量聚合和使用WindowFunction将窗口的开始和/或结束时间附加到结果上会更有效. 文档讨论细节.

Usually, you want to have the start and/or end timestamp of the window in the output of a window operation (otherwise all results for the same key look the same). The start and end time of a window can be accessed from the window parameter of the apply() method of a WindowFunction. However, a WindowFunction does not incrementally aggregate records but collects them and aggregates the records at the end of the window. Hence, it is more efficient to use a ReduceFunction for incremental aggregation and a WindowFunction to append the start and/or end time of the window to the result. The documentation discusses the details.

如果要使用处理时间进行计算,则不能级联窗口,而必须将输入数据流扇出成四个窗口函数.

If you want to compute this using processing time, you cannot cascade the windows but have to fan out from the input data stream to four window functions.

这篇关于Apache Flink:如何应用多个计数窗口功能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆