如何统计给定时间窗口内Apache Flink处理的记录数 [英] How to count the number of records processed by Apache Flink in a given time window

查看:26
本文介绍了如何统计给定时间窗口内Apache Flink处理的记录数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在flink中定义一个时间窗口后如下:

After defining a time window in flink as follows:

val lines = socket.timeWindowAll(Time.seconds(5))

如何计算特定 5 秒窗口中的记录数?

How can I compute the number of records in that particular window of 5 seconds?

推荐答案

执行计数聚合的最有效方法是 ReduceFunction.但是,reduce 有输入和输出类型必须相同的限制.因此,您必须在应用窗口之前将输入转换为 Int:

The most efficient way to perform a count aggregation is a ReduceFunction. However, reduce has the restriction that input and output type must be identical. So you would have to convert the input to an Int before applying the window:

val socket: DataStream[(String)] = ???

val cnts: DataStream[Int] = socket
  .map(_ => 1)                    // convert to 1
  .timeWindowAll(Time.seconds(5)) // group into 5 second windows
  .reduce( (x, y) => x + y)       // sum 1s to count

这篇关于如何统计给定时间窗口内Apache Flink处理的记录数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆