Analyze a tumbling window with a lag in AWS Kinesis Analytics SQL


Problem description

I've got a use case that seems like it should be supported by Kinesis Analytics SQL, but I can't seem to figure it out.

Here's my situation:

  • I have an input stream of data where each event has an event_time field and a device_id field.
  • I want to aggregate data by event_time and device_id. Here event_time is provided as a field in the source data; it is neither the ROWTIME at which the row was added to the Kinesis Analytics application nor the approximate arrival time.
  • The processes that send data to my stream have some delays, so rows may be added to my stream up to 3 minutes after the event_time has occurred.

My goal is a report summarized by event_time and device_id that has one row per event_time, with all the data for that event_time in that one row.

So, my data stream could look like:

rowtime, event_time, device_id, num_things
12:29:04, 12:27:00, server1, 19
12:30:22, 12:28:00, server1, 33
12:30:23, 12:27:00, server2, 8
12:30:25, 12:29:00, server1, 11
12:31:33, 12:28:00, server2, 2
12:31:44, 12:29:00, server3, 83
12:32:56, 12:29:00, server2, 6

The key point here is that the data for a given event_time, like 12:27, comes in over a period of a few minutes, and an event_time can be up to 3 minutes earlier than the time its rows are added to the Kinesis Analytics stream.

I would like my output to be:

event_time, total_num_things
12:27, 27   <- sums up 19 + 8 for event_time 12:27
12:28, 35   <- sums up 33 + 2 for event_time 12:28
12:29, 100  <- sums up 11 + 83 + 6 for event_time 12:29
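
To make the target concrete, here is a minimal plain-Python sketch (not Kinesis Analytics SQL) of the desired behavior on the sample rows above. It assumes that event_time is truncated to the minute and that a bucket can be treated as complete once ROWTIME has passed the end of that minute plus the 3-minute delay. The names aggregate_by_event_time, BUCKET_SECONDS, and MAX_DELAY_SECONDS are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (rowtime, event_time, device_id, num_things)
ROWS = [
    ("12:29:04", "12:27:00", "server1", 19),
    ("12:30:22", "12:28:00", "server1", 33),
    ("12:30:23", "12:27:00", "server2", 8),
    ("12:30:25", "12:29:00", "server1", 11),
    ("12:31:33", "12:28:00", "server2", 2),
    ("12:31:44", "12:29:00", "server3", 83),
    ("12:32:56", "12:29:00", "server2", 6),
]

BUCKET_SECONDS = 60          # assumption: event_time is truncated to the minute
MAX_DELAY_SECONDS = 3 * 60   # assumption: rows arrive at most 3 minutes late


def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def aggregate_by_event_time(rows, bucket=BUCKET_SECONDS, max_delay=MAX_DELAY_SECONDS):
    """Return one (event_time, total) pair per event_time.

    A bucket is emitted only once a row arrives whose rowtime is past the
    bucket's end plus the allowed delay; leftovers are flushed at the end.
    """
    pending = defaultdict(int)
    results = []
    for rowtime, event_time, _device_id, num_things in rows:
        now = to_seconds(rowtime)
        # Emit every bucket that can no longer receive late rows.
        for et in sorted(pending):
            if to_seconds(et) + bucket + max_delay <= now:
                results.append((et, pending.pop(et)))
        pending[event_time] += num_things
    # End of the sample input: flush whatever is still buffered.
    for et in sorted(pending):
        results.append((et, pending[et]))
    return results


for event_time, total in aggregate_by_event_time(ROWS):
    print(event_time[:5], total)
```

The sketch buffers partial sums per event_time and holds each one back until the delay bound has passed, which is the behavior a ROWTIME-only tumbling window does not provide.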

Is this possible?

All the examples I can find would have a tumbling window on ROWTIME in the output, and thus the aggregation for an event_time would potentially be broken up across multiple ROWTIME minute buckets.
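
The splitting effect can be reproduced with a short plain-Python simulation (again, not Kinesis Analytics SQL). It groups the sample rows by the pair (ROWTIME minute, event_time), roughly what a one-minute tumbling window on ROWTIME would do. The function name tumbling_by_rowtime is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (rowtime, event_time, num_things)
ROWS = [
    ("12:29:04", "12:27:00", 19),
    ("12:30:22", "12:28:00", 33),
    ("12:30:23", "12:27:00", 8),
    ("12:30:25", "12:29:00", 11),
    ("12:31:33", "12:28:00", 2),
    ("12:31:44", "12:29:00", 83),
    ("12:32:56", "12:29:00", 6),
]


def tumbling_by_rowtime(rows):
    """Sum num_things per (ROWTIME minute, event_time minute) pair."""
    totals = defaultdict(int)
    for rowtime, event_time, num_things in rows:
        # Keep only 'HH:MM' to bucket both timestamps by minute.
        totals[(rowtime[:5], event_time[:5])] += num_things
    return dict(totals)


partials = tumbling_by_rowtime(ROWS)
for (rowtime_minute, event_minute), total in sorted(partials.items()):
    print(rowtime_minute, event_minute, total)
```

With the sample data, event_time 12:27 ends up as two partial rows (19 in the 12:29 ROWTIME bucket and 8 in the 12:30 bucket) rather than a single row with 27.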

Recommended answer

LAG is now available ... perhaps it helps.

http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-lag.html
