Using Kinesis Analytics to construct real time sessions

Problem description

Is there an example somewhere, or can someone explain how to use Kinesis Analytics to construct real-time sessions (i.e. sessionization)?

It is mentioned that this is possible here: https://aws.amazon.com/blogs/aws/amazon-kinesis-analytics-process-streaming-data-in-real-time-with-sql/ in the discussion of custom windows, but no example is given.

Typically this is done in SQL using the LAG function so you can compute the time difference between consecutive rows. This post: https://blog.modeanalytics.com/finding-user-sessions-sql/ describes how to do it with conventional (non-streaming) SQL. However, I don't see support for the LAG function in Kinesis Analytics.
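
For reference, the core of that non-streaming approach looks roughly like this. It is only a sketch in ordinary (Postgres-flavored) SQL, assuming a table events(user_id, event_ts); the names and the exact interval syntax are illustrative.

```sql
-- Batch sessionization with LAG:
--   1. attach each row's previous timestamp for the same user,
--   2. flag rows that start a new session (first event, or gap > 5 minutes),
--   3. turn the flags into a per-user session_id with a running sum.
WITH with_prev AS (
    SELECT
        user_id,
        event_ts,
        LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_ts
    FROM events
),
flagged AS (
    SELECT
        user_id,
        event_ts,
        CASE
            WHEN prev_ts IS NULL OR event_ts - prev_ts > INTERVAL '5 minutes'
            THEN 1 ELSE 0
        END AS is_new_session
    FROM with_prev
)
SELECT
    user_id,
    event_ts,
    SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_id
FROM flagged;
```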

In particular I would love two examples. Assume that both take as input a stream consisting of a user_id and a timestamp. Define a session as a sequence of events from the same user separated by less than 5 minutes.

1) The first outputs a stream that has the additional columns event_count and session_start_timestamp. Every time an event comes in, it should output an event with these two additional columns (see the batch-SQL sketch below for what this looks like offline).

2) The second example would be a stream that outputs a single event per session once the session has ended (i.e. 5 minutes have passed with no data from the user). This event would have userId, start_timestamp, end_timestamp, and event_count.
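
Continuing the batch sketch above (still non-streaming, and still using assumed names: a relation sessionized holding the user_id, event_ts, session_id rows computed earlier), the two desired outputs would look roughly like this:

```sql
-- Output 1: one row per event, with a running event_count and the
-- session_start_timestamp of the session that event belongs to.
SELECT
    user_id,
    event_ts,
    COUNT(*) OVER (PARTITION BY user_id, session_id
                   ORDER BY event_ts)                     AS event_count,
    MIN(event_ts) OVER (PARTITION BY user_id, session_id) AS session_start_timestamp
FROM sessionized;

-- Output 2: one row per session.
SELECT
    user_id,
    MIN(event_ts) AS start_timestamp,
    MAX(event_ts) AS end_timestamp,
    COUNT(*)      AS event_count
FROM sessionized
GROUP BY user_id, session_id;
```

The streaming difficulty is entirely in output 2's "once the session has ended" condition: in batch SQL the data is already complete, whereas a stream processor has to wait out the 5-minute idle gap before it can close and emit the session.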

Is this possible with Kinesis Analytics?

Here is an example of doing this with Apache Spark: https://docs.cloud.databricks.com/docs/latest/databricks_guide/07%20Spark%20Streaming/Applications/01%20Sessionization.html

But I would love to do this with one (or two) Kinesis Analytics streams.

Answer

LAG is now supported in Kinesis Analytics. You can see it on the documentation page http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-lag.html. I have actually used it for a use case similar to the one you describe.
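
For what it's worth, here is a rough, untested sketch of how the first building block might look in Kinesis Analytics SQL: a pump that uses LAG to attach each user's previous event time to the current row. The source stream name SOURCE_SQL_STREAM_001, the column names, and the window bound are assumptions; check the exact LAG and window syntax against the reference page above and your own input schema.

```sql
-- In-application output stream: each event plus the same user's previous event time.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "user_id"       VARCHAR(64),
    "event_ts"      TIMESTAMP,
    "prev_event_ts" TIMESTAMP
);

CREATE OR REPLACE PUMP "SESSION_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
    "user_id",
    "event_ts",
    -- Previous event time for the same user; NULL for the first event the
    -- window has seen for that user. The ROWS bound is an assumption;
    -- widen it (or use a time-based RANGE window) to suit your data.
    LAG("event_ts", 1) OVER (PARTITION BY "user_id" ROWS 10 PRECEDING) AS "prev_event_ts"
FROM "SOURCE_SQL_STREAM_001";
```

From there, a second pump (or a downstream consumer) can compare event_ts with prev_event_ts against the 5-minute threshold to flag session starts and build up the per-event and per-session outputs described in the question.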
