Iterate over rows using SQL

Question

I have a table in a Redshift database containing event data. Each row is one event. Every event has an eventid, but not the sessionid that I now need. I have extracted a sample of the table (a subset of the columns, and only the events from one userid):

time        userid          eventid     sessionstart    sessionstop
1498639773  101xnmnd1ohi62  504747459   t               f
1498639777  101xnmnd1ohi62  1479311450  f               f
1498639803  101xnmnd1ohi62  808610184   f               f
1498639816  101xnmnd1ohi62  335000637   f               f
1498639903  101xnmnd1ohi62  238269920   f               f
1498639906  101xnmnd1ohi62  990687838   f               f
1498639952  101xnmnd1ohi62  781472797   f               t
1498650109  101xnmnd1ohi62  1826568537  t               f
1498650124  101xnmnd1ohi62  2079795673  f               f
1498650365  101xnmnd1ohi62  578922176   f               t

The rows are ordered by userid and time, so the events appear in the order in which they occurred within each session. Every event has a boolean value for sessionstart and sessionstop. Looking at the list of events, I can identify a session as all events from a row with sessionstart=true through the next row with sessionstop=true, inclusive. The events listed here contain two sessions. The first session starts with eventid 504747459 and ends with 781472797. The second session starts with eventid 1826568537 and ends with 578922176. What I want to do is mark these two sessions (and all other sessions) with a sessionid, using SQL. I haven't found any way to do this in SQL. It would be possible with, e.g., Python, but I believe the performance would be very poor, so SQL is preferred.

Does anyone have a tip on how I can solve this?

Recommended answer

I think it might be easier to use only sessionstart, assuming that there are no events between the end of one session and the start of the next.
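The assumption above can be checked before relying on it. The following is a minimal sketch, not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for Redshift, stores the booleans as 0/1, and adds one hypothetical stray event (eventid 999) between the two sessions of the sample. A row is flagged when the number of session starts up to and including it equals the number of session stops strictly before it, i.e. no session is open at that row.

```python
import sqlite3

# Sample rows from the question: (time, userid, eventid, sessionstart, sessionstop).
# Booleans are stored as 0/1 because SQLite is used here as a stand-in for Redshift.
ROWS = [
    (1498639773, "101xnmnd1ohi62", 504747459, 1, 0),
    (1498639777, "101xnmnd1ohi62", 1479311450, 0, 0),
    (1498639952, "101xnmnd1ohi62", 781472797, 0, 1),
    (1498645000, "101xnmnd1ohi62", 999, 0, 0),  # hypothetical stray event, not in the source data
    (1498650109, "101xnmnd1ohi62", 1826568537, 1, 0),
    (1498650365, "101xnmnd1ohi62", 578922176, 0, 1),
]

CHECK_QUERY = """
    select eventid
    from (
        select time, eventid,
               -- session starts seen so far, including the current row
               sum(case when sessionstart then 1 else 0 end)
                   over (partition by userid order by time
                         rows unbounded preceding) as starts,
               -- session stops seen strictly before the current row
               coalesce(sum(case when sessionstop then 1 else 0 end)
                   over (partition by userid order by time
                         rows between unbounded preceding and 1 preceding), 0) as stops_before
        from events
    ) t
    where starts = stops_before   -- no session is open at this row
    order by time
"""


def stray_events(rows):
    """Return the eventids of rows that fall outside any start..stop range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    found = [row[0] for row in conn.execute(CHECK_QUERY)]
    conn.close()
    return found


print(stray_events(ROWS))
```

An empty result means the assumption holds and counting session starts is enough to assign session ids.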

If so:

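select e.*,
       -- running count of session starts per user;
       -- Redshift requires an explicit frame clause when a window aggregate has ORDER BY
       sum(case when sessionstart then 1 else 0 end)
           over (partition by userid order by time rows unbounded preceding) as user_sessionid
from events e;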
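To see what the running sum produces, here is a small sketch that runs the per-user query on the sample rows from the question. It is an illustration under assumptions: SQLite (via Python's built-in sqlite3 module) stands in for Redshift, booleans are stored as 0/1, and an explicit `rows unbounded preceding` frame is written out because Redshift requires a frame clause for ordered window aggregates.

```python
import sqlite3

# Sample rows from the question: (time, userid, eventid, sessionstart, sessionstop).
# Booleans are stored as 0/1 because SQLite is used here as a stand-in for Redshift.
ROWS = [
    (1498639773, "101xnmnd1ohi62", 504747459, 1, 0),
    (1498639777, "101xnmnd1ohi62", 1479311450, 0, 0),
    (1498639803, "101xnmnd1ohi62", 808610184, 0, 0),
    (1498639816, "101xnmnd1ohi62", 335000637, 0, 0),
    (1498639903, "101xnmnd1ohi62", 238269920, 0, 0),
    (1498639906, "101xnmnd1ohi62", 990687838, 0, 0),
    (1498639952, "101xnmnd1ohi62", 781472797, 0, 1),
    (1498650109, "101xnmnd1ohi62", 1826568537, 1, 0),
    (1498650124, "101xnmnd1ohi62", 2079795673, 0, 0),
    (1498650365, "101xnmnd1ohi62", 578922176, 0, 1),
]

PER_USER_QUERY = """
    select eventid,
           sum(case when sessionstart then 1 else 0 end)
               over (partition by userid order by time
                     rows unbounded preceding) as user_sessionid
    from events
    order by userid, time
"""


def per_user_session_ids(rows):
    """Return (eventid, user_sessionid) pairs ordered by userid and time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(PER_USER_QUERY).fetchall()
    conn.close()
    return result


for eventid, session_id in per_user_session_ids(ROWS):
    print(eventid, session_id)
# The first seven events get user_sessionid 1, the last three get 2.
```

Each row with sessionstart true bumps the running count by one, so every event inherits the number of the most recent session start for that user.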

This provides a sessionid "within" each user. If each user's first event is always a session start (a reasonable assumption), then this is easily extended to a global session id:

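select e.*,
       -- running count of session starts across all users, ordered by user and then time
       sum(case when sessionstart then 1 else 0 end)
           over (order by userid, time rows unbounded preceding) as sessionid
from events e;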
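The global variant can be sketched the same way. The example below is illustrative only: SQLite (through Python's sqlite3 module) stands in for Redshift, booleans are stored as 0/1, the first user's rows are a shortened copy of the sample, and the second user (`hypothetical_user_b`) with its eventids is invented for the demonstration.

```python
import sqlite3

# (time, userid, eventid, sessionstart, sessionstop); booleans stored as 0/1.
ROWS = [
    (1498639773, "101xnmnd1ohi62", 504747459, 1, 0),
    (1498639952, "101xnmnd1ohi62", 781472797, 0, 1),
    (1498650109, "101xnmnd1ohi62", 1826568537, 1, 0),
    (1498650365, "101xnmnd1ohi62", 578922176, 0, 1),
    # Hypothetical second user, not part of the original sample.
    (1498640000, "hypothetical_user_b", 1, 1, 0),
    (1498640050, "hypothetical_user_b", 2, 0, 1),
]

GLOBAL_QUERY = """
    select userid, eventid,
           sum(case when sessionstart then 1 else 0 end)
               over (order by userid, time
                     rows unbounded preceding) as sessionid
    from events
    order by userid, time
"""


def global_session_ids(rows):
    """Return (userid, eventid, sessionid) with ids counted across all users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(GLOBAL_QUERY).fetchall()
    conn.close()
    return result


for userid, eventid, session_id in global_session_ids(ROWS):
    print(userid, eventid, session_id)
# The first user's sessions are numbered 1 and 2; the second user's session is 3.
```

The ids stay distinct across users only because every user's first row is a session start. If a user's first row had sessionstart false, it would inherit the last sessionid of the previous user in the ordering.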
