SQL:使用LEAD()和PARTITION BY访问当前行之后的下一行 [英] SQL: Use LEAD() and PARTITION BY to access to the next row following the current row

查看:118
本文介绍了SQL:使用LEAD()和PARTITION BY访问当前行之后的下一行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个移动应用浏览历史记录数据集,如下所示.

I have a mobile app browsing history dataset as shown below.

  • DeviceDateTime:用户在移动应用中查看页面的日期和时间.
  • 用户ID:每个用户ID代表登录移动应用程序的访问者.
  • PageName:移动应用中有不同的页面.所有访客首先会进入首页,然后导航至其他页面.
  • PageSequence:已访问页面的顺序.例如,Seq_1主页>Seq_2我的帐户=首先登录首页,然后导航至我的帐户"页面.
<身体>
DeviceDateTime 用户ID PageName PageSequence
2021-01-19 16:40:00.000 UserA 首页 Seq_1
2021-01-19 16:40:00.000 UserA 我的帐户 Seq_2
2021-01-19 16:40:07.000 UserA 我的活动 Seq_3
2021-01-19 16:40:07.000 UserA 餐厅信息 Seq_4
2021-01-19 16:40:18.000 UserA 餐厅详细信息页面 Seq_5
2021-01-19 16:40:31.000 UserA 餐厅详细信息页面 Seq_6
2021-01-19 16:40:31.000 UserA 餐厅预订确认 Seq_7
2021-01-19 16:40:40.000 UserA 首页 Seq_8
2021-01-19 16:40:45.000 UserA 写信给我们 Seq_9
2021-01-19 16:40:46.0​​00 UserA 首页 Seq_10
2021-01-28 21:11:53.000 UserB 首页 Seq_1
2021-01-28 21:12:01.000 UserB 餐厅信息 Seq_2
2021-01-28 21:13:37.000 UserB 餐厅信息 Seq_3
2021-02-16 09:43:27.000 UserA 首页 Seq_1
2021-02-16 09:43:43.000 UserA 写信给我们 Seq_2
2021-02-16 09:44:50.000 UserA 我的帐户 Seq_3
2021-02-16 09:45:03.000 UserA 我的活动 Seq_4

我需要在SQL中进行以下预处理:

I need to do the following pre-processing in SQL:

  1. 将数据集汇总到一个表中,如下所示.我想在表格中显示"FROM"(源页面)和"TO"(目标页面).例如,有1位访客已经经历了来自家"的旅程.页面转到我的帐户"页;2位访客已经经历了来自家"的旅程页面以写信给我们".
  2. 当旅程是从"和到"同一页面时,请勿计算旅程.例如,用户A Seq_5和Seq_6,"FROM餐厅详细信息页面"转到餐厅详细信息页面";不应该包含在内.
  3. 不应包括两个不同用户之间的旅程.例如,从餐厅列表"(用户B Seq_3)转到主页"(UserA Seq_1)不应该包含在内.

结果表:

<身体>
FROM TO No_of_Users
首页我的帐户 1
我的帐户我的活动 2
我的活动餐厅信息 1
餐厅信息餐厅详细信息页面 1
餐厅详细信息页面餐厅预订确认 1
餐厅预订确认首页 1
首页写信给我们 2
写信给我们首页 1
首页餐厅信息 1
首页餐厅信息 1
写信给我们我的帐户 1

我大约有60万用户,数据集中共有21个唯一的PageName.

I have around 600,000 users and a total of 21 unique PageName in the dataset.

我尝试了以下脚本,但是没有用.我未能在摘要表中记录所有可能的旅程.例如,FROM写给我们"从餐厅预订确认"到我的帐户"前往家"结果中都丢失了.

I have tried the following script but it didn't work. I failed to capture all the possible journeys in the summary table. For example, FROM "Write To Us" TO "My Account", FROM "Restaurant Booking Confirmation" TO "Home" are all missing in the result.

DROP TABLE IF EXISTS #App
CREATE TABLE #App (
    DeviceDateTime SMALLDATETIME,
    UserID VARCHAR(100),
    PageName VARCHAR(100),
    PageSequence VARCHAR(100))
INSERT INTO #App VALUES
    ('2021-01-19 16:40:00.000','UserA', 'Home', 'Seq_1'),
    ('2021-01-19 16:40:00.000','UserA', 'My Account', 'Seq_2'),
    ('2021-01-19 16:40:07.000','UserA', 'My Activity', 'Seq_3'),
    ('2021-01-19 16:40:07.000','UserA', 'Restaurant Listing', 'Seq_4'),
    ('2021-01-19 16:40:18.000','UserA', 'Restaurant Details Page', 'Seq_5'),
    ('2021-01-19 16:40:31.000','UserA', 'Restaurant Details Page', 'Seq_6'),
    ('2021-01-19 16:40:31.000','UserA', 'Restaurant Booking Confirmation', 'Seq_7'),
    ('2021-01-19 16:40:40.000','UserA', 'Home', 'Seq_8'),
    ('2021-01-19 16:40:45.000','UserA', 'Write To Use', 'Seq_9'),
    ('2021-01-19 16:40:46.000','UserA', 'Home', 'Seq_10'),
    ('2021-01-28 21:11:53.000','UserB', 'Home', 'Seq_1'),
    ('2021-01-28 21:12:01.000','UserB', 'Restaurant Listing', 'Seq_2'),
    ('2021-01-28 21:13:37.000','UserB', 'Restaurant Listing', 'Seq_3'),
    ('2021-02-16 09:43:27.000','UserA', 'Home', 'Seq_1'),
    ('2021-02-16 09:43:43.000','UserA', 'Write To Us', 'Seq_2'),
    ('2021-02-16 09:44:50.000','UserA', 'My Account', 'Seq_3'),
    ('2021-02-16 09:45:03.000','UserA', 'My Activity', 'Seq_4');

DROP TABLE IF EXISTS #SD
with seq_fixed as
(
  select
    UserID,
    DeviceDateTime,
    PageName,
    cast(right(PageSequence, charindex('_', reverse(PageSequence)) - 1) as int) as pagesequencefinal
  from #App
)
, with_next as
(
  select
    UserID,
    DeviceDateTime,
    PageName,
    lead(PageName) over (partition by UserID, DeviceDateTime order by UserID, DeviceDateTime ASC) as next_pagename
  from seq_fixed
  group by UserID, DeviceDateTime, PageName
)
select PageName, next_pagename, count(*) AS No_of_User
into #SD
from with_next
where next_pagename is not null
group by PageName, next_pagename
order by PageName, next_pagename;

SELECT * FROM #SD

推荐答案

在您之前的请求中,您只想确认每位用户首次出现该页面,因此Home-> PageX-> Home-> PageY将被解释为Home-> PageX-> PageY.为此,您必须按用户和页面分组才能找到第一次出现的情况.

In your previous request you wanted to only acknowledge a page's first ocurrence per user, so Home->PageX->Home->PageY would be interpreted as Home->PageX->PageY. For this to happen you had to group by user and page to find this first occurrence.

在此新请求中不是这种情况,因此请不要汇总:

This is not the case in this new request, so don't aggregate:

with seq_fixed as
(
  select
    userid,
    pagename,
    cast(right(pagesequence, charindex('_', reverse(pagesequence)) - 1) as int) as pagesequencefinal
  from app
)
, with_next as
(
  select
    userid,
    pagename,
    lead(pagename) over (partition by userid order by pagesequencefinal) as next_pagename
  from seq_fixed
)
select pagename, next_pagename, count(*)
from with_next
where next_pagename is not null
group by pagename, next_pagename
order by pagename, next_pagename;

唯一的问题是:当用户昨天在PageX上结束并且今天从Home开始时,这将计为PageX-> Home.如果您想防止这种情况的发生,则需要针对这种情况进行一些检测,例如当其前任已存在至少1小时或类似时间时,请勿将条目视为页面更改.为此,您可以使用时间戳列和 LAG .

The only problem with this: When a user ended on PageX yesterday and starts with Home today, this will count as PageX->Home. If you want to prevent this from happening you need some detection for this situation, e.g. don't consider an entry a page change when it's pedecessor is at least 1 hour old or the like. For this you can use your timestamp column and LAG.

这篇关于SQL:使用LEAD()和PARTITION BY访问当前行之后的下一行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆