在《雪花》中使用SQL进行漏斗分析 [英] Funnel Analytics using SQL in Snowflake

查看:10
本文介绍了在《雪花》中使用SQL进行漏斗分析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在构建一个查询,以通过事件通过平台跟踪用户的生命周期。表EVENTS有3列USER_IDDATE_TIMEEVENT_NAME。下面是该表的快照

下面是我的问题

SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
    MEASURES MIN(IFF(EVENT_NAME = 'registration new', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'registration pending confirm', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your business information', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your personal information', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'qualified', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
  ONE ROW PER MATCH
  PATTERN(STEP_1 ANYTHING* STEP_5)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        STEP_2 AS EVENT_NAME = 'registration pending confirm',
        STEP_3 AS EVENT_NAME = 'your business information',
        STEP_4 AS EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
)

我的预期结果

我现在得到的,

以下是我的要求/注意事项

  • 下一个事件的时间戳应大于或等于上一个事件的时间戳(以先到者为准,以便通过漏斗的事件的时间戳相等或保持递增)。此逻辑的一个很好的例子可以通过当前结果和预期结果的差异来解释,即与REGISTRATION_PENDING_CONFIRM_TIMESTAMPQUALIFIED_TIMESTAMP列中的值的差异。
  • 并非所有用户都有这5个事件,例如,如果USER_ID54321没有/跳过事件‘Your Personal Information’,则结果必须包含其余步骤的数据(现在,如果用户没有/跳过漏斗中的任何事件,则查询不会返回数据)。我感觉这是因为当用户流中缺少定义为度量值的事件时,模式搜索失败。

表中事件的顺序不一致,因此我已根据业务/漏斗逻辑在MEASURES部分中按顺序定义了事件

推荐答案

这不是一个完整的答案,但至少我在这里帮助定义了示例数据(比屏幕截图更好),并介绍了CLASSIFIER

的用法
create or replace temp table events as
select $1 user_id, $2 date_time, $3 event_name
from values(1,'2020-11-26 15:24:00','registration new')
, (1,'2021-04-12 18:00:00','registration new')
, (1,'2020-11-26 15:24:00','registration pending confirm')
, (1,'2021-04-12 18:11:00','registration pending confirm')
, (1,'2021-04-18 15:04:00','your personal information')
, (1,'2021-04-22 13:13:00','your personal information')
, (1,'2021-04-13 10:22:00','qualified')
, (1,'2021-04-22 13:13:00','qualified')
;


SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
 
    MEASURES  classifier as class, MIN(IFF(CLASSIFIER = 'STEP_1', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_2', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_3', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_4', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_5', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
 
  ONE ROW PER MATCH
 -- all rows per match
  PATTERN((step_1 | step_2 | step_3 | step_4 | step_5 | coincidence)*)--(STEP_2 | XX)* (STEP_3 | XXX)* (STEP_4 | XX)* (STEP_5 | XX)*)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        STEP_2 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'registration pending confirm' ,
        STEP_3 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your business information',
        STEP_4 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
        , COINCIDENCE AS LAG(DATE_TIME) = DATE_TIME
);

这篇关于在《雪花》中使用SQL进行漏斗分析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆