Trying to count cumulative distinct entities using Redshift SQL

Problem description

I'm trying to get a cumulative count of distinct objects in Redshift over a time series. The straightforward thing would be to use COUNT(DISTINCT myfield) OVER (ORDER BY timefield DESC ROWS UNBOUNDED PRECEDING), but Redshift gives a "Window definition is not supported" error.

For example, the code below is trying to find the cumulative distinct users for every week from the first week to the present. However, I get the "Window function not supported" error.

SELECT user_time.weeks_ago, 
       COUNT(distinct user_time.user_id) OVER
            (ORDER BY weeks_ago desc ROWS UNBOUNDED PRECEDING) as count
FROM   (SELECT FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7) AS weeks_ago,
               ev.user_id as user_id
        FROM events as ev
        WHERE ev.action='some_user_action') as user_time

The goal is to build a cumulative time series of unique users who have performed an action. Any ideas on how to do this?

Recommended answer

Figured out the answer. The trick is a set of nested subqueries. The innermost one finds the week of each user's first action (the largest weeks-ago value for that user). The middle one counts how many users had their first action in each week. The outer query then takes a running sum of those counts over the time series. Because each user is counted only once, in the week of their first action, the running sum equals the cumulative number of distinct users:

SELECT engaged_per_week.week AS week,
       -- Running total from the oldest week (largest weeks-ago value) to the present
       SUM(engaged_per_week.total) OVER (ORDER BY engaged_per_week.week DESC ROWS UNBOUNDED PRECEDING) AS total
FROM
    -- Count of first-time engagements per week
    (SELECT engaged.first_week AS week,
            COUNT(engaged.first_week) AS total
     FROM
        -- Week of first engagement for each user (weeks ago, so MAX is the earliest)
        (SELECT   MAX(FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7)) AS first_week
         FROM     events ev
         WHERE    ev.name = 'some_user_action'
         GROUP BY ev.user_id) AS engaged
     GROUP BY engaged.first_week) AS engaged_per_week
ORDER BY week DESC;
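
The same three steps (first action per user, new users per period, running sum) can also be written with common table expressions, which some readers find easier to follow. The following is only a sketch under assumptions: it reuses the events table and the name, time, and user_id columns from the answer, the CTE and output column names (first_action, new_users_per_week, cumulative_users) are made up for illustration, and it buckets by calendar week with DATE_TRUNC instead of the answer's weeks-ago value, so it is a variation and not the answer's exact method.

```sql
-- Sketch: cumulative distinct users per calendar week.
-- CTE names and output column names are illustrative assumptions.
WITH first_action AS (
    -- One row per user: the calendar week of that user's earliest matching event.
    SELECT ev.user_id,
           DATE_TRUNC('week', MIN(ev.time)) AS first_week
    FROM   events ev
    WHERE  ev.name = 'some_user_action'
    GROUP  BY ev.user_id
),
new_users_per_week AS (
    -- Each user is counted exactly once, in the week they first appeared.
    SELECT first_week AS week,
           COUNT(*)   AS new_users
    FROM   first_action
    GROUP  BY first_week
)
SELECT week,
       new_users,
       -- Running sum of first-time users = cumulative distinct users so far.
       SUM(new_users) OVER (ORDER BY week ROWS UNBOUNDED PRECEDING) AS cumulative_users
FROM   new_users_per_week
ORDER  BY week;
```

Note that a week in which no user appears for the first time produces no row in this output. If every week must be present, the result would need to be joined against a separate list of weeks.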
