BigQuery SQL for sliding window aggregate

Question

Hi, I have a table that looks like this:

Date         Customer   Pageviews
2014/03/01   abc          5
2014/03/02   xyz          8
2014/03/03   abc          6

I want page view aggregates grouped by week, where each weekly row shows the total for the past 30 days (that is, a sliding window aggregate with a 30-day window, evaluated every week).

I am using Google BigQuery.

EDIT: Gordon, regarding your comment about "Customer": what I actually need is slightly more complicated, which is why I included Customer in the table above. I want the number of customers who had more than n pageviews in a 30-day window, computed every week. Something like this:

Date        Customers>10 pageviews in 30day window
2014/02/01  10
2014/02/08  5
2014/02/15  6
2014/02/22  15

However, to keep it simple, I can work my way from there if I can just get a sliding window aggregate of pageviews that ignores customers altogether. Something like this:

Date        count of pageviews in 30day window
2014/02/01  50
2014/02/08  55
2014/02/15  65
2014/02/22  75

Solution

How about this:

SELECT changes + changes1 + changes2 + changes3 AS changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
  -- For each user and week, fetch the totals of the three previous weekly rows.
  SELECT changes,
         LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) AS changes1,
         LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) AS changes2,
         LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) AS changes3,
         login,
         week
  FROM (
    -- Weekly total of changed files per user.
    SELECT SUM(payload_pull_request_changed_files) AS changes,
           UTC_USEC_TO_WEEK(created_at, 1) AS week,
           actor_attributes_login AS login
    FROM [publicdata:samples.github_timeline]
    WHERE payload_pull_request_changed_files > 0
    GROUP BY week, login
  )
)
-- Keep only rows whose 4-week total is positive.
HAVING changes28days > 0

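For each user, the innermost query counts how many changes they submitted per week. Then LAG() lets us peek at the previous rows to see how many changes they submitted 1, 2, and 3 weeks earlier. Adding those 4 weekly totals gives the number of changes submitted in the last 28 days, which approximates the 30-day window asked for. Note that LAG() steps back by rows rather than by calendar weeks, so this assumes each user has a row for every week. Also, LAG() returns NULL when there is no earlier row, so a user's first three weekly rows produce a NULL sum and are dropped by the HAVING filter.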

Now you can wrap everything in a new query to filter users with changes>X, and count them.
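
As a rough sketch of that last step, the query below wraps the query from above, keeps the rows whose 28-day total exceeds a threshold, and counts the users per week. The threshold of 10 and the aliases week_start and users_over_threshold are placeholders chosen for illustration and are not part of the original answer. The filter is written as WHERE in the outer query instead of HAVING, which should behave the same way here.

```sql
-- Sketch only: the threshold (10) and the output aliases are assumed placeholders.
SELECT week_start,
       COUNT(login) AS users_over_threshold
FROM (
  SELECT changes + changes1 + changes2 + changes3 AS changes28days,
         login,
         USEC_TO_TIMESTAMP(week) AS week_start
  FROM (
    SELECT changes,
           LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) AS changes1,
           LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) AS changes2,
           LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) AS changes3,
           login,
           week
    FROM (
      SELECT SUM(payload_pull_request_changed_files) AS changes,
             UTC_USEC_TO_WEEK(created_at, 1) AS week,
             actor_attributes_login AS login
      FROM [publicdata:samples.github_timeline]
      WHERE payload_pull_request_changed_files > 0
      GROUP BY week, login
    )
  )
)
-- Rows with a NULL total (fewer than three earlier weekly rows) fail this test.
WHERE changes28days > 10
GROUP BY week_start
ORDER BY week_start
```

Each output row is a week together with the number of users whose four most recent weekly totals sum to more than the threshold. Because the inner GROUP BY yields one row per user per week, COUNT(login) counts each user at most once per week.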
