BigQuery SQL for sliding window aggregate
Question
I have a table that looks like this:
Date        Customer  Pageviews
2014/03/01  abc       5
2014/03/02  xyz       8
2014/03/03  abc       6
I want page view aggregates grouped by week, where each weekly row shows the aggregate for the past 30 days (a sliding-window aggregate with a 30-day window, evaluated every week).
I am using Google BigQuery.
Gordon, re your comment about "Customer": what I actually need is slightly more complicated, which is why I included Customer in the table above. I want the number of customers who had more than n pageviews in a 30-day window, computed every week. Something like this:
Date        Customers with >10 pageviews in 30-day window
2014/02/01  10
2014/02/08  5
2014/02/15  6
2014/02/22  15
However, to keep it simple, I can work out the rest myself if I can just get a sliding-window aggregate of pageviews that ignores customers altogether. Something like this:
Date        Count of pageviews in 30-day window
2014/02/01  50
2014/02/08  55
2014/02/15  65
2014/02/22  75
Recommended answer
How about this:
SELECT changes + changes1 + changes2 + changes3 changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
  SELECT changes,
         LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
         LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
         LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
         login,
         week
  FROM (
    SELECT SUM(payload_pull_request_changed_files) changes,
           UTC_USEC_TO_WEEK(created_at, 1) week,
           actor_attributes_login login
    FROM [publicdata:samples.github_timeline]
    WHERE payload_pull_request_changed_files > 0
    GROUP BY week, login
  ))
HAVING changes28days > 0
For each user, the query counts how many changes they submitted per week. Then LAG() lets us peek at the previous rows to see how many changes they submitted 1, 2, and 3 weeks earlier. Adding those 4 weeks gives the number of changes submitted in the last 28 days.
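The LAG() step can be tried out locally. What follows is a minimal sketch, not the BigQuery query from the answer: it assumes Python's built-in sqlite3 module (with SQLite 3.25 or later, which is needed for window functions), a hypothetical pre-aggregated table named weekly, and made-up numbers. It also wraps each LAG() in COALESCE, which the original query does not do, so that a user's first weeks are not lost when an earlier row does not exist and LAG() returns NULL.

```python
import sqlite3

# Hypothetical weekly totals for a single user (made-up numbers).
# "week" is the start date of each week, stored as an ISO string.
rows = [
    ("abc", "2014-02-01", 5),
    ("abc", "2014-02-08", 8),
    ("abc", "2014-02-15", 6),
    ("abc", "2014-02-22", 4),
    ("abc", "2014-03-01", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly (login TEXT, week TEXT, changes INTEGER)")
conn.executemany("INSERT INTO weekly VALUES (?, ?, ?)", rows)

# Current week plus the three previous rows for the same user.
# COALESCE turns a missing earlier row (NULL) into 0.
query = """
SELECT week,
       changes
       + COALESCE(LAG(changes, 1) OVER w, 0)
       + COALESCE(LAG(changes, 2) OVER w, 0)
       + COALESCE(LAG(changes, 3) OVER w, 0) AS changes28days
FROM weekly
WINDOW w AS (PARTITION BY login ORDER BY week)
ORDER BY week
"""
result = conn.execute(query).fetchall()

for week, total in result:
    print(week, total)
```

Note that LAG() counts rows, not calendar weeks. If a user has no row for some week, LAG(changes, 1) reaches back to their previous active week, so the sum only covers 28 days when every user has a row for every week.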
Now you can wrap everything in a new query that filters users with changes > X and counts them.
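One way the wrapping query might look is sketched here. It runs on SQLite through Python's built-in sqlite3 module (SQLite 3.25 or later assumed) rather than on BigQuery, and the table name weekly, the sample rows, and the THRESHOLD value standing in for X are all hypothetical. The COALESCE calls are an addition to the answer's query so that users with fewer than four weeks of history are still counted.

```python
import sqlite3

THRESHOLD = 10  # hypothetical value for X

# Hypothetical weekly totals per user (made-up numbers).
rows = [
    ("abc", "2014-02-01", 5), ("abc", "2014-02-08", 8), ("abc", "2014-02-15", 6),
    ("xyz", "2014-02-01", 2), ("xyz", "2014-02-08", 3), ("xyz", "2014-02-15", 9),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly (login TEXT, week TEXT, changes INTEGER)")
conn.executemany("INSERT INTO weekly VALUES (?, ?, ?)", rows)

# Inner query: trailing 4-week sum per user and week.
# Outer query: keep rows above the threshold and count users per week.
query = """
SELECT week, COUNT(*) AS users_over_threshold
FROM (
    SELECT login,
           week,
           changes
           + COALESCE(LAG(changes, 1) OVER w, 0)
           + COALESCE(LAG(changes, 2) OVER w, 0)
           + COALESCE(LAG(changes, 3) OVER w, 0) AS changes28days
    FROM weekly
    WINDOW w AS (PARTITION BY login ORDER BY week)
)
WHERE changes28days > ?
GROUP BY week
ORDER BY week
"""
counts = conn.execute(query, (THRESHOLD,)).fetchall()

for week, n_users in counts:
    print(week, n_users)
```

Because the filter runs before the grouping, a week in which no user exceeds the threshold does not appear in the output at all, rather than appearing with a count of 0.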