BigQuery SQL for sliding window aggregate

Question

I have a table that looks like this:

Date         Customer   Pageviews
2014/03/01   abc          5
2014/03/02   xyz          8
2014/03/03   abc          6

I want page view aggregates reported weekly, where each weekly row covers the past 30 days (a sliding window aggregate with a 30-day window, evaluated once per week).

I am using Google BigQuery.

Gordon, regarding your comment about "Customer": what I actually need is slightly more complicated, which is why I included Customer in the table above. I want the number of customers who had more than n pageviews in a 30-day window, computed every week. Something like this:

Date        Customers>10 pageviews in 30day window
2014/02/01  10
2014/02/08  5
2014/02/15  6
2014/02/22  15

However, to keep it simple, I can work out the rest myself if I can just get a sliding window aggregate of pageviews, ignoring customers altogether. Something like this:

Date        count of pageviews in 30day window
2014/02/01  50
2014/02/08  55
2014/02/15  65
2014/02/22  75

Recommended answer

How about this:

-- BigQuery legacy SQL, run against the public GitHub timeline sample.
SELECT changes + changes1 + changes2 + changes3 changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
  -- Attach the totals from the 1, 2, and 3 preceding weekly rows of each user.
  SELECT changes,
         LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
         LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
         LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
         login,
         week
  FROM (
    -- Total changed files per user per week.
    SELECT SUM(payload_pull_request_changed_files) changes,
           UTC_USEC_TO_WEEK(created_at, 1) week,
           actor_attributes_login login
    FROM [publicdata:samples.github_timeline]
    WHERE payload_pull_request_changed_files > 0
    GROUP BY week, login
))
HAVING changes28days > 0

For each user, the innermost query counts how many changes they submitted per week. LAG() then looks back at the preceding rows to get how many changes they submitted 1, 2, and 3 weeks earlier. Adding those 4 weeks together gives the number of changes submitted in the last 28 days.
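
To make the logic concrete outside BigQuery, here is a minimal Python sketch of the same bucket-by-week-then-sum idea. It is not the query above: the function names, the Monday week start, and the sample rows are all assumptions for illustration. One deliberate difference is that it looks up the three preceding calendar weeks and treats a missing week as zero, whereas LAG() steps back over rows. In the query, a user with a gap in activity would therefore pick up an older week, and a user with fewer than four weekly rows gets a NULL sum that the HAVING clause drops.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day` (assumed week boundary)."""
    return day - timedelta(days=day.weekday())


def trailing_four_week_totals(rows):
    """rows: iterable of (date, user, count) tuples.

    Returns {(user, week_start): total for that week plus the 3 weeks before it}.
    Weeks without activity contribute zero.
    """
    # Step 1: total per user per week (the innermost GROUP BY).
    weekly = defaultdict(int)
    for day, user, count in rows:
        weekly[(user, week_start(day))] += count

    # Step 2: for each weekly bucket, add the three preceding calendar weeks.
    totals = {}
    for user, week in weekly:
        totals[(user, week)] = sum(
            weekly.get((user, week - timedelta(weeks=k)), 0) for k in range(4)
        )
    return totals


# Hypothetical sample rows: (date, user, count).
rows = [
    (date(2014, 3, 3), "abc", 5),
    (date(2014, 3, 5), "abc", 1),
    (date(2014, 3, 10), "abc", 6),
    (date(2014, 3, 31), "abc", 2),
    (date(2014, 3, 4), "xyz", 8),
]
totals = trailing_four_week_totals(rows)
# Week of 2014-03-31 for "abc": 2 (that week) + 6 (week of 03-10); 03-17 and 03-24 are empty.
print(totals[("abc", date(2014, 3, 31))])  # 8
```

Only weeks in which a user has at least one row appear in the result, which matches the query: it also emits one row per existing (user, week) pair.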

Now you can wrap everything in a new query to filter users with changes > X and count them.
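
The wrapping step amounts to a filter followed by a count per week. A rough Python sketch of that step follows, assuming the per-user 28-day totals are already available as a dictionary keyed by (user, week start). The function name, the threshold value, and the sample totals are hypothetical.

```python
from collections import Counter
from datetime import date


def users_over_threshold(totals, threshold):
    """totals: {(user, week_start): 28-day total}.

    Returns {week_start: number of users whose total exceeds `threshold`}.
    """
    counts = Counter()
    for (user, week), total in totals.items():
        if total > threshold:  # strict comparison, as in "changes > X"
            counts[week] += 1
    return dict(counts)


# Hypothetical per-user 28-day totals.
totals = {
    ("abc", date(2014, 3, 3)): 6,
    ("xyz", date(2014, 3, 3)): 12,
    ("abc", date(2014, 3, 10)): 15,
    ("xyz", date(2014, 3, 10)): 11,
}
per_week = users_over_threshold(totals, threshold=10)
print(per_week[date(2014, 3, 3)])   # 1 (only "xyz" exceeds 10)
print(per_week[date(2014, 3, 10)])  # 2
```

Weeks in which no user exceeds the threshold are simply absent from the result, so a report that needs a zero for those weeks would have to fill them in separately.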
