BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)

Problem description

I'm trying to compute a 28-day moving sum in BigQuery using the LAG function.

The top answer from Felipe Hoffa to the question "Bigquery SQL for sliding window aggregate" indicates that you can use the LAG function. An example of this would be:

SELECT
    spend + spend_lagged_1day + spend_lagged_2day + spend_lagged_3day + ... +  spend_lagged_27day as spend_28_day_sum,
    user,
    date
FROM (
  SELECT spend,
         LAG(spend, 1) OVER (PARTITION BY user ORDER BY date) spend_lagged_1day,
         LAG(spend, 2) OVER (PARTITION BY user ORDER BY date) spend_lagged_2day,
         LAG(spend, 3) OVER (PARTITION BY user ORDER BY date) spend_lagged_3day,
         ...
         LAG(spend, 27) OVER (PARTITION BY user ORDER BY date) spend_lagged_27day,
         user,
         date
  FROM user_spend
)
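To make the tedium concrete, here is a minimal sketch that generates the 27 LAG columns the outer sum references (spend_lagged_1day through spend_lagged_27day) instead of typing them. It runs against SQLite through Python's built-in sqlite3 module as a stand-in for BigQuery (SQLite 3.25+ has LAG), and the one-user sample data with integer day numbers in `date` is hypothetical. It also surfaces a side effect of this approach: LAG returns NULL when fewer than n earlier rows exist, and spend + NULL is NULL, so a user's first 27 rows get no sum at all.

```python
import sqlite3

# Hypothetical sample data: one user, 30 consecutive days, spend 1.0 per day.
# SQLite stands in for BigQuery; "date" is just an integer day number here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_spend (user TEXT, date INTEGER, spend REAL)")
conn.executemany(
    "INSERT INTO user_spend VALUES (?, ?, ?)",
    [("alice", day, 1.0) for day in range(30)],
)

# Build the 27 LAG columns that the question writes out by hand.
lag_cols = ",\n         ".join(
    f"LAG(spend, {n}) OVER (PARTITION BY user ORDER BY date) AS spend_lagged_{n}day"
    for n in range(1, 28)
)
# spend + spend_lagged_1day + ... + spend_lagged_27day
sum_expr = " + ".join(["spend"] + [f"spend_lagged_{n}day" for n in range(1, 28)])

query = f"""
SELECT {sum_expr} AS spend_28_day_sum, user, date
FROM (
  SELECT spend,
         {lag_cols},
         user,
         date
  FROM user_spend
)
ORDER BY user, date
"""
rows = conn.execute(query).fetchall()

# Rows with fewer than 27 predecessors contain a NULL lag, so their sum is NULL.
for total, user, date in rows[25:29]:
    print(user, date, total)
```

Only from the 28th row onward does every lag exist, so only those rows carry a real 28-day sum.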

Is there a way to do this without having to write out 28 lines of SQL?

Solution

The BigQuery documentation doesn't do a good job of explaining the window functions the tool supports, because it doesn't specify which expressions can appear after ROWS or RANGE. BigQuery actually supports the SQL:2003 standard for window functions, which is documented elsewhere on the web, such as here.

That means you can get the effect you want with a single window function. The offset is 27, not 28, because it counts the rows before the current one to include in the sum; together with the current row, the frame spans 28 rows.

SELECT spend,
       SUM(spend) OVER (PARTITION BY user ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) AS spend_28_day_sum,
       user,
       date
FROM user_spend;
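As a quick sanity check of the ROWS frame, the sketch below runs the same window clause in SQLite (3.25+, via Python's sqlite3 module) rather than BigQuery, on hypothetical data for two users, and compares the result with a plain-Python sum of each row's last 28 values. The integer day numbers in `date` are an assumption of the sketch.

```python
import sqlite3

# Hypothetical data: alice spends `day` on each of 30 consecutive days,
# bob spends 2.0 on each of 5 days. "date" is an integer day number.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_spend (user TEXT, date INTEGER, spend REAL)")
rows_in = [("alice", day, float(day)) for day in range(30)]
rows_in += [("bob", day, 2.0) for day in range(5)]
conn.executemany("INSERT INTO user_spend VALUES (?, ?, ?)", rows_in)

# One window function replaces all 27 LAG columns: the frame is the current
# row plus up to 27 earlier rows of the same user (28 rows in total).
result = conn.execute(
    """
    SELECT user,
           date,
           SUM(spend) OVER (PARTITION BY user ORDER BY date
                            ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) AS spend_28_day_sum
    FROM user_spend
    ORDER BY user, date
    """
).fetchall()

alice = [total for user, _, total in result if user == "alice"]
bob = [total for user, _, total in result if user == "bob"]

# Cross-check against a plain-Python sum of the (up to) 28 most recent values.
expected_alice = [float(sum(range(max(0, d - 27), d + 1))) for d in range(30)]
print(alice == expected_alice, bob)
```

Unlike the LAG version, a user's early rows get a partial sum over however many rows exist instead of NULL, and PARTITION BY keeps each user's frame separate.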

A RANGE bound can also be extremely useful. If your table is missing dates for some user, then a frame of 27 PRECEDING rows will reach back more than 27 days, but RANGE builds the window from the date values themselves. In the following query, the date field is a BigQuery TIMESTAMP and the range is specified in microseconds. Whenever you do date math like this in BigQuery, I'd advise testing it thoroughly to make sure it gives you the expected answer.

SELECT spend,
       SUM(spend) OVER (PARTITION BY user ORDER BY date RANGE BETWEEN 27 * 24 * 60 * 60 * 1000000 PRECEDING AND CURRENT ROW) AS spend_28_day_sum,
       user,
       date
FROM user_spend;
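The difference between ROWS and RANGE only shows up when dates are missing. The sketch below puts both frames side by side in SQLite (3.28+ is needed for numeric RANGE offsets), again through Python's sqlite3 module instead of BigQuery. SQLite has no TIMESTAMP type, so `date` is stored as an integer count of microseconds, an assumption standing in for the TIMESTAMP in the answer; the gappy sample data is hypothetical. Current BigQuery GoogleSQL may reject a TIMESTAMP as the ORDER BY key of a RANGE frame with a numeric offset; if it does, ordering by UNIX_MICROS(date) is one option to try and test.

```python
import sqlite3

DAY_MICROS = 24 * 60 * 60 * 1_000_000
# Same constant as the answer's 27 * 24 * 60 * 60 * 1000000.
WINDOW_MICROS = 27 * DAY_MICROS

# Hypothetical data with gaps: rows on days 0, 10, 30 and 31 only.
# "date" holds microseconds (standing in for a BigQuery TIMESTAMP).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_spend (user TEXT, date INTEGER, spend REAL)")
conn.executemany(
    "INSERT INTO user_spend VALUES (?, ?, ?)",
    [("alice", day * DAY_MICROS, spend)
     for day, spend in [(0, 5.0), (10, 7.0), (30, 11.0), (31, 13.0)]],
)

# ROWS counts physical rows; RANGE keeps rows whose date lies within
# WINDOW_MICROS before the current row's date (both ends inclusive).
result = conn.execute(
    f"""
    SELECT date / {DAY_MICROS} AS day,
           SUM(spend) OVER (PARTITION BY user ORDER BY date
                            ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) AS rows_sum,
           SUM(spend) OVER (PARTITION BY user ORDER BY date
                            RANGE BETWEEN {WINDOW_MICROS} PRECEDING AND CURRENT ROW) AS range_sum
    FROM user_spend
    ORDER BY user, date
    """
).fetchall()

for day, rows_sum, range_sum in result:
    print(day, rows_sum, range_sum)
```

On day 30 the ROWS frame still reaches back to day 0, because only three rows exist, while the RANGE frame (27 days = 2,332,800,000,000 microseconds) starts at day 3 and drops the day-0 spend.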
