BigQuery SQL运行总计 [英] BigQuery SQL running totals

查看:116
本文介绍了BigQuery SQL运行总计的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

任何想法如何计算BigQuery SQL中的运行总数?

  id值运行总计
- ----- ----------- -
1 1 1
2 2 3
3 4 7
4 7 14
5 9 23
6 12 35
7 13 48
8 16 64
9 22 86
10 42 128
11 57 185
12 58 243
13 59 302
14 60 362

使用相关标量查询的传统SQL服务器不是问题:

  SELECT a.id,a.value,(SELECT SUM(b.value)
FROM RunTotalTestData b
WHERE b.id< = a.id)
FROM RunTotalTestData a
ORDER BY a.id;

或加入:

  SELECT a.id,a.value,SUM(b.Value)
FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id< = a.id
GROUP BY a.id,a.value
ORDER BY a.id;

但我找不到一种方法让它在BigQuery中工作...

解决方案

您可能已经知道了。但这里是一个,而不是最有效的方式:



JOIN只能使用相等比较完成,iebid <= a.id不能使用。



https://developers.google .com / bigquery / docs / query-reference#joins



如果你问我,这很蹩脚。但是有一项工作。只需对某个虚拟值使用相等比较来获取笛卡尔积,然后使用WHERE来计算< =。这是疯狂的不理想。但是,如果你的表很小,这是行得通的。

  SELECT a.id,SUM(a.value)为rt 
FROM RunTotalTestData a
JOIN RunTotalTestData b ON a.dummy = b.dummy
WHERE b.id< = a.id
GROUP BY a.id
ORDER BY rt

您也可以手动限制时间:

  SELECT a.id,SUM(a.value)作为rt 
FROM(
SELECT id,timestamp RunTotalTestData
WHERE timestamp> = foo
AND timestamp< bar
)as a
JOIN(
SELECT id,timestamp,value RunTotalTestData
WHERE timestamp> = foo AND timestamp< $ bar
)b ON a.dummy = b.dummy
WHERE b.id< = a.id
GROUP BY a.id
ORDER BY rt

更新:



您不需要特殊属性。您可以使用

 选择1作为一个

并加入。



随着计费开始,连接表在处理中计数。

Any idea how to calculate running total in BigQuery SQL?

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362 

Not a problem for traditional SQL servers using either correlated scalar query:

SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

or join:

SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;

But I couldn't find a way to make it work in BigQuery...

解决方案

You probably figured it out already. But here is one, not the most efficient, way:

JOIN can only be done using equality comparisons i.e. b.id <= a.id cannot be used.

https://developers.google.com/bigquery/docs/query-reference#joins

This is pretty lame if you ask me. But there is one work around. Just use equality comparison on some dummy value to get the cartesian product and then use WHERE for <=. This is crazily suboptimal. But if your tables are small this is going to work.

SELECT a.id, SUM(a.value) as rt 
FROM RunTotalTestData a 
JOIN RunTotalTestData b ON a.dummy = b.dummy 
WHERE b.id <= a.id 
GROUP BY a.id 
ORDER BY rt

You can manually constrain the time as well:

SELECT a.id, SUM(a.value) as rt 
FROM (
    SELECT id, timestamp RunTotalTestData 
    WHERE timestamp >= foo 
    AND timestamp < bar
) AS a 
JOIN (
    SELECT id, timestamp, value RunTotalTestData 
    WHERE timestamp >= foo AND timestamp < bar
) b ON a.dummy = b.dummy 
WHERE b.id <= a.id 
GROUP BY a.id 
ORDER BY rt

Update:

You don't need a special property. You can just use

SELECT 1 AS one

and join on that.

As billing goes the join table counts in the processing.

这篇关于BigQuery SQL运行总计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆