BigQuery SQL运行总计 [英] BigQuery SQL running totals
问题描述
任何想法如何计算BigQuery SQL中的运行总数?
id值运行总计
- ----- ----------- -
1 1 1
2 2 3
3 4 7
4 7 14
5 9 23
6 12 35
7 13 48
8 16 64
9 22 86
10 42 128
11 57 185
12 58 243
13 59 302
14 60 362
使用相关标量查询的传统SQL服务器不是问题:
SELECT a.id,a.value,(SELECT SUM(b.value)
FROM RunTotalTestData b
WHERE b.id< = a.id)
FROM RunTotalTestData a
ORDER BY a.id;
或加入:
SELECT a.id,a.value,SUM(b.Value)
FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id< = a.id
GROUP BY a.id,a.value
ORDER BY a.id;
但我找不到一种方法让它在BigQuery中工作...
您可能已经知道了。但这里是一个,而不是最有效的方式:
JOIN只能使用相等比较完成,iebid <= a.id不能使用。
https://developers.google .com / bigquery / docs / query-reference#joins
如果你问我,这很蹩脚。但是有一项工作。只需对某个虚拟值使用相等比较来获取笛卡尔积,然后使用WHERE来计算< =。这是疯狂的不理想。但是,如果你的表很小,这是行得通的。
SELECT a.id,SUM(a.value)为rt
FROM RunTotalTestData a
JOIN RunTotalTestData b ON a.dummy = b.dummy
WHERE b.id< = a.id
GROUP BY a.id
ORDER BY rt
您也可以手动限制时间:
SELECT a.id,SUM(a.value)作为rt
FROM(
SELECT id,timestamp RunTotalTestData
WHERE timestamp> = foo
AND timestamp< bar
)as a
JOIN(
SELECT id,timestamp,value RunTotalTestData
WHERE timestamp> = foo AND timestamp< $ bar
)b ON a.dummy = b.dummy
WHERE b.id< = a.id
GROUP BY a.id
ORDER BY rt
更新:
您不需要特殊属性。您可以使用
选择1作为一个
并加入。
随着计费开始,连接表在处理中计数。
Any idea how to calculate running total in BigQuery SQL?
id value running total
-- ----- -------------
1 1 1
2 2 3
3 4 7
4 7 14
5 9 23
6 12 35
7 13 48
8 16 64
9 22 86
10 42 128
11 57 185
12 58 243
13 59 302
14 60 362
Not a problem for traditional SQL servers using either correlated scalar query:
SELECT a.id, a.value, (SELECT SUM(b.value)
FROM RunTotalTestData b
WHERE b.id <= a.id)
FROM RunTotalTestData a
ORDER BY a.id;
or join:
SELECT a.id, a.value, SUM(b.Value)
FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;
But I couldn't find a way to make it work in BigQuery...
You probably figured it out already. But here is one, not the most efficient, way:
JOIN can only be done using equality comparisons i.e. b.id <= a.id cannot be used.
https://developers.google.com/bigquery/docs/query-reference#joins
This is pretty lame if you ask me. But there is one work around. Just use equality comparison on some dummy value to get the cartesian product and then use WHERE for <=. This is crazily suboptimal. But if your tables are small this is going to work.
SELECT a.id, SUM(a.value) as rt
FROM RunTotalTestData a
JOIN RunTotalTestData b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY rt
You can manually constrain the time as well:
SELECT a.id, SUM(a.value) as rt
FROM (
SELECT id, timestamp RunTotalTestData
WHERE timestamp >= foo
AND timestamp < bar
) AS a
JOIN (
SELECT id, timestamp, value RunTotalTestData
WHERE timestamp >= foo AND timestamp < bar
) b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY rt
Update:
You don't need a special property. You can just use
SELECT 1 AS one
and join on that.
As billing goes the join table counts in the processing.
这篇关于BigQuery SQL运行总计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!