BigQuery SQL running totals


Question

Any idea how to calculate a running total in BigQuery SQL?

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362 

This is not a problem for traditional SQL servers, using either a correlated scalar subquery:

SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

or a join:

SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;

But I couldn't find a way to make it work in BigQuery...

Recommended answer

You have probably figured it out already, but here is one way, though not the most efficient:

JOIN can only be done using equality comparisons, i.e. b.id <= a.id cannot be used as the join condition.

https://developers.google.com/bigquery/docs/query-reference#joins

This is pretty lame if you ask me, but there is a workaround. Use an equality comparison on some dummy value to get the Cartesian product, and then apply the <= condition in the WHERE clause. This is wildly suboptimal, because the join produces every pair of rows before filtering, but it will work if your tables are small.

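-- Assumes RunTotalTestData has a column named dummy holding the same value in every row.
SELECT a.id, SUM(b.value) AS rt  -- sum b.value over all rows with b.id <= a.id
FROM RunTotalTestData a
JOIN RunTotalTestData b ON a.dummy = b.dummy  -- equality on a constant yields the Cartesian product
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY rt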

You can manually constrain the time range as well:

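-- foo and bar are placeholders for the bounds of the time range.
-- As above, dummy is assumed to be a constant column in RunTotalTestData.
SELECT a.id, SUM(b.value) AS rt
FROM (
    SELECT id, timestamp, dummy
    FROM RunTotalTestData
    WHERE timestamp >= foo
    AND timestamp < bar
) AS a
JOIN (
    SELECT id, timestamp, value, dummy
    FROM RunTotalTestData
    WHERE timestamp >= foo AND timestamp < bar
) b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY rt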

Update:

You don't need a special property in the table. You can just use

SELECT 1 AS one

in each subquery and join on that.
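
Put together, that suggestion might look like the sketch below. It is only an illustration under a few assumptions: the table name RunTotalTestData and the columns id and value come from the question, the alias one is the constant from the snippet above, and the aliases a, b, and rt are arbitrary. Each side of the join gets its own subquery that adds the constant column, so the table itself needs no extra property.

```sql
-- Sketch only: running total via a constant-key self-join.
-- The constant column "one" exists only in the subqueries, not in the table.
SELECT a.id, a.value, SUM(b.value) AS rt
FROM (
    SELECT id, value, 1 AS one
    FROM RunTotalTestData
) a
JOIN (
    SELECT id, value, 1 AS one
    FROM RunTotalTestData
) b ON a.one = b.one   -- equality on the constant pairs every row with every row
WHERE b.id <= a.id     -- keep only the rows at or before the current id
GROUP BY a.id, a.value
ORDER BY a.id
```

On the sample data in the question, the row with id 3 would be paired with ids 1, 2, and 3, so its total would be 1 + 2 + 4 = 7, which matches the expected table. Because every row is paired with every other row before the WHERE filter runs, the intermediate result grows quadratically with the table size.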

As far as billing goes, the joined table counts toward the data processed.
