How to calculate median of a numeric sequence in Google BigQuery efficiently?
Question
I need to efficiently calculate the median of a numeric sequence in Google BigQuery. Is this possible?
Recommended answer
Yes, it's possible with the PERCENTILE_CONT window function.
Returns values that are based upon linear interpolation between the values of the group, after ordering them per the ORDER BY clause.
The percentile argument must be between 0 and 1.
This window function requires ORDER BY in the OVER clause.
So an example query would look like this (the MAX() is only there to collapse each room to a single row under the GROUP BY; it is not part of the median logic, so it should not confuse you):
-- Legacy SQL: comma-separated subqueries in FROM are combined like UNION ALL.
SELECT room,
       MAX(median) AS median
FROM
  (SELECT room,
          PERCENTILE_CONT(0.5) OVER (PARTITION BY room ORDER BY temperature) AS median
   FROM
     (SELECT 1 AS room, 11 AS temperature),
     (SELECT 1 AS room, 12 AS temperature),
     (SELECT 1 AS room, 14 AS temperature),
     (SELECT 1 AS room, 19 AS temperature),
     (SELECT 1 AS room, 13 AS temperature),
     (SELECT 2 AS room, 20 AS temperature),
     (SELECT 2 AS room, 21 AS temperature),
     (SELECT 2 AS room, 29 AS temperature),
     (SELECT 3 AS room, 30 AS temperature))
GROUP BY room
This returns:
+------+--------+
| room | median |
+------+--------+
|    1 |     13 |
|    2 |     21 |
|    3 |     30 |
+------+--------+
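To sanity-check these numbers outside BigQuery, the interpolation can be sketched in a few lines of Python. This is only an illustration of the usual continuous-percentile definition (linear interpolation between the two nearest sorted values), assuming BigQuery follows it as the description above suggests; the helper name percentile_cont and the rooms dictionary are made up for this example and are not part of any BigQuery API.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile: linear interpolation between sorted neighbours.

    Hypothetical helper for illustration only; not a BigQuery client call.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the requested percentile in the sorted list.
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Interpolate between the two surrounding values (identical when pos is whole).
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


# Same sample data as the query above, grouped by room.
rooms = {
    1: [11, 12, 14, 19, 13],
    2: [20, 21, 29],
    3: [30],
}
medians = {room: percentile_cont(temps, 0.5) for room, temps in rooms.items()}
print(medians)
```

With an odd number of rows per room the median is simply the middle value; with an even number, for example percentile_cont([1, 2, 3, 4], 0.5), the result falls halfway between the two middle values.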