How to calculate the median of a numeric sequence in Google BigQuery efficiently?

Question

I need to calculate the median of a numeric sequence in Google BigQuery efficiently. Is this possible?

Answer

Yes, it's possible with the PERCENTILE_CONT window function.

PERCENTILE_CONT returns values based on linear interpolation between the values of the group, after ordering them per the ORDER BY clause.

The percentile argument must be between 0 and 1.
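
To make "linear interpolation" concrete, here is the usual SQL definition of PERCENTILE_CONT, assumed here to match BigQuery's behavior: sort the group's N values as x_(1) ≤ … ≤ x_(N), turn the percentile p into a fractional row position r, and interpolate between the two neighboring rows. The room numbers below come from the example data in this answer; the four-value group is a hypothetical illustration.

$$
r = 1 + p\,(N - 1), \qquad
\mathrm{PERCENTILE\_CONT}(p) = x_{(\lfloor r \rfloor)} + \bigl(r - \lfloor r \rfloor\bigr)\bigl(x_{(\lceil r \rceil)} - x_{(\lfloor r \rfloor)}\bigr)
$$

$$
\text{Room 1: } \{11, 12, 13, 14, 19\},\; N = 5,\; r = 1 + 0.5 \cdot 4 = 3 \;\Rightarrow\; x_{(3)} = 13
$$

$$
\text{Hypothetical group } \{20, 21, 29, 30\},\; N = 4,\; r = 1 + 0.5 \cdot 3 = 2.5 \;\Rightarrow\; 21 + 0.5\,(29 - 21) = 25
$$

With an odd N and p = 0.5, r is a whole number, so the median is simply the middle value. With an even N it is the average of the two middle values.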

This window function requires ORDER BY in the OVER clause.
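
The ORDER BY requirement above applies to BigQuery's legacy SQL dialect, which the example below uses. In GoogleSQL (standard SQL), PERCENTILE_CONT instead takes the value column and the percentile as arguments, and its OVER clause accepts only PARTITION BY. The following is a minimal sketch of the same per-room median in that dialect. It uses an inline array of structs in place of a real table. The `readings` name is a hypothetical CTE, and the rows are the same sample data as the answer's example.

```sql
#standardSQL
-- Sample data (same rows as the legacy example), built from a typed array of structs.
WITH readings AS (
  SELECT room, temperature
  FROM UNNEST(ARRAY<STRUCT<room INT64, temperature INT64>>[
    (1, 11), (1, 12), (1, 14), (1, 19), (1, 13),
    (2, 20), (2, 21), (2, 29),
    (3, 30)
  ])
)
-- The window function emits the room's median on every row of that room,
-- so DISTINCT collapses each room to one row (instead of MAX() + GROUP BY).
SELECT DISTINCT
  room,
  PERCENTILE_CONT(temperature, 0.5) OVER (PARTITION BY room) AS median
FROM readings
ORDER BY room
```

Using DISTINCT here serves the same purpose as the MAX()/GROUP BY wrapper in the legacy query. Both keep one row per room, because every row in a partition carries the same median.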

So an example query would look like the following. The MAX() is only there to collapse each room's rows into one under GROUP BY. Every row in a room's partition already carries the same median, so MAX() simply picks it and does no math of its own.

#legacySQL
SELECT
  room,
  -- every row of a room holds the same median; MAX() just picks it
  MAX(room_median) AS median
FROM (
  SELECT
    room,
    PERCENTILE_CONT(0.5) OVER (PARTITION BY room ORDER BY temperature) AS room_median
  FROM
    -- in legacy SQL, a comma between subqueries means UNION ALL
    (SELECT 1 AS room, 11 AS temperature),
    (SELECT 1 AS room, 12 AS temperature),
    (SELECT 1 AS room, 14 AS temperature),
    (SELECT 1 AS room, 19 AS temperature),
    (SELECT 1 AS room, 13 AS temperature),
    (SELECT 2 AS room, 20 AS temperature),
    (SELECT 2 AS room, 21 AS temperature),
    (SELECT 2 AS room, 29 AS temperature),
    (SELECT 3 AS room, 30 AS temperature)
)
GROUP BY room

This returns:

+------+--------+
| room | median |
+------+--------+
|    1 |     13 |
|    2 |     21 |
|    3 |     30 |
+------+--------+
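
The question specifically asks about efficiency. As an analytic function, PERCENTILE_CONT computes an exact result, which requires ordering every row within each partition. If an approximate median is acceptable, GoogleSQL's APPROX_QUANTILES aggregate is a common alternative on large tables. With 2 quantiles it returns the array [min, median, max], so OFFSET(1) selects the median. The following sketch uses the same sample data and the same hypothetical `readings` CTE. Because the function is approximate, results on large inputs may differ from the exact median. It may also not average the two middle values of an even-sized group the way PERCENTILE_CONT does.

```sql
#standardSQL
-- Sample data (same rows as the legacy example).
WITH readings AS (
  SELECT room, temperature
  FROM UNNEST(ARRAY<STRUCT<room INT64, temperature INT64>>[
    (1, 11), (1, 12), (1, 14), (1, 19), (1, 13),
    (2, 20), (2, 21), (2, 29),
    (3, 30)
  ])
)
SELECT
  room,
  -- 2 quantiles -> [min, approximate median, max]; OFFSET(1) picks the median
  APPROX_QUANTILES(temperature, 2)[OFFSET(1)] AS approx_median
FROM readings
-- APPROX_QUANTILES is an ordinary aggregate, so a plain GROUP BY is enough
GROUP BY room
ORDER BY room
```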
