SQL Server 2008 R2 的百分位数聚合 [英] Percentile aggregate for SQL Server 2008 R2

查看:24
本文介绍了SQL Server 2008 R2 的百分位数聚合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用的是 SQL Server 2008 R2.我需要计算每组的百分位值,例如:

I'm using SQL Server 2008 R2. I need to compute a percentile value per group, something like:

SELECT id,
       PCTL(0.9, x) -- for the 90th percentile
FROM my_table
GROUP BY id
ORDER BY id

例如,给定这个 DDL(fiddle)---

For example, given this DDL (fiddle) ---

CREATE TABLE my_table (id INT, x REAL);

INSERT INTO my_table
VALUES (7, 0.164595), (5, 0.671311), (7, 0.0118385), (6, 0.704592), (3, 0.633521), (3, 0.337268), (0, 0.54739), (6, 0.312282), (0, 0.220618), (7, 0.214973), (6, 0.410768), (7, 0.151572), (7, 0.0639506), (5, 0.339075), (1, 0.284094), (2, 0.126722), (2, 0.870079), (3, 0.369366), (1, 0.6687), (5, 0.199456), (5, 0.0296715), (1, 0.330339), (9, 0.0000459612), (5, 0.391947), (3, 0.753965), (8, 0.334207), (7, 0.583357), (3, 0.326951), (4, 0.207057), (2, 0.258463), (2, 0.0532811), (1, 0.751584), (7, 0.592624), (7, 0.673506), (5, 0.44764), (6, 0.733737), (5, 0.141215), (7, 0.222452), (3, 0.597019), (1, 0.293901), (4, 0.516213), (7, 0.498336), (6, 0.410461), (2, 0.32211), (1, 0.466735), (5, 0.720456), (8, 0.000428383), (3, 0.46085), (0, 0.402963), (7, 0.677002), (0, 0.400122), (1, 0.762357), (9, 0.158455), (7, 0.359723), (4, 0.225914), (7, 0.795345), (6, 0.902261), (2, 0.69533), (8, 0.593605), (6, 0.266233), (0, 0.917188), (9, 0.96353), (2, 0.577035), (8, 0.945236), (3, 0.257776), (4, 0.560569), (0, 0.838326), (2, 0.660338), (2, 0.537372), (8, 0.33806), (0, 0.545107), (1, 0.616673), (5, 0.30411), (0, 0.434737), (2, 0.588249), (9, 0.991362), (8, 0.772253), (6, 0.705396), (5, 0.323255), (8, 0.830319), (3, 0.679546), (4, 0.399748), (4, 0.440115), (6, 0.938154), (8, 0.333143), (9, 0.923541), (7, 0.19552), (4, 0.869822), (7, 0.620006), (4, 0.833529), (4, 0.297515), (4, 0.19906), (5, 0.540905), (9, 0.33313), (5, 0.200515), (5, 0.900481), (6, 0.02665), (3, 0.495421), (0, 0.96582), (9, 0.847218);

--- 我想要大约(在常见百分位数方法的变体中)以​​下内容:

--- I want approximately (within variation of common percentile methods) the following:

id  x
----------
0   0.9658
1   0.7624
2   0.6953
3   0.6795
4   0.8335
5   0.7205
6   0.9023
7   0.677
8   0.9452
9   0.9914

实际的输入集大约有 200 万行,每个实际的 id 组有几十到几百(或可能更多)行.

The actual input set has about two million rows, and each actual id group has a few dozen to a few hundred (or possibly more) rows.

我已经探索了 SO 和其他网站的解决方案,但似乎我检查的几十页左右的解决方案仅适用于计算整个行集的百分位数,而不是一行的每个组/分区放.(我对 SQL 比较缺乏经验,所以我可能忽略了一些东西.)

I've explored SO and other sites for solutions, but it seems like the couple dozen or so pages I checked have solutions that are only applicable to computing a percentile over an entire row set rather than each group/partition of a row set. (I'm relatively inexperienced with SQL, so I might have overlooked something.)

我还查看了 的文档排名函数,但我无法将有效的查询粘合在一起.

I've also looked at the docs for the ranking functions, but I haven't been able to glue together a query that would work.

我想使用 PERCENTILE_DISCPERCENTILE_CONT,但我我现在坚持使用 2008 R2.

I'd like to use PERCENTILE_DISC or PERCENTILE_CONT, but I'm stuck with 2008 R2 for now.

推荐答案

我喜欢直接做这些计算,使用 row_number()/rank() 和窗口函数.内置函数很有用,但实际上并没有节省那么多精力:

I like to do these calculations directly, using row_number()/rank() and window functions. The built-in functions are useful, but they don't actually save that much effort:

SELECT id,
       MIN(CASE WHEN seqnum >= 0.9 * cnt THEN x END) as percentile_90
FROM (select t.*,
             row_number() over (partition by id order by x) as seqnum,
             count(*) over (partition by id) as cnt
      from my_table t
     ) t
GROUP BY id
ORDER BY id;

这采用第一个位于第 90 个百分位或更大的值.对此有一些变体可以执行连续版本(取最大值小于或等于,取最小值大于并插值).

This takes the first value that is at the 90th percentile or greater. There are variations on this that can do the continuous version (take the largest value less than or equal to and the smallest one bigger than and interpolate).

这篇关于SQL Server 2008 R2 的百分位数聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆