SQL ranking query to compute ranks and median in sub groups


Problem description


I want to compute the Median of y in sub groups of this simple xy_table:

  x | y --groups--> gid |   x | y --medians-->  gid |   x | y
-------             -------------               -------------
0.1 | 4             0.0 | 0.1 | 4               0.0 | 0.1 | 4
0.2 | 3             0.0 | 0.2 | 3                   |     |
0.7 | 5             1.0 | 0.7 | 5               1.0 | 0.7 | 5
1.5 | 1             2.0 | 1.5 | 1                   |     |
1.9 | 6             2.0 | 1.9 | 6                   |     |
2.1 | 5             2.0 | 2.1 | 5               2.0 | 2.1 | 5
2.7 | 1             3.0 | 2.7 | 1               3.0 | 2.7 | 1

In this example every x is unique and the table is already sorted by x. I now want to GROUP BY round(x) and get the tuple that holds the median of y in each group.

I can already compute the median for the whole table with this ranking query:

SELECT a.x, a.y FROM xy_table a,xy_table b
WHERE a.y >= b.y
GROUP BY a.x, a.y
HAVING count(*) = (SELECT round((count(*)+1)/2) FROM xy_table)

Output: 0.1, 4.0
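To make the ranking idea concrete, here is a minimal sketch that loads the sample table into an in-memory SQLite database through Python's `sqlite3` module and runs the query above. The `REAL` column types and the helper names `ranks_by_y` and `whole_table_median` are assumptions made for illustration; they are not part of the original question.

```python
import sqlite3

# Sample data from the question: (x, y) pairs.
SAMPLE_ROWS = [(0.1, 4), (0.2, 3), (0.7, 5), (1.5, 1), (1.9, 6), (2.1, 5), (2.7, 1)]


def _connect(rows):
    # Assumption: both columns are REAL, which matches the "4.0" in the output.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xy_table (x REAL, y REAL)")
    conn.executemany("INSERT INTO xy_table VALUES (?, ?)", rows)
    return conn


def ranks_by_y(rows):
    """Rank of each row = number of rows whose y is less than or equal to its own."""
    conn = _connect(rows)
    result = conn.execute(
        """
        SELECT a.x, a.y, count(*) AS rank_by_y
        FROM xy_table a, xy_table b
        WHERE a.y >= b.y
        GROUP BY a.x, a.y
        ORDER BY a.x
        """
    ).fetchall()
    conn.close()
    return result


def whole_table_median(rows):
    """The ranking query from the question, unchanged apart from whitespace."""
    conn = _connect(rows)
    result = conn.execute(
        """
        SELECT a.x, a.y FROM xy_table a, xy_table b
        WHERE a.y >= b.y
        GROUP BY a.x, a.y
        HAVING count(*) = (SELECT round((count(*) + 1) / 2) FROM xy_table)
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in ranks_by_y(SAMPLE_ROWS):
        print(row)
    print(whole_table_median(SAMPLE_ROWS))
```

Note that rows with equal y receive the same rank (both y = 5 rows get rank 6 here), so the query only finds a row when the middle rank is not skipped by such a tie.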

But I have not yet succeeded in writing a query that computes the median for sub groups.

Attention: I do not have a median() aggregate function available. Please also do not propose solutions with special PARTITION, RANK, or QUANTILE statements (as found in similar but too vendor-specific SO questions). I need plain SQL (i.e., compatible with SQLite without a median() function).

Edit: I was actually looking for the Medoid and not the Median.

Solution

I suggest doing the computation in your programming language:

for each group:
  for each record_in_group:
    append y to array
  median of array
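The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: rows arrive as `(x, y)` tuples, the "median" is the upper middle record (the medoid the asker mentions), and the helper `sqlite_round` is a hypothetical stand-in that rounds halves away from zero like SQLite's `round()`, because Python's built-in `round()` rounds halves to the nearest even number.

```python
import math
from collections import defaultdict

# Sample data from the question: (x, y) pairs.
SAMPLE_ROWS = [(0.1, 4), (0.2, 3), (0.7, 5), (1.5, 1), (1.9, 6), (2.1, 5), (2.7, 1)]


def sqlite_round(x):
    # Hypothetical helper: round half away from zero, approximating SQLite's round(x).
    return math.copysign(math.floor(abs(x) + 0.5), x)


def group_medoids(rows):
    """Return {gid: (x, y)} holding the middle record of each group, ordered by y.

    For an even number of records the upper of the two middle records is taken.
    """
    groups = defaultdict(list)
    for x, y in rows:
        groups[sqlite_round(x)].append((x, y))

    result = {}
    for gid, records in groups.items():
        records.sort(key=lambda record: record[1])  # order the group by y
        result[gid] = records[len(records) // 2]    # upper middle record
    return result


if __name__ == "__main__":
    for gid, record in sorted(group_medoids(SAMPLE_ROWS).items()):
        print(gid, record)
```

Because the sort is stable, records with equal y keep their input order, so ties do not cause a group to be dropped here.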

But if you are stuck with SQLite, you can order each group by y and select the records in the middle like this http://sqlfiddle.com/#!5/d4c68/55/0:

UPDATE: only the larger "median" value matters for an even number of rows, so no avg() is needed:

select groups.gid,
  ids.y median
from (
  -- get middle row number in each group (bigger number if even nr. of rows)
  -- note the integer division
  select round(x) gid,
    count(*) / 2 + 1 mid_row_right
  from xy_table
  group by round(x)
) groups
join (
  -- for each record get equivalent of
  -- row_number() over(partition by gid order by y)
  select round(a.x) gid,
    a.x,
    a.y,
    count(*) rownr_by_y
  from xy_table a
  left join xy_table b
    on round(a.x) = round(b.x)
    and a.y >= b.y
  group by a.x, a.y
) ids on ids.gid = groups.gid
where ids.rownr_by_y = groups.mid_row_right
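As a quick check, the query above can be run against the sample table with Python's `sqlite3` module. The sketch below is an assumption-laden illustration rather than part of the original answer: the columns are declared `REAL`, the subquery alias is shortened to `g`, `ids.x` is added to the select list so the whole tuple is returned, and an `ORDER BY` makes the output order deterministic.

```python
import sqlite3

# Sample data from the question: (x, y) pairs.
SAMPLE_ROWS = [(0.1, 4), (0.2, 3), (0.7, 5), (1.5, 1), (1.9, 6), (2.1, 5), (2.7, 1)]

GROUP_MEDIAN_SQL = """
SELECT g.gid, ids.x, ids.y
FROM (
    -- middle row number per group (the upper one for an even row count)
    SELECT round(x) AS gid, count(*) / 2 + 1 AS mid_row_right
    FROM xy_table
    GROUP BY round(x)
) g
JOIN (
    -- rank of each row by y within its group
    SELECT round(a.x) AS gid, a.x, a.y, count(*) AS rownr_by_y
    FROM xy_table a
    LEFT JOIN xy_table b
      ON round(a.x) = round(b.x) AND a.y >= b.y
    GROUP BY a.x, a.y
) ids ON ids.gid = g.gid
WHERE ids.rownr_by_y = g.mid_row_right
ORDER BY g.gid
"""


def group_medians(rows):
    """Load rows into an in-memory table and return (gid, x, y) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xy_table (x REAL, y REAL)")
    conn.executemany("INSERT INTO xy_table VALUES (?, ?)", rows)
    result = conn.execute(GROUP_MEDIAN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_medians(SAMPLE_ROWS):
        print(row)
```

On the sample data this returns one tuple per group, matching the table in the question. One caveat: the rank counts every row with a smaller or equal y, so two rows in the same group with equal y share a rank. If that tie sits at the middle position, no row has the middle rank and the group is missing from the result; breaking ties on x in the join condition would be one way around that.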
