Percentiles from histogram data
Question
The following table captures student grade data over a number of exams.
CREATE TABLE grades
AS
SELECT name, exams, grade_poor, grade_fair, grade_good, grade_vgood
FROM ( VALUES
( 'arun' , 8 , 1 , 4 , 2 , 1 ),
( 'neha' , 10 , 3 , 2 , 1 , 4 ),
( 'ram' , 5 , 1 , 1 , 3 , 0 ),
( 'radha' , 8 , 0 , 3 , 1 , 4 )
) AS t(name,exams,grade_poor,grade_fair,grade_good,grade_vgood);
The grades are ordered in the sense that vgood > good > fair > poor.
Would it be possible (or would it make sense) to find the 50th percentile grade for each student with this data? For example, for the student named arun, if we think of the data as a series of grade categories, the 50th percentile would be grade_fair.
Recommended answer
First you need to unpivot this. We can start by collecting the four grade counts into an array, ordered from poor to vgood:
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
name | array
-------+-----------
arun | {1,4,2,1}
neha | {3,2,1,4}
ram | {1,1,3,0}
radha | {0,3,1,4}
Then we need to index into the grades array. We do that with a CROSS JOIN LATERAL over generate_series(1,4). We have 4 rows, each with an array of 4 elements, and we want 4*4 = 16 rows.
SELECT name, grades, gs1.x, grades[gs1.x] AS gradeqty
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
ORDER BY name, x;
name | grades | x | gradeqty
-------+-----------+---+----------
arun | {1,4,2,1} | 1 | 1
arun | {1,4,2,1} | 2 | 4
arun | {1,4,2,1} | 3 | 2
arun | {1,4,2,1} | 4 | 1
neha | {3,2,1,4} | 1 | 3
neha | {3,2,1,4} | 2 | 2
neha | {3,2,1,4} | 3 | 1
neha | {3,2,1,4} | 4 | 4
radha | {0,3,1,4} | 1 | 0
radha | {0,3,1,4} | 2 | 3
radha | {0,3,1,4} | 3 | 1
radha | {0,3,1,4} | 4 | 4
ram | {1,1,3,0} | 1 | 1
ram | {1,1,3,0} | 2 | 1
ram | {1,1,3,0} | 3 | 3
ram | {1,1,3,0} | 4 | 0
(16 rows)
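As a side note, the same 16 rows can probably be produced in one step with unnest ... WITH ORDINALITY, which returns each array element together with its position. This is only a sketch of an alternative, not part of the original answer: it assumes PostgreSQL 9.4 or later, the aliases u, gradeqty and x are chosen here for illustration, and the data is inlined as a CTE so the snippet runs without the grades table.

```sql
-- Alternative unpivot: WITH ORDINALITY supplies the array index (x),
-- so no separate generate_series(1,4) is needed.
WITH grades(name, exams, grade_poor, grade_fair, grade_good, grade_vgood) AS (
  VALUES
    ('arun',  8, 1, 4, 2, 1),
    ('neha', 10, 3, 2, 1, 4),
    ('ram',   5, 1, 1, 3, 0),
    ('radha', 8, 0, 3, 1, 4)
)
SELECT g.name, u.x, u.gradeqty
FROM grades AS g
CROSS JOIN LATERAL unnest(
  ARRAY[g.grade_poor, g.grade_fair, g.grade_good, g.grade_vgood]
) WITH ORDINALITY AS u(gradeqty, x)
ORDER BY g.name, u.x;
```

Note that x comes back as bigint in this form, so it may need a cast to integer before being used as an array subscript.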
What remains is to CROSS JOIN LATERAL again, this time over generate_series(1, gradeqty), so that each x (our grade index) is repeated gradeqty times. A grade with a count of 0 produces no rows.
SELECT name,
gs1.x
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
ORDER BY name, gs1.x;
name | x
-------+---
arun | 1
arun | 2
arun | 2
arun | 2
arun | 2
arun | 3
arun | 3
arun | 4
neha | 1
neha | 1
neha | 1
neha | 2
neha | 2
neha | 3
neha | 4
neha | 4
neha | 4
neha | 4
radha | 2
radha | 2
radha | 2
radha | 3
radha | 4
radha | 4
radha | 4
radha | 4
ram | 1
ram | 2
ram | 3
ram | 3
ram | 3
(31 rows)
Now we GROUP BY name and use the ordered-set aggregate function percentile_disc to finish the job.
SELECT name, percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
GROUP BY name ORDER BY name;
name | percentile_disc
-------+-----------------
arun | 2
neha | 2
radha | 3
ram | 3
(4 rows)
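To see why percentile_disc is used here rather than percentile_cont, it may help to run both on a single expanded list. The sketch below uses radha's eight grade indexes taken from the expanded output earlier; the aliases t, v, disc_median and cont_median are illustrative names, not part of the original answer.

```sql
-- radha's expanded grade indexes: 3 x fair (2), 1 x good (3), 4 x vgood (4)
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY v) AS disc_median,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY v) AS cont_median
FROM unnest(ARRAY[2, 2, 2, 3, 4, 4, 4, 4]) AS t(v);
-- disc_median = 3   (an actual value from the list, so it maps to a grade)
-- cont_median = 3.5 (interpolated between the 4th and 5th values)
```

Because the grades are ordered categories rather than measurements, an interpolated value such as 3.5 does not correspond to any grade, which is why the discrete variant fits this problem.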
To take it further and make the output pretty, we can map the index back to a label:
SELECT name, (ARRAY['Poor', 'Fair', 'Good', 'Very Good'])[percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)]
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
GROUP BY name
ORDER BY name;
name | array
-------+-------
arun | Fair
neha | Fair
radha | Good
ram | Good
(4 rows)
We can get slightly more varied output if we add a new student and then rerun the query above.
INSERT INTO grades (name,grade_poor,grade_fair,grade_good,grade_vgood)
VALUES ('Bob', 0,0,0,100);
name | array
-------+-----------
arun | Fair
Bob | Very Good
neha | Fair
radha | Good
ram | Good
(5 rows)
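Bob's row expands into 100 intermediate rows, which hints that the row-expansion approach could get expensive for large counts. One possible alternative, sketched here under the assumption of PostgreSQL 9.4 or later, is to work on the counts directly: compute a running total per student and pick the first grade whose running total reaches half of the overall total. This is a different technique from the one in the answer (window functions instead of percentile_disc); the names unpivoted, running, total and median_grade are made up for this sketch, and the data is inlined so the snippet is self-contained.

```sql
WITH grades(name, grade_poor, grade_fair, grade_good, grade_vgood) AS (
  VALUES
    ('arun',  1, 4, 2, 1),
    ('neha',  3, 2, 1, 4),
    ('ram',   1, 1, 3, 0),
    ('radha', 0, 3, 1, 4),
    ('Bob',   0, 0, 0, 100)
),
unpivoted AS (
  SELECT g.name, u.x, u.gradeqty,
         -- cumulative count of exams at grade x or lower
         sum(u.gradeqty) OVER (PARTITION BY g.name ORDER BY u.x) AS running,
         -- total number of graded exams for the student
         sum(u.gradeqty) OVER (PARTITION BY g.name) AS total
  FROM grades AS g
  CROSS JOIN LATERAL unnest(
    ARRAY[g.grade_poor, g.grade_fair, g.grade_good, g.grade_vgood]
  ) WITH ORDINALITY AS u(gradeqty, x)
)
SELECT name,
       -- lowest grade index whose cumulative share reaches 50%
       (ARRAY['Poor', 'Fair', 'Good', 'Very Good'])[min(x)::int] AS median_grade
FROM unpivoted
WHERE gradeqty > 0
  AND running >= 0.5 * total
GROUP BY name
ORDER BY name;
```

For the five students above this should yield the same grades as the percentile_disc query (Fair, Fair, Good, Good, and Very Good for Bob), while only ever handling four rows per student.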