Neither percentile_cont nor percentile_disc is calculating the desired 75th percentile in PostgreSQL 9.6.3
Question
I am working with the percentile functions, but I am not getting the output I expect. I would call it "incorrect", but the functions are probably working as intended and I am just not understanding them properly.
These are the numbers I am working with:
n = 32
160000
202800
240000
250000
265000
280000
285000
300000
300000
300000
300000
300000
309000
325000
350000
358625
364999.92
393750
400000
420000
425000
450000
450000
463500
475000
475000
505808
525000
550000
567300
665000
900000
My understanding of percentile_cont is that, if the count is even, it takes the two middle numbers, adds them, and divides by two. My understanding of percentile_disc is that, if the count is even, it just selects the lower of the two.
This is my understanding of how to calculate a percentile, using the 50th (the median) as an example:
If the number of values (n) is odd, pick the value in the middle; if it is even, average the two values in the middle. In this case there are 32 values, so the median = (358625 + 364999.92) / 2 = 361812.46. percentile_cont returns the correct value since it averages the two values; percentile_disc returns the incorrect value since it picks the lower of the two.
Regarding other percentiles, the 10th for example, my understanding is that you multiply the percentile by the number of values (n) to get the index: .10 * 32 = 3.2 in this case. If the index is not a whole number, you round up to the next whole number, and the value at that position is your percentile. If the index is a whole number, you average the value at that index with the value right after it.
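The rule described above can be written out as a small Python sketch, to make explicit which numbers it produces for this data set. This is only an illustration of the rule as stated in the question; the function name rule_from_question is hypothetical and none of this is PostgreSQL code.

```python
from fractions import Fraction
from math import ceil

# The 32 prices listed in the question.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def rule_from_question(values, p):
    """Percentile by the rule in the question: 1-based index = p * n.

    Hypothetical helper for illustration; only defined for 0 < p < 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    data = sorted(values)
    n = len(data)
    # Exact rational arithmetic, so that 0.10 * 32 is exactly 16/5.
    index = Fraction(str(p)) * n
    if index.denominator == 1:
        # Whole-number index: average that value with the one after it.
        k = int(index)
        return (data[k - 1] + data[k]) / 2
    # Fractional index: round up and take the value at that position.
    return data[ceil(index) - 1]


for p in (0.10, 0.50, 0.75):
    print(p, rule_from_question(PRICES, p))
```

For the 10th, 50th and 75th percentiles this gives 250000, 361812.46 and 469250, the values the question expects.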
In that case, percentile_cont is wrong because it returns 251500, which isn't even a number I can arrive at. The closest I can get is averaging 240000, 250000 and 265000, which gives 251666.67. percentile_disc returns the correct result of 250000.
But the real kicker is the 75th. According to my calculations it should return 469250: index = (32 * .75) = 24, and that index should result in (463500 + 475000) / 2 = 469250. percentile_disc returns 463500; percentile_cont returns 466375, which again I can't arrive at for the life of me.
This is my query:
SELECT
    itemcode,
    COUNT(itemcode) AS n,
    -- aliases that start with a digit must be double-quoted in PostgreSQL
    PERCENTILE_DISC(0.10) WITHIN GROUP (ORDER BY price) AS "10th",
    PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY price) AS "25th",
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY price) AS median,
    AVG(price) AS mean,
    PERCENTILE_DISC(0.65) WITHIN GROUP (ORDER BY price) AS "65th",
    PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY price) AS "75th",
    PERCENTILE_DISC(0.90) WITHIN GROUP (ORDER BY price) AS "90th"
FROM items
WHERE itemcode = 26 AND removed IS NULL
GROUP BY itemcode;
Note: there are no cases where removed is not NULL.
What do I need to do to get this working correctly and consistently? Do I need to write a function that checks n first and then decides whether to use percentile_disc or percentile_cont based on whether n is even or odd?
SQL Fiddle: http://sqlfiddle.com/#!17/aa09c/9
Recommended answer
I posted this question to Reddit and was able to get some help.
Apparently the percentile_cont function, like the percentile and percentile.inc functions in Excel, calculates its result using the C=1 variant of linear interpolation, as described in the Wikipedia article on percentiles. Under that variant the position in the sorted list is 1 + p * (n - 1) rather than p * n, and a fractional position is interpolated between the two neighbouring values.
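To see how the numbers in the question come about, here is a minimal Python sketch of that interpolation, next to the "first value whose position reaches the fraction" rule that the PostgreSQL documentation gives for percentile_disc. The function names mirror the SQL aggregates for readability only; these are hand-written approximations for illustration, not PostgreSQL's implementation.

```python
from fractions import Fraction
from math import ceil, floor

# The 32 prices listed in the question.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]


def percentile_cont(values, p):
    """Linear interpolation, C=1 variant: 1-based position 1 + p * (n - 1)."""
    data = sorted(values)
    pos = Fraction(str(p)) * (len(data) - 1)   # same position, 0-based
    lo = floor(pos)
    frac = float(pos - lo)
    if frac == 0:
        return float(data[lo])
    # Move `frac` of the way from data[lo] towards the next value.
    return data[lo] + frac * (data[lo + 1] - data[lo])


def percentile_disc(values, p):
    """First sorted value whose 1-based rank k satisfies k / n >= p."""
    data = sorted(values)
    k = max(1, ceil(Fraction(str(p)) * len(data)))
    return data[k - 1]


for p in (0.10, 0.50, 0.75):
    print(p, round(percentile_cont(PRICES, p), 2), percentile_disc(PRICES, p))
```

For the 75th percentile the position is 1 + 0.75 * 31 = 24.25, so the result lies a quarter of the way from the 24th value (463500) to the 25th (475000): 463500 + 0.25 * 11500 = 466375. The 10th percentile sits at position 4.1, giving 250000 + 0.1 * 15000 = 251500. Both match what percentile_cont returned in the question, and the percentile_disc sketch reproduces the reported 250000 and 463500.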
So PostgreSQL's native functions will not give me the result I want, and I will need to write a custom function, which I will post when I am done. (I suspect it will use the old ntile approach from before 9.4, but I am still looking into it.)
But anyway, that is why the results are off.
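As an independent cross-check, Python's standard library offers a similar "inclusive" interpolation through statistics.quantiles (available since Python 3.8). The sketch below assumes that the inclusive method lines up with the C=1 variant for this data; that is checked here only by comparing the numbers, not taken from the PostgreSQL or Excel documentation.

```python
from statistics import quantiles

# The 32 prices listed in the question.
PRICES = [
    160000, 202800, 240000, 250000, 265000, 280000, 285000, 300000,
    300000, 300000, 300000, 300000, 309000, 325000, 350000, 358625,
    364999.92, 393750, 400000, 420000, 425000, 450000, 450000, 463500,
    475000, 475000, 505808, 525000, 550000, 567300, 665000, 900000,
]

# n=4 yields three cut points: the 25th, 50th and 75th percentiles.
quartiles = quantiles(PRICES, n=4, method="inclusive")
# n=10 yields nine cut points: the 10th, 20th, ... 90th percentiles.
deciles = quantiles(PRICES, n=10, method="inclusive")

p10_inclusive = deciles[0]
p50_inclusive = quartiles[1]
p75_inclusive = quartiles[2]

print(p10_inclusive, p50_inclusive, p75_inclusive)
```

The 10th and 75th percentiles come out as 251500 and 466375, the same values percentile_cont returned, which suggests the function is following a common convention rather than miscalculating.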