Python Pandas - Quantile calculation manually
Question
I am trying to calculate a quantile for a column's values manually, but the value I get from the formula does not match the result from Pandas. I looked around for different solutions but did not find the right answer.
In [54]: df
Out[54]:
      data1     data2 key1 key2
0 -0.204708  1.393406    a  one
1  0.478943  0.092908    a  two
2  1.965781  1.246435    a  one

In [55]: grouped = df.groupby('key1')

In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a    1.668413
Using the formula to find it manually, n is 3 because there are 3 values in the data1 column:
position = quantile * (n + 1)
Applying this to the values of the data1 column:
=0.9(n+1)
=0.9(4)
= 3.6
So the 3.6th position is 1.965781. How does pandas give 1.668413?
Recommended answer
The quantile function assigns a percentile to each of your sorted values, spaced evenly from the 0th to the 100th percentile.
In your case:
- -0.204708 is treated as the 0th percentile,
- 0.478943 is treated as the 50th percentile, and
- 1.965781 is treated as the 100th percentile.
So you can calculate the 90th percentile as follows, using linear interpolation between the 50th and 100th percentiles:
>>> import numpy as np
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999
Note that the value 0.5 comes from the fact that the two points span 50% of the data, and 0.4 is the amount above the 50th percentile that you want to reach (0.5 + 0.4 = 0.9). Hope this makes sense.
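The same interpolation can be written for any number of values. The following is a minimal sketch in plain Python, assuming the default linear interpolation that pandas uses for quantile; the function name linear_quantile is hypothetical and is not part of pandas or NumPy. It places the sorted values at positions 0 to n - 1, finds the fractional position q * (n - 1), and interpolates between the two neighbouring values.

```python
import math


def linear_quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation.

    Hypothetical helper for illustration; it is meant to mirror the
    default 'linear' interpolation, not to replace the library call.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")

    data = sorted(values)
    n = len(data)

    # Fractional position on a 0 .. n-1 scale (not q * (n + 1)).
    pos = q * (n - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, n - 1)  # stay in range when q == 1
    frac = pos - lower

    # Move `frac` of the way from the lower neighbour to the upper one.
    return data[lower] + frac * (data[upper] - data[lower])


data1 = [-0.204708, 0.478943, 1.965781]
result = linear_quantile(data1, 0.9)
print(round(result, 6))  # 1.668413
```

With n = 3 and q = 0.9 the position is 0.9 * 2 = 1.8, so the result lies 80% of the way from 0.478943 to 1.965781. This is why the q * (n + 1) formula from the question, which gives position 3.6, does not reproduce the pandas output.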