Python Pandas - Quantile calculation manually


Question

I am trying to calculate a quantile for a column's values manually, but the value I get from the formula does not match the result that pandas returns. I looked around for different solutions but did not find the right answer.

In [54]: df

Out[54]:
      data1     data2 key1 key2
0 -0.204708  1.393406    a  one
1  0.478943  0.092908    a  two
2  1.965781  1.246435    a  one

In [55]: grouped = df.groupby('key1')
In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a 1.668413

Finding it manually with the formula, n is 3 because there are 3 values in the data1 column:

quantile(n+1)

Applying this to the values of the data1 column:

=0.9(n+1) 
=0.9(4)
= 3.6

So the 3.6th position is 1.965781. How, then, does pandas give 1.668413?

Recommended answer

By default, the function quantile spreads the sorted values evenly across the 0% to 100% range, based on their position in the data, and interpolates linearly between them.

In your case:

  • -0.204708 is treated as the 0th percentile,
  • 0.478943 is treated as the 50th percentile, and
  • 1.965781 is treated as the 100th percentile.
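The assignment listed above can be sketched in a few lines of plain Python. This is only an illustration, assuming the i / (n - 1) spacing that the three percentages imply; `percentile_ranks` is a hypothetical helper name, not a pandas function.

```python
# The three data1 values from the question.
data1 = [-0.204708, 0.478943, 1.965781]


def percentile_ranks(values):
    """Pair each sorted value with its fraction i / (n - 1) through the data.

    Hypothetical helper for illustration; assumes evenly spaced positions.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        # A single value has no spread; treat it as the 0th percentile.
        return [(0.0, ordered[0])]
    return [(i / (n - 1), v) for i, v in enumerate(ordered)]


ranks = percentile_ranks(data1)
for q, v in ranks:
    print(f"{q:.0%} -> {v}")
```

With three values the spacing is 1 / (3 - 1) = 0.5, which is why the middle value lands on the 50th percentile.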

So you could calculate the 90th percentile the following way (using linear interpolation between the 50th and 100th percentiles):

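>>> import numpy as np
>>> # Note the order: x[1] is the maximum (100th percentile), x[2] is the median (50th percentile)
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> # Slope between the two points, times the 0.4 still needed above the 50% mark
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999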

Note that the values 0.5 and 0.4 come from the fact that these two points span 50% of the data, and 0.4 is the amount above the 50% mark that you wish to reach (0.5 + 0.4 = 0.9). Hope this makes sense.
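The same reasoning can be generalized to any quantile and any number of values. The following is a minimal sketch in plain Python, assuming the position rule q * (n - 1) on the sorted data, which is what the 0% / 50% / 100% assignment above amounts to; `linear_quantile` is a hypothetical name introduced here, not part of pandas or NumPy.

```python
import math


def linear_quantile(values, q):
    """Quantile by linear interpolation between neighbouring sorted values.

    Hypothetical helper; assumes the position of quantile q is q * (n - 1),
    counted from 0 on the sorted data.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)              # fractional index, e.g. 0.9 * 2 = 1.8
    lower = math.floor(pos)                   # index of the value just below
    upper = min(lower + 1, len(ordered) - 1)  # index of the value just above
    frac = pos - lower                        # how far to move toward the upper value
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


data1 = [-0.204708, 0.478943, 1.965781]
result = linear_quantile(data1, 0.9)
print(round(result, 6))  # 1.668413
```

For q = 0.9 and n = 3 the position is 1.8, so the result sits 80% of the way from 0.478943 to 1.965781. That is the same arithmetic as dividing by 0.5 and multiplying by 0.4 above, and it agrees with the 1.668413 reported by pandas in the question.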
