Python Pandas - how is the 25th percentile calculated by the describe function

Question

For a given dataset in a data frame, when I apply the describe function, I get the basic stats, which include min, max, 25%, 50%, etc.

For example:

import pandas as pd
data_1 = pd.DataFrame({'One': [4, 6, 8, 10]}, columns=['One'])
data_1.describe()

The output is:

        One
count   4.000000
mean    7.000000
std     2.581989
min     4.000000
25%     5.500000
50%     7.000000
75%     8.500000
max     10.000000

My question is: what is the mathematical formula used to calculate the 25%?

1) Based on what I know, it is:

formula = percentile * n (n is number of values)

In this case:

25/100 * 4 = 1

So the first position is the number 4, but according to the describe function it is 5.5.

2) Another example says that if you get a whole number, you take the average of 4 and 6, which would be 5. That still does not match the 5.5 given by describe.

3) Another tutorial says you take the difference between the 2 numbers, multiply it by 25% and add the result to the lower number:

25/100 * (6-4) = 1/4*2 = 0.5

Adding that to the lower number: 4 + 0.5 = 4.5

That still does not give 5.5.

Can someone please clarify?

Recommended answer

In the pandas documentation there is information about the computation of quantiles, where a reference to numpy.percentile is made:

Return value at the given quantile, a la numpy.percentile.

Then, checking the numpy.percentile documentation, we can see that the interpolation method is set to linear by default:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j

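For your specific case, the 25th percentile falls at the zero-based index 0.25 * (4 - 1) = 0.75, which lies between i = 4 (index 0) and j = 6 (index 1) with a fraction of 3/4. The result is therefore: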

res_25 = 4 + (6-4)*(3/4) =  5.5

For the 75th percentile the index is 0.75 * (4 - 1) = 2.25, between 8 (index 2) and 10 (index 3), so we get:

res_75 = 8 + (10-8)*(1/4) = 8.5
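
The two calculations above can be reproduced with a few lines of plain Python. This is only a minimal sketch of the linear rule as quoted, not the actual NumPy or pandas source; the helper name linear_percentile is made up for illustration, and it assumes the fractional index is computed as q/100 * (n - 1) on the sorted data.

```python
import math


def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile using the 'linear' rule
    i + (j - i) * fraction, where i and j are the neighbouring values."""
    data = sorted(values)
    # Fractional zero-based position of the percentile in the sorted data.
    pos = (q / 100) * (len(data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    fraction = pos - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


data = [4, 6, 8, 10]
quartiles = {q: linear_percentile(data, q) for q in (25, 50, 75)}
print(quartiles)  # {25: 5.5, 50: 7.0, 75: 8.5}
```

These values match the 25%, 50% and 75% rows of the describe output shown in the question.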

If you set the interpolation method to "midpoint", you get 5 for the 25th percentile, which is the result from your second approach.
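
To see how the choice of interpolation method changes the answer, here is a rough pure-Python sketch. The function percentile_by_method is a hypothetical helper written for this illustration (it is not a pandas or NumPy function); its option names simply mirror the documented "lower", "higher", "midpoint" and "linear" choices.

```python
import math


def percentile_by_method(values, q, interpolation="linear"):
    """Hypothetical helper mimicking four documented interpolation options."""
    data = sorted(values)
    pos = (q / 100) * (len(data) - 1)  # fractional zero-based index
    i = data[math.floor(pos)]          # neighbour below the index
    j = data[math.ceil(pos)]           # neighbour above the index
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "linear":
        return i + (j - i) * (pos - math.floor(pos))
    raise ValueError(f"unsupported interpolation: {interpolation!r}")


data = [4, 6, 8, 10]
for name in ("lower", "higher", "midpoint", "linear"):
    print(name, percentile_by_method(data, 25, name))
# lower 4
# higher 6
# midpoint 5.0
# linear 5.5
```

In pandas itself, the equivalent of the midpoint case would be data_1['One'].quantile(0.25, interpolation='midpoint'), since Series.quantile accepts an interpolation argument.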
