Is scipy.stats calculating the IQR incorrectly?
I am working with the dataset [23,25,28,28,32,33,35].
According to Wikipedia and the SciPy documentation,
IQR = Q3 − Q1 = 33 − 25 = 8
When I compute the IQR of this dataset, the result (6) is not what I expected (8).
I tried another method from https://stackoverflow.com/a/23229224, and the result is also 6.
Here is my code:
import numpy as np
from scipy.stats import iqr
x = np.array([23,25,28,28,32,33,35])
print(iqr(x, axis=0))
What causes this discrepancy?
scipy.stats.iqr doesn't seem to follow the recursive algorithm documented on Wikipedia. Instead it simply computes np.percentile(x, 75) - np.percentile(x, 25).
This calculation does not exclude the median from the two halves; it includes it, and the percentiles are linearly interpolated between neighbouring values, so you get (32 + 33)/2 - (25 + 28)/2 = 6.
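To see where the 6 comes from, the percentile calculation can be reproduced directly. This is a minimal sketch, assuming NumPy's default linear interpolation between neighbouring data points:

```python
import numpy as np

x = np.array([23, 25, 28, 28, 32, 33, 35])

# With 7 points, the 25th percentile sits halfway between the 2nd and 3rd
# sorted values, and the 75th halfway between the 5th and 6th.
q1, q3 = np.percentile(x, [25, 75])

print(q1, q3, q3 - q1)  # 26.5 32.5 6.0
```

Neither quartile is an actual data point here, which is why the result differs from the hand calculation 33 - 25.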
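If you want to use the algorithm from Wikipedia, you'd need to do something like:
import numpy as np

def iqr_(m):
    m = np.asarray(m).ravel()
    n = m.size // 2
    # Partition around index n so that the n smallest values land in m_[:n]
    # and the largest values land after index n (order within halves is arbitrary).
    m_ = np.partition(m, n)
    # For odd sizes (m.size % 2 == 1) skip the median itself in the upper half.
    return np.median(m_[n + m.size % 2:]) - np.median(m_[:n])

iqr_([23,25,28,28,32,33,35])
8.0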
EDIT: On the Wikipedia talk page it is pointed out that the algorithm presented there is not definitive, and that the method used by scipy.stats.iqr is in fact also acceptable. See the three methods for determining quartiles.
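The disagreement between conventions can also be seen without SciPy. As an illustration that is not part of the original answer, Python's standard-library statistics.quantiles offers an "inclusive" and an "exclusive" method. For this dataset they happen to give 6 and 8 respectively, although the "exclusive" method is not the same algorithm as Wikipedia's median-exclusion method in general. The helper name iqr_by is hypothetical.

```python
from statistics import quantiles

data = [23, 25, 28, 28, 32, 33, 35]

def iqr_by(method):
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, _, q3 = quantiles(data, n=4, method=method)
    return q3 - q1

print(iqr_by("inclusive"))  # 6.0
print(iqr_by("exclusive"))  # 8.0
```

Both answers are legitimate; they just rest on different definitions of a quartile, so it is worth stating which one you use when reporting an IQR.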