Very low p-values in Python Kolmogorov-Smirnov Goodness of Fit Test
Question
I have a set of data and fit the corresponding histogram by a lognormal distribution. I first calculate the optimal parameters for the lognormal function, and then plot the histogram and the lognormal function. This gives quite good results:
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import scipy.stats  # make sure the sp.stats submodule is loaded

# `data` is the 1-D array of observations (loaded elsewhere)
num_data = len(data)
x_axis = np.linspace(min(data), max(data), num_data)
number_of_bins = 240
# density=False keeps raw counts (the old `normed` argument is no longer
# accepted by current NumPy/Matplotlib)
histo, bin_edges = np.histogram(data, number_of_bins, density=False)
shape, location, scale = sp.stats.lognorm.fit(data)
plt.hist(data, number_of_bins, density=False)
# the scaling factor scales the normalized lognormal function up to the size
# of the histogram:
scaling_factor = len(data) * (max(data) - min(data)) / number_of_bins
plt.plot(x_axis,
         scaling_factor * sp.stats.lognorm.pdf(x_axis, shape, location, scale),
         'r-')
# adjust the axes dimensions:
plt.axis([bin_edges[0] - 10, bin_edges[-1] + 10, 0, histo.max() * 1.1])
However, when performing the Kolmogorov-Smirnov test on the data versus the fitting function, I get way too low p-values (of the order of e-32):
lognormal_ks_statistic, lognormal_ks_pvalue = sp.stats.kstest(
    data,
    lambda k: sp.stats.lognorm.cdf(k, shape, location, scale),
    args=(),
    N=len(data),
    alternative='two-sided',
    mode='approx')
print(lognormal_ks_statistic)
print(lognormal_ks_pvalue)
This is not normal, since we see from the plot that the fitting is quite accurate... does anybody know where I made a mistake?
Thanks a lot!! Charles
Recommended answer
This simply means that your data isn't exactly log-normal. Judging by the histogram, the K-S test has a lot of data points to work with. With that many points, if your data is even slightly different from what would be expected under a log-normal distribution with those parameters, the K-S test will indicate that the data isn't drawn from that log-normal.
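To make the sample-size effect concrete, here is a minimal sketch that holds the K-S statistic fixed and varies only the number of points. The value D = 0.02 is a made-up gap between the empirical and fitted CDFs, not something taken from the question's data, and the p-value uses the large-sample Kolmogorov approximation available as `scipy.stats.kstwobign`.

```python
import numpy as np
from scipy import stats

# Hypothetical K-S statistic: the largest gap between the empirical CDF
# and the fitted CDF. Chosen for illustration only.
D = 0.02

sample_sizes = [100, 1_000, 10_000, 100_000]

# Asymptotic two-sided p-value: P(K > sqrt(n) * D), where K follows the
# limiting Kolmogorov distribution (scipy.stats.kstwobign).
p_values = [stats.kstwobign.sf(np.sqrt(n) * D) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:>7d}   p = {p:.3g}")
```

The gap between the two curves never changes here, yet the p-value falls from essentially 1 to a vanishingly small number as n grows, because the test statistic is effectively compared against a threshold that shrinks like 1/sqrt(n).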
Where is the data from? If it is from an organic source, or any source other than specifically drawing random numbers from a lognormal distribution, I would expect an extremely small p-value, even if the fit looks great. This certainly isn't a problem, though, as long as the fit is sufficiently good for your purposes.
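One way to judge whether the fit is good enough is to look at the K-S statistic itself (the largest CDF gap) next to the p-value. The sketch below is only an illustration under stated assumptions: the "almost lognormal" source is invented (the log of the data is Student-t with 5 degrees of freedom instead of Gaussian), the location is pinned at 0 with `floc=0`, and the sample sizes are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed stand-in for an "organic" source: log(data) is Student-t (df=5),
# so the data is close to, but not exactly, lognormal.
n = 50_000
data = np.exp(0.5 * rng.standard_t(df=5, size=n))

# Fit a lognormal with the location parameter fixed at 0.
shape, loc, scale = stats.lognorm.fit(data, floc=0)

# K-S test of the full sample and of a small subset against the same fit.
full = stats.kstest(data, "lognorm", args=(shape, loc, scale))
subset = stats.kstest(data[:200], "lognorm", args=(shape, loc, scale))

print(f"n = {n}: D = {full.statistic:.4f}, p = {full.pvalue:.3g}")
print(f"n = 200: D = {subset.statistic:.4f}, p = {subset.pvalue:.3g}")
```

With the full sample the statistic D stays small, meaning the fitted CDF is never far from the empirical one, while the p-value is tiny. The 200-point subset will typically give a much less extreme p-value. Whether a gap of that size matters depends on what the fit is used for, which is the point made above.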