Kolmogorov-Smirnov test for goodness of fit in Python


Question

I am trying to fit distributions to my data. The fitting is done, but I need a measure for choosing the best model. Many papers use the Kolmogorov-Smirnov (KS) test. I tried to implement it, but I am getting very low p-values.

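import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import genextreme as gev  # assumption: gev refers to SciPy's generalized extreme value distribution

# h1 (the data), out_threshold1 and lnspc (the x grid used for plotting) are defined earlier and not shown here.

# Histogram plot

binwidth = np.arange(0, int(out_threshold1), 1)  # bin edges with a width of 1
# density=True replaces the removed normed=1 argument
n1, bins1, patches = plt.hist(h1, bins=binwidth, density=True, facecolor='#023d6b', alpha=0.5, histtype='bar')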

#Fitting

gevfit4 = gev.fit(h1)  
pdf_gev4 = gev.pdf(lnspc, *gevfit4)   
plt.plot(lnspc, pdf_gev4, label="GEV")

logfit4 = stats.lognorm.fit(h1)
pdf_lognorm4 = stats.lognorm.pdf(lnspc, *logfit4)  
plt.plot(lnspc, pdf_lognorm4, label="LogNormal")

weibfit4 = stats.weibull_min.fit(h1)  
pdf_weib4 = stats.weibull_min.pdf(lnspc, *weibfit4)  
plt.plot(lnspc, pdf_weib4, label="Weibull")

burr12fit4 = stats.burr12.fit(h1)  
pdf_burr124 = stats.burr12.pdf(lnspc, *burr12fit4)  
plt.plot(lnspc, pdf_burr124, label="Burr")

genparetofit4 = stats.genpareto.fit(h1)
pdf_genpareto4 = stats.genpareto.pdf(lnspc, *genparetofit4)
plt.plot(lnspc, pdf_genpareto4, label ="Gen-Pareto")

# KS test: compare the data with each fitted CDF, passing the fitted parameters via args
print(stats.kstest(h1, stats.genpareto.cdf, args=genparetofit4))
print(stats.kstest(h1, stats.lognorm.cdf, args=logfit4))
print(stats.kstest(h1, gev.cdf, args=gevfit4))
print(stats.kstest(h1, stats.weibull_min.cdf, args=weibfit4))
print(stats.kstest(h1, stats.burr12.cdf, args=burr12fit4))

After this runs, I get values like:

KstestResult(statistic=0.065689774346523788, pvalue=2.3778862070128568e-20)
KstestResult(statistic=0.063434691987405312, pvalue=5.2567851875784095e-19)
KstestResult(statistic=0.065047355887551062, pvalue=5.8076254324909468e-20)
KstestResult(statistic=0.25249534411299968, pvalue=8.3670183092211739e-295)
KstestResult(statistic=0.068528435880779559, pvalue=4.1395594967775799e-22)

Are these values reasonable? Is it still possible to choose the best model? Is the best-fitting model the one with the smallest statistic value?

I plotted the CDFs of two of the fitted distributions.

They seem to fit very well, but I still get those small p-values.

Recommended answer

The p-values for kstest assume that the parameters of the distribution are known. They are not appropriate when the parameters are estimated from the same data. However, as far as I understand, the p-values should be too large in that case, while here they are very small.
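
A small simulation can illustrate this point. The sketch below is not from the original question: it uses synthetic normal data, and the sample size, number of repetitions, and seed are arbitrary choices. It compares how often kstest rejects at the 5% level when the true parameters are supplied and when they are estimated from the same sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
n_obs, n_rep, alpha = 100, 500, 0.05

reject_known = 0
reject_estimated = 0
for _ in range(n_rep):
    sample = rng.normal(loc=0.0, scale=1.0, size=n_obs)

    # Case 1: parameters known in advance, which is what kstest assumes
    p_known = stats.kstest(sample, stats.norm.cdf, args=(0.0, 1.0)).pvalue
    reject_known += int(p_known < alpha)

    # Case 2: parameters estimated from the same sample
    loc_hat, scale_hat = stats.norm.fit(sample)
    p_est = stats.kstest(sample, stats.norm.cdf, args=(loc_hat, scale_hat)).pvalue
    reject_estimated += int(p_est < alpha)

rate_known = reject_known / n_rep
rate_estimated = reject_estimated / n_rep
print(f"rejection rate, known parameters:     {rate_known:.3f}")
print(f"rejection rate, estimated parameters: {rate_estimated:.3f}")
```

With known parameters the rejection rate should land near the nominal 5%. With estimated parameters it is typically much lower, which is the sense in which the p-values are too large.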

From the histogram it looks like there are some regions that are not matched well by any of the distributions. Additionally, there might be some rounding in the data or bunching at some discrete values.

If the sample size is large enough, then even small deviations from the hypothesized distribution will lead to rejecting the hypothesis that the distribution matches the data.
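
To see how strongly the sample size drives the p-value, the sketch below holds the KS statistic fixed at roughly the log-normal value reported in the question (0.0634) and evaluates the asymptotic two-sided p-value for several sample sizes. The sample sizes are hypothetical, since the question does not state len(h1), and the limiting Kolmogorov distribution (scipy.stats.kstwobign) is only an approximation to what kstest computes.

```python
import numpy as np
from scipy import stats

d_stat = 0.0634                      # KS statistic, rounded from the question's log-normal result
sample_sizes = [100, 1_000, 10_000]  # hypothetical sample sizes, not taken from the question

p_values = []
for n in sample_sizes:
    # Asymptotically, sqrt(n) * D follows the Kolmogorov distribution
    p = stats.kstwobign.sf(np.sqrt(n) * d_stat)
    p_values.append(p)
    print(f"n = {n:>6d}: asymptotic p-value = {p:.3g}")
```

The same distance between the empirical and fitted CDFs goes from unremarkable to overwhelmingly significant as n grows, so a tiny p-value on a large sample does not by itself mean the fit looks bad.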

To use the KS test as a selection criterion, we can just look at the KS statistic or the p-values and choose the distribution that matches best, in this case the log-normal. This gives the best-fitting distribution among the set tested, but it still deviates to some extent from the "true" distribution that generated the data.
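
The selection step could be sketched as follows. This example uses a synthetic log-normal sample rather than the data from the question, and the candidate list, seed, and sample size are illustrative assumptions. It fits each candidate, computes the KS statistic against the fitted CDF, and ranks the candidates by that statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)                        # arbitrary seed
data = rng.lognormal(mean=1.0, sigma=0.5, size=1000)  # synthetic stand-in for the real data

# Candidate distributions (a subset of those tried in the question)
candidates = {
    "lognorm": stats.lognorm,
    "weibull_min": stats.weibull_min,
    "genextreme": stats.genextreme,
}

results = []
for name, dist in candidates.items():
    params = dist.fit(data)                           # maximum-likelihood fit
    ks = stats.kstest(data, dist.cdf, args=params)    # distance between empirical and fitted CDF
    results.append((name, ks.statistic, ks.pvalue))

# Smallest KS statistic first: the closest fit among the candidates
results.sort(key=lambda r: r[1])
for name, statistic, pvalue in results:
    print(f"{name:12s} statistic={statistic:.4f} p-value={pvalue:.3g}")

best_name = results[0][0]
print("closest fit by KS statistic:", best_name)
```

The ranking only says which candidate is closest among those tried. The printed p-values are not valid as test results here, because the parameters were estimated from the same sample.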
