Chi-Square Test Using Frequencies, Bins, CDF, Python

Problem description

I am trying to write a chi-square goodness-of-fit test for the Beta distribution from scratch, without using any ready-made test functions. The code below reports a p-value of 1 (i.e., a fit), even though kstest from scipy.stats returns 0. The data are normally distributed, so my function should also return 0.

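import numpy as np
from scipy.stats import chi2
from scipy.stats import beta
from scipy.stats import kstest
from scipy.stats import norm

# 200 samples from a normal distribution with mean 5 and standard deviation 2
preds = norm.rvs(5, 2, size=200)
preds.sort()

bin_size = 30                          # number of bin edges (gives 29 bins)
bins = np.linspace(0, 10, bin_size)
counts = np.digitize(preds, bins)      # one bin index per sample
mean = 5                               # passed to beta.cdf as shape parameter a
var = 2                                # passed to beta.cdf as shape parameter b

chi_sq = 0
for i in range(len(bins) - 1):
    # probability Beta(a=mean, b=var) assigns to [bins[i], bins[i+1])
    p = beta.cdf(bins[i + 1], mean, var) - beta.cdf(bins[i], mean, var)
    # fraction of samples whose digitize index equals i
    freq = len(counts[counts == i]) / float(len(counts))
    chi_sq = chi_sq + ((freq - p) ** 2) / p

dof = len(counts) - 2
pval = 1 - chi2.cdf(chi_sq, dof)
print(pval)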

In the code, I create bins, measure the observed relative frequency in each bin, compute the expected frequency of each bin from the Beta distribution CDF, and sum the squared differences divided by the expected frequency to obtain the χ² test statistic.
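For reference, Pearson's statistic is conventionally written with observed counts $O_i$ and expected counts $E_i$ over $k$ bins with edges $b_0 < b_1 < \dots < b_k$, where $F$ is the hypothesised CDF, $n$ is the sample size, and $m$ is the number of distribution parameters estimated from the data ($m = 0$ when the Beta shape parameters are fixed in advance, as here). Note that if relative frequencies $O_i/n$ and probabilities $p_i = E_i/n$ are used in place of counts, the same sum comes out smaller by a factor of $n$.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,\bigl[F(b_i) - F(b_{i-1})\bigr],
\qquad \mathrm{dof} = k - 1 - m

\sum_{i=1}^{k} \frac{(O_i/n - p_i)^2}{p_i}
  = \frac{1}{n} \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```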

The kstest call is:

print(kstest(preds, 'beta', args=(mean, var)))
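A minimal, self-contained sketch of the same KS comparison in Python 3. The fixed random seed is an assumption added for reproducibility, and `rng.normal(5, 2, size=200)` draws from the same N(5, 2) distribution as `norm.rvs(5, 2, size=200)`. `args` passes the Beta shape parameters a = 5, b = 2. Because the Beta distribution is supported on [0, 1] while almost all of these samples exceed 1, the KS statistic is close to 1 and the test rejects decisively.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)       # fixed seed: an assumption, for reproducibility
preds = rng.normal(5, 2, size=200)   # N(mean=5, sd=2), as in the question

# One-sample KS test against Beta(a=5, b=2); result exposes .statistic and .pvalue
res_beta = kstest(preds, "beta", args=(5, 2))
# For contrast: the same data tested against the distribution it was drawn from
res_norm = kstest(preds, "norm", args=(5, 2))

print(res_beta.statistic, res_beta.pvalue)
print(res_norm.statistic, res_norm.pvalue)
```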

What am I doing wrong here?

Thanks,

Answer

The problem was with the degrees-of-freedom (DOF) definition:

dof = len(preds)-2

is the correct choice. I also had to reduce bin_size to 15 (fewer, wider bins) to get a consistent '0' result. Chi-square tests are known to be sensitive to the choice of bins.
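For comparison, here is a minimal sketch of the textbook Pearson test. It is not the answerer's code, and `chi2_gof` is a hypothetical helper name. Note that `np.digitize` returns one bin index per sample, so in the question's code `len(counts)` and `len(preds)` are both 200; conventionally, Pearson's degrees of freedom depend on the number of bins instead. This sketch uses raw counts rather than proportions and sets dof = (number of bins) − 1 − (number of fitted parameters). As an assumed, swapped-in choice, it places the bin edges at quantiles of the hypothesised distribution (equiprobable bins), so no bin has zero expected count. The question's `linspace(0, 10, ...)` edges give zero Beta probability above 1.

```python
import numpy as np
from scipy.stats import beta, chi2, norm


def chi2_gof(samples, dist, n_bins, n_fitted=0):
    """Pearson chi-square goodness-of-fit test of `samples` against a frozen
    scipy.stats distribution `dist`, using `n_bins` equiprobable bins.

    n_fitted: number of parameters of `dist` estimated from `samples`
    (0 when they are fixed in advance). Returns (statistic, dof, p_value).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Interior cut points giving each bin probability 1/n_bins under H0;
    # the outer bins are open-ended, so every sample lands in exactly one bin.
    cuts = dist.ppf(np.arange(1, n_bins) / n_bins)
    # Bin index 0..n_bins-1 per sample (bin i covers [cuts[i-1], cuts[i]))
    idx = np.searchsorted(cuts, samples, side="right")
    observed = np.bincount(idx, minlength=n_bins)
    # Expected counts n * p_i, with p_i taken from differences of the CDF
    p = np.diff(np.concatenate(([0.0], dist.cdf(cuts), [1.0])))
    expected = n * p
    # Pearson statistic on raw counts (not relative frequencies)
    stat = np.sum((observed - expected) ** 2 / expected)
    dof = n_bins - 1 - n_fitted
    return stat, dof, chi2.sf(stat, dof)


rng = np.random.default_rng(0)        # fixed seed: an assumption
preds = rng.normal(5, 2, size=200)    # N(mean=5, sd=2), as in the question

# The p-value changes with the number of bins, illustrating the sensitivity
for k in (5, 10, 15, 30):
    stat, dof, p_beta = chi2_gof(preds, beta(5, 2), k)
    _, _, p_norm = chi2_gof(preds, norm(5, 2), k)
    print(f"bins={k:2d} dof={dof:2d} p(Beta(5,2))={p_beta:.3g} p(N(5,2))={p_norm:.3g}")
```

Against Beta(5, 2) the p-value is effectively 0 for every bin count. With 200 samples, even 30 equiprobable bins keep about 6.7 expected counts per bin, which satisfies the common rule of thumb of at least 5 expected counts per bin.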
