Pearson's Chi Square Test Python


Problem description


I have two arrays on which I would like to run Pearson's chi-square test (goodness of fit). I want to test whether there is a significant difference between the expected and observed results.

observed = [11294, 11830, 10820, 12875]
expected = [10749, 10940, 10271, 11937]

I want to compare 11294 with 10749, 11830 with 10940, 10820 with 10271, etc.

Here's what I have

>>> from scipy.stats import chisquare
>>> chisquare(f_obs=[11294, 11830, 10820, 12875],f_exp=[10749, 10940, 10271, 11937])
(203.08897607453906, 9.0718379533890424e-44)

where 203 is the chi square test statistic and 9.07e-44 is the p value. I'm confused by the results. p-value = 9.07e-44 < 0.05 therefore we reject the null hypothesis and conclude that there is a significant difference between the observed and expected results. This isn't correct because the numbers are so close. How do I fix this?

Solution

In a test of independence, the null hypothesis (H0) says that the two variables (X and Y) are independent, i.e. changing the values in X would not affect the values in Y.

For example, X = [1,2,3,4] and Y = [2,4,6,8]

If you calculate the p-value for this case with any standard method, it should come out very small, meaning there is a very low chance of seeing data like this if the null hypothesis held, i.e. if X and Y were independent of each other.

So the null hypothesis is rejected here: the two variables depend on each other, in the form Y = 2X.
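As a rough illustration of the X and Y example, here is a minimal sketch that assumes `scipy.stats.pearsonr` as the method. That choice is an assumption, since the answer does not name one, and `pearsonr` tests for linear correlation rather than independence in general.

```python
from scipy.stats import pearsonr

# Example data from the answer: Y is exactly 2 * X.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

# pearsonr returns the correlation coefficient and a two-sided p-value
# for the null hypothesis that X and Y are uncorrelated.
r, p_value = pearsonr(x, y)

print("correlation:", r)
print("p-value:", p_value)
```

A correlation close to 1 together with a very small p-value is what the answer describes as the data not following the null hypothesis.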

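In your case the null hypothesis is that the observed counts follow the expected counts. The p-value of 9.0718379533890424e-44 is read the same way: such a small value means there is a very low chance of seeing deviations this large if the null hypothesis held, so the test rejects it and the observed and expected counts differ significantly. The numbers look close, but with counts around 11,000 a gap of 5% to 8% is several times larger than the sampling fluctuation the test allows for.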
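To see where the statistic of about 203 comes from, it can be recomputed by hand. The sketch below assumes NumPy and `scipy.stats.chi2` are available. The variable names are illustrative, and 3 degrees of freedom (number of categories minus one) is the default that `chisquare` uses.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([11294, 11830, 10820, 12875], dtype=float)
expected = np.array([10749, 10940, 10271, 11937], dtype=float)

# Pearson residuals: each difference measured in units of sqrt(expected),
# which is roughly the sampling fluctuation the test allows per category.
residuals = (observed - expected) / np.sqrt(expected)

# The chi-square statistic is the sum of the squared residuals.
contributions = residuals ** 2
statistic = contributions.sum()

# Degrees of freedom: number of categories minus one.
dof = len(observed) - 1
p_value = chi2.sf(statistic, dof)

print("residuals:", residuals)
print("statistic:", statistic)
print("p-value:", p_value)
```

Every residual is more than 5 in absolute value, so each category sits several standard deviations from its expected count. That is why the test reports a significant difference even though the raw numbers look similar.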

P.S. You are correct about this.
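One thing worth checking, offered as a suggestion rather than part of the answer above: the observed counts sum to 46819 while the expected counts sum to 43897, so part of the statistic reflects the different totals. As far as I know, more recent SciPy releases reject such input with a ValueError. If the expected values are meant as relative proportions (an assumption about the asker's intent), one option is to rescale them to the observed total first:

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([11294, 11830, 10820, 12875], dtype=float)
expected = np.array([10749, 10940, 10271, 11937], dtype=float)

# Assumption: only the proportions of `expected` matter, so rescale them
# so that both arrays sum to the same total.
expected_scaled = expected * observed.sum() / expected.sum()

statistic, p_value = chisquare(f_obs=observed, f_exp=expected_scaled)

print("statistic:", statistic)
print("p-value:", p_value)
```

Rescaled this way, the statistic is far smaller than 203, because only the differences in proportions remain. Whether rescaling is appropriate depends on what the expected counts represent.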
