Kolmogorov Smirnov Test in Spark (Python) not working?

Published: 2021/11/14 21:09:07. Tags: python, pyspark, apache-spark-mllib, kolmogorov-smirnov

Problem description

I was running a normality test in Python spark-ml and saw what I think is a bug.

Here is the setup: I have a dataset that is normalized (range -1 to 1).

When I plot a histogram, I can clearly see that the data is NOT normal:

>>> prices_norm.histogram(10)

([-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
 [226, 269, 119, 95, 52, 26, 8, 2, 2, 5])
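
To put a number on what the histogram shows, here is a minimal sketch in plain Python that reuses the bucket edges and counts printed above. The helper name `share_below` is hypothetical and not part of Spark; it only sums the counts of the buckets whose left edge lies below a threshold.

```python
# Bucket edges and counts copied from the histogram output above.
edges = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
counts = [226, 269, 119, 95, 52, 26, 8, 2, 2, 5]


def share_below(edges, counts, threshold):
    """Fraction of observations in buckets whose left edge is below threshold.

    Hypothetical helper for illustration; assumes len(edges) == len(counts) + 1.
    """
    total = sum(counts)
    # zip stops at the shorter list, so each count is paired with its left edge.
    below = sum(c for left, c in zip(edges, counts) if left < threshold)
    return below / total


print(sum(counts))                       # number of observations
print(share_below(edges, counts, 0.0))   # share of the data below zero
```

A distribution that is symmetric about 0, such as the standard normal, would put roughly half of its mass below zero, so a share far from 0.5 is a quick sign of the skew.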

When I run the Kolmogorov-Smirnov test I get the following results:

>>> from pyspark.mllib.stat import Statistics
>>> testResults = Statistics.kolmogorovSmirnovTest(prices_norm, "norm")
>>> print(testResults)

Kolmogorov-Smirnov test summary:
degrees of freedom = 0 
statistic = 0.46231145770077375 
pValue = 1.742039845709087E-11 
Very strong presumption against null hypothesis: Sample follows theoretical distribution.

The Kolmogorov-Smirnov test defines the null hypothesis (H0) as: the data follows a specified distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm).

In this case the p-value is very low, so we should reject the null hypothesis. This makes sense, as it is clearly not normal.
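
For readers who want to see what the statistic measures, the following is a minimal sketch of the one-sample, two-sided KS statistic in plain Python. It is not Spark's implementation. It assumes the reference distribution is the standard normal N(0, 1), and the names `normal_cdf`, `ks_statistic`, and the toy sample are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of N(mean, std^2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf=normal_cdf):
    """Largest gap between the empirical CDF of sample and the given cdf."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical toy sample confined to [-1, 0], loosely like the skewed data.
toy = [-1.0, -0.9, -0.8]
print(ks_statistic(toy))
```

A large statistic means the empirical distribution sits far from the reference one, which is what produces a small p-value and a rejection of H0.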

So why, then, does it say:

Sample follows theoretical distribution

Isn't this wrong? Shouldn't it say that the sample does NOT follow a theoretical distribution? Am I missing something?

Solution

This was driving me crazy, so I went to look at the source code directly:

git://git.apache.org/spark.git
spark/mllib/src/main/scala/org/apache/spark/mllib/stat/test/KolmogorovSmirnovTest.scala

The code is correct; the null hypothesis is defined as:

object NullHypothesis extends Enumeration {
  type NullHypothesis = Value
  val OneSampleTwoSided = Value("Sample follows theoretical distribution")
}

The wording of the string message simply restates the null hypothesis:

Very strong presumption against null hypothesis: Sample follows theoretical distribution.
                                                 ________________________________________
                                                                    H0

The wording is arguably confusing, because it can be read either way, but it is correct: the sentence after the colon is the null hypothesis being rejected, not a conclusion about the data.
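
One way to see how the summary line is put together is the sketch below, which builds a message of the same shape in plain Python: a strength label, the fixed phrase "presumption against null hypothesis", then the text of H0. This is an illustration, not Spark's code. The function names and the p-value cut-offs are assumptions; only the "Very strong" message for a p-value of about 1.7E-11 is confirmed by the output quoted in the question.

```python
def explain(p_value, null_hypothesis):
    """Build a summary line shaped like the one in the test output.

    The cut-offs below are illustrative assumptions.
    """
    if p_value <= 0.01:
        strength = "Very strong"
    elif p_value <= 0.05:
        strength = "Strong"
    elif p_value <= 0.1:
        strength = "Low"
    else:
        strength = "No"
    # Everything after the colon is H0 itself, not a verdict on the sample.
    return f"{strength} presumption against null hypothesis: {null_hypothesis}."


def rejects_null(p_value, alpha=0.05):
    """Decide from the p-value directly instead of parsing the message."""
    return p_value < alpha


h0 = "Sample follows theoretical distribution"
print(explain(1.742039845709087e-11, h0))
print(rejects_null(1.742039845709087e-11))
```

Deciding from the p-value rather than from the printed sentence avoids the ambiguity altogether.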
