Kolmogorov Smirnov Test in Spark (Python) not working?

Views: 177 | Posted: 2020/7/24 4:57:01 | Tags: python pyspark apache-spark-mllib kolmogorov-smirnov


Question

I was doing a normality test in Python spark-ml and saw what I think is a bug.

Here is the setup: I have a data set that is normalized (range -1 to 1).

When I compute a histogram, I can clearly see that the data is NOT normal:

>>> prices_norm.histogram(10)

([-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
 [226, 269, 119, 95, 52, 26, 8, 2, 2, 5])

When I run the Kolmogorov-Smirnov test, I get the following results:

>>> from pyspark.mllib.stat import Statistics
>>> testResults = Statistics.kolmogorovSmirnovTest(prices_norm, "norm")
>>> print(testResults)

Kolmogorov-Smirnov test summary:
degrees of freedom = 0 
statistic = 0.46231145770077375 
pValue = 1.742039845709087E-11 
Very strong presumption against null hypothesis: Sample follows theoretical distribution.

The Kolmogorov-Smirnov test defines the null hypothesis (H0) as: the data follows a specified distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm).

In this case the p-value is very low, so we should reject the null hypothesis. This makes sense, as the data is clearly not normal.
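To make that reasoning concrete, here is a minimal sketch of a one-sample, two-sided KS statistic against the standard normal, written in plain Python. It is not Spark's implementation. The names `normal_cdf` and `ks_statistic` are made up for illustration, and the sample is only an approximation rebuilt from the histogram above by placing every value at its bin midpoint, since the raw data is not available.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = float(len(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

# Assumption: approximate the data by bin midpoints from the histogram.
edges = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
counts = [226, 269, 119, 95, 52, 26, 8, 2, 2, 5]
approx_sample = []
for lo, hi, c in zip(edges, edges[1:], counts):
    approx_sample.extend([(lo + hi) / 2.0] * c)

d = ks_statistic(approx_sample, normal_cdf)
print("approximate KS statistic: %.3f" % d)
```

Because of the midpoint approximation, the number will not match Spark's statistic exactly, but it should come out large, which is consistent with rejecting H0.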

So why, then, does it say:

Sample follows theoretical distribution

Isn't this wrong? Shouldn't it say that the sample does NOT follow the theoretical distribution? Am I missing something?

Recommended answer

This was driving me crazy, so I went to look at the source code directly:

git://git.apache.org/spark.git
spark/mllib/src/main/scala/org/apache/spark/mllib/stat/test/KolmogorovSmirnovTest.scala

The code is correct; the null hypothesis is set as:

object NullHypothesis extends Enumeration {
  type NullHypothesis = Value
  val OneSampleTwoSided = Value("Sample follows theoretical distribution")
}

The wording of the string message is just restating the null hypothesis:

Very strong presumption against null hypothesis: Sample follows theoretical distribution.
                                                 ________________________________________
                                                                    H0

Arguably the wording is confusing, as it can be read both ways: the text after the colon is the null hypothesis being rejected, not a conclusion about the sample. But it is indeed correct.
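One way to sidestep the ambiguity is to format the verdict yourself from the p-value instead of relying on the summary string. The following is only a sketch: `describe_ks_result` is a hypothetical helper, not part of Spark, and the 0.05 significance level is an assumed default.

```python
def describe_ks_result(p_value, null_hypothesis, alpha=0.05):
    """Return an unambiguous verdict for a hypothesis test result."""
    if p_value < alpha:
        verdict = "REJECT H0"
    else:
        verdict = "fail to reject H0"
    return "%s (p = %.3g, alpha = %.2g); H0 was: %s" % (
        verdict, p_value, alpha, null_hypothesis)

# Values copied from the test summary in the question.
message = describe_ks_result(
    1.742039845709087e-11, "Sample follows theoretical distribution")
print(message)
```

With PySpark, the two arguments would presumably come from `testResults.pValue` and `testResults.nullHypothesis`; check those attribute names against your Spark version.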
