Normality test of a distribution in Python

Question

I have some data sampled from a radar satellite image and want to run some statistical tests on it. Before doing so, I wanted to run a normality test to make sure the data is normally distributed. The data appears to be normally distributed, but the test returns a p-value of 0, which suggests that it is not.

I have attached my code along with the output and a histogram of the distribution (I'm relatively new to Python, so apologies if my code is clunky). Can anyone tell me whether I'm doing something wrong? Judging from the histogram, I find it hard to believe that my data is not normally distributed.

import h5py
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

values = 'inputfile.h5'
f = h5py.File(values, 'r')
dset = f['/DATA/DATA']
array = dset[..., 0]
print('normality =', scipy.stats.normaltest(array))
# Avoid shadowing the built-in max() and min()
vmax = np.amax(array)
vmin = np.amin(array)

# 100 equal-width bins spanning the full data range
freqs, edges = np.histogram(array, bins=100, range=(vmin, vmax))
interval = (vmax - vmin) / (len(edges) - 1)
# Use the left bin edges; np.arange with a float step can yield one bin too many
newbins = edges[:-1]
histogram = plt.bar(newbins, freqs, width=interval, align='edge', color='gray')
plt.show()

This prints (41099.095955202931, 0.0). The first element is a chi-squared statistic and the second is the p-value.
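For readers wondering where the two numbers come from, here is a minimal sketch of how `scipy.stats.normaltest` builds its result: the statistic is the sum of the squared z-scores from `skewtest` and `kurtosistest`, and the p-value is the upper tail of a chi-squared distribution with 2 degrees of freedom. The `sample` array is synthetic and only stands in for the real data, which is not available here.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the radar data (assumption: the real array is 1-D).
rng = np.random.default_rng(0)
sample = rng.normal(size=5000)

result = stats.normaltest(sample)

# Rebuild the statistic from its two components.
z_skew = stats.skewtest(sample).statistic
z_kurt = stats.kurtosistest(sample).statistic
k2 = z_skew**2 + z_kurt**2

# The p-value is the chi-squared survival function with 2 degrees of freedom.
p_from_chi2 = stats.chi2.sf(k2, 2)

print("normaltest:", result.statistic, result.pvalue)
print("rebuilt:   ", k2, p_from_chi2)

# For a statistic as large as the one in the question, the tail probability
# is too small to represent as a float, so it is reported as exactly 0.0.
p_question = stats.chi2.sf(41099.095955202931, 2)
print("p-value for the statistic in the question:", p_question)
```

A printed p-value of 0.0 therefore means "smaller than a double can represent", not literally zero.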

I have made a graph of the data, which I have attached. I thought the negative values might be causing a problem, so I normalised the values, but the problem persists.

Recommended answer

This question explains why you're getting such a small p-value. Essentially, normality tests almost always reject the null hypothesis on very large samples (in yours, for example, there is a slight skew on the left side, which at your enormous sample size is more than enough to reject).
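The sample-size effect can be illustrated with a small simulation. This is only a sketch on synthetic data: the helper name `normaltest_pvalues` and the choice of a gamma distribution with shape 200 (nearly bell-shaped, skewness about 0.14) are assumptions made for the demonstration, not part of the original question.

```python
import numpy as np
from scipy import stats


def normaltest_pvalues(sizes, shape=200.0, seed=0):
    """Run normaltest on slightly skewed samples of increasing size.

    Hypothetical helper: the samples are gamma distributed with a large
    shape parameter, so they look close to normal but are mildly skewed.
    """
    rng = np.random.default_rng(seed)
    pvalues = {}
    for n in sizes:
        sample = rng.gamma(shape, size=n)
        pvalues[n] = stats.normaltest(sample).pvalue
    return pvalues


pvalues = normaltest_pvalues([100, 10_000, 1_000_000])
for n, p in pvalues.items():
    print(f"n = {n:>9,d}  p-value = {p:.3g}")
```

The underlying distribution is the same in every run; only the sample size changes. With a million points, even this mild skew is far larger than the sampling noise, so the test rejects decisively.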

What would be much more useful in practice is to plot a normal curve fitted to your data. You can then see how the data actually differs from the normal curve (for example, whether the tail on the left side really does extend too far). For example:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm

# density=True replaces normed=1, which newer Matplotlib releases no longer accept
n, bins, patches = plt.hist(array, 50, density=True)
mu = np.mean(array)
sigma = np.std(array)
# scipy.stats.norm.pdf replaces mlab.normpdf, which newer Matplotlib releases no longer provide
plt.plot(bins, norm.pdf(bins, mu, sigma))
plt.show()

(Note the density=True argument, written as normed=1 in older Matplotlib releases: it normalizes the histogram to a total area of 1, which makes it comparable to a density such as the normal distribution.)
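The same comparison can also be made numerically instead of by eye. The sketch below evaluates the fitted normal density at each bin center and reports the largest gap to the histogram density; `density_vs_normal` is a hypothetical helper name, and the synthetic `sample` only stands in for the real data.

```python
import numpy as np
from scipy.stats import norm


def density_vs_normal(data, bins=50):
    """Return bin centers, histogram density, and fitted normal density."""
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu = np.mean(data)
    sigma = np.std(data)
    fitted = norm.pdf(centers, mu, sigma)
    return centers, density, fitted


# Synthetic stand-in for the real data (assumption for illustration only).
rng = np.random.default_rng(0)
sample = rng.normal(loc=-12.0, scale=3.0, size=100_000)

centers, density, fitted = density_vs_normal(sample)
worst = np.argmax(np.abs(density - fitted))
print("largest gap:", abs(density[worst] - fitted[worst]),
      "at value", centers[worst])
```

Where the largest gaps occur (in a tail or near the peak) indicates what kind of departure from normality the test is picking up, and whether it is large enough to matter for the analysis.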
