Fit normal distribution to weighted list
Question
I have a bunch of data points, and I would like to fit a normal distribution to the data. I see that scipy has the stats.norm.fit method, but it requires a single list of data points, something like:
data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5]
whereas my data is contained in two lists, something like:
values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]
How can I fit a normal distribution to data formatted in this way?
Recommended answer
You can use numpy.repeat to expand the values according to their counts, and then use scipy.stats.norm.fit:
In [54]: import numpy as np
In [55]: from scipy.stats import norm
In [56]: values = [1, 2, 3, 4, 5]
In [57]: counts = [4, 3, 6, 1, 3]
In [58]: full_values = np.repeat(values, counts)
In [59]: full_values
Out[59]: array([1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5])
In [60]: norm.fit(full_values) # Estimate mean and std. dev.
Out[60]: (2.7647058823529411, 1.3516617991854185)
scipy.stats.norm.fit computes the maximum likelihood estimates of the parameters. For the normal distribution, these are just the sample mean and the square root of the (biased) sample variance.
As far as I know, the only relevant weighted statistical function in numpy or scipy is numpy.average. You can do the computation yourself with numpy.average, using counts as the weights argument:
In [62]: sample_mean = np.average(values, weights=counts)
In [63]: sample_mean
Out[63]: 2.7647058823529411
In [64]: sample_var = np.average((values - sample_mean)**2, weights=counts)
In [65]: sample_var
Out[65]: 1.8269896193771626
In [66]: sample_std = np.sqrt(sample_var)
In [67]: sample_std
Out[67]: 1.3516617991854185
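The weighted computation above can be wrapped in a small helper. This is only a sketch: the function name weighted_norm_fit is made up for illustration and is not part of numpy or scipy. It compares the weighted result against np.mean and np.std of the expanded data, which (with the default ddof=0) are the same maximum likelihood estimates described above.

```python
import numpy as np

def weighted_norm_fit(values, counts):
    """Hypothetical helper: (mean, std) maximum likelihood estimates
    for a normal distribution, given values and their counts."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mean = np.average(values, weights=counts)
    # Biased (divide-by-n) variance, matching the MLE.
    var = np.average((values - mean) ** 2, weights=counts)
    return mean, np.sqrt(var)

values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]

mu, sigma = weighted_norm_fit(values, counts)

# Cross-check against the expanded data set.
expanded = np.repeat(values, counts)
mu_ref = np.mean(expanded)
sigma_ref = np.std(expanded)  # ddof=0 by default, i.e. the biased estimate

print(mu, sigma)
```

The helper avoids building the expanded array, which may matter when the counts are large.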
Note that statistics.stdev is based on the unbiased sample variance. If that's what you want, you can adjust the scaling by multiplying the biased sample variance by sum(counts)/(sum(counts) - 1):
In [79]: n = sum(counts)
In [80]: sample_var = n/(n-1)*np.average((values - sample_mean)**2, weights=counts)
In [81]: sample_var
Out[81]: 1.9411764705882353
In [82]: sample_std = np.sqrt(sample_var)
In [83]: sample_std
Out[83]: 1.3932610920384718
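As a sanity check, the rescaled result can be compared with statistics.stdev and with np.std(..., ddof=1) applied to the expanded data. This is a sketch that assumes counts are integer frequencies; with non-integer weights the n/(n-1) correction does not have the same meaning.

```python
import statistics
import numpy as np

values = np.array([1, 2, 3, 4, 5])
counts = np.array([4, 3, 6, 1, 3])  # assumed to be integer frequencies

n = counts.sum()
mean = np.average(values, weights=counts)
biased_var = np.average((values - mean) ** 2, weights=counts)

# Rescale the biased variance by n/(n-1) to get the unbiased estimate.
unbiased_std = np.sqrt(biased_var * n / (n - 1))

# Reference values computed from the expanded data.
expanded = np.repeat(values, counts)
ref_numpy = np.std(expanded, ddof=1)
ref_stdlib = statistics.stdev(expanded.tolist())

print(unbiased_std, ref_numpy, ref_stdlib)
```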