Fit normal distribution to weighted list

Question

I have a bunch of data points, and I would like to fit a normal distribution to the data. I see that scipy has the stats.norm.fit method, but this requires a single list of data points, something like:

data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5]

whereas my data is contained in two lists, something like:

values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]

How can I fit a normal distribution to data formatted in this way?

Recommended answer

You can use numpy.repeat to expand the data, and then pass the result to scipy.stats.norm.fit:

In [54]: import numpy as np

In [55]: from scipy.stats import norm

In [56]: values = [1, 2, 3, 4, 5]

In [57]: counts = [4, 3, 6, 1, 3]

In [58]: full_values = np.repeat(values, counts)

In [59]: full_values
Out[59]: array([1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5])

In [60]: norm.fit(full_values)  # Estimate mean and std. dev.
Out[60]: (2.7647058823529411, 1.3516617991854185)
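As a quick sanity check (a sketch, not part of the original answer), the pair returned by norm.fit can be compared with the plain mean and the population standard deviation (ddof=0) of the expanded array. Keep in mind that np.repeat needs non-negative integer counts, and that the expanded array holds sum(counts) elements, which may be large for big counts.

```python
import numpy as np
from scipy.stats import norm

values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]

# Expand the (value, count) pairs into one flat sample.
full_values = np.repeat(values, counts)

# Maximum likelihood estimates of the normal parameters.
mu, sigma = norm.fit(full_values)

# For the normal distribution these coincide with the sample mean and
# the population standard deviation (ddof=0, i.e. dividing by n).
assert np.isclose(mu, full_values.mean())
assert np.isclose(sigma, full_values.std(ddof=0))
```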

scipy.stats.norm.fit computes the maximum likelihood estimates of the parameters. For the normal distribution, these are just the sample mean and the square root of the (biased) sample variance. As far as I know, the only relevant weighted statistical function in numpy or scipy is numpy.average. You can do the computation yourself with numpy.average, using counts as the weights argument:

In [62]: sample_mean = np.average(values, weights=counts)

In [63]: sample_mean
Out[63]: 2.7647058823529411

In [64]: sample_var = np.average((values - sample_mean)**2, weights=counts)

In [65]: sample_var
Out[65]: 1.8269896193771626

In [66]: sample_std = np.sqrt(sample_var)

In [67]: sample_std
Out[67]: 1.3516617991854185
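The steps above can be wrapped in a small helper that never expands the data. This is a minimal sketch: the name weighted_norm_fit is hypothetical, and the inputs are converted with np.asarray so that plain lists work as well as arrays.

```python
import numpy as np


def weighted_norm_fit(values, counts):
    """Return (mean, std) of a normal fit to values weighted by counts.

    Hypothetical helper: computes the weighted mean and the square root
    of the weighted (biased) variance without expanding the data.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mean = np.average(values, weights=counts)
    var = np.average((values - mean) ** 2, weights=counts)
    return mean, np.sqrt(var)


mu, sigma = weighted_norm_fit([1, 2, 3, 4, 5], [4, 3, 6, 1, 3])
print(mu, sigma)
```

For the example data this should reproduce the mean and standard deviation shown in the session above (about 2.7647 and 1.3517).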

Note that Python's statistics.stdev is based on the unbiased sample variance (it divides by n - 1 rather than n). If that's what you want, you can adjust the scaling by multiplying the biased sample variance by sum(counts)/(sum(counts) - 1):

In [79]: n = sum(counts)

In [80]: sample_var = n/(n-1)*np.average((values - sample_mean)**2, weights=counts)

In [81]: sample_var
Out[81]: 1.9411764705882353

In [82]: sample_std = np.sqrt(sample_var)

In [83]: sample_std
Out[83]: 1.3932610920384718
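As a cross-check, np.cov accepts an fweights argument for integer frequency weights and, with its default settings, normalizes by the total weight minus one, so it should give the same unbiased variance directly. This is an alternative to the manual rescaling, not something the original answer uses.

```python
import numpy as np

values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]

# fweights are integer frequency weights; with the default settings
# np.cov divides by sum(counts) - 1, giving the unbiased variance.
unbiased_var = float(np.cov(values, fweights=counts))
unbiased_std = np.sqrt(unbiased_var)

# Manual rescaling of the biased variance, for comparison.
n = sum(counts)
mean = np.average(values, weights=counts)
biased_var = np.average((np.asarray(values) - mean) ** 2, weights=counts)
assert np.isclose(unbiased_var, n / (n - 1) * biased_var)
```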
