Fit a distribution to a histogram

Problem description

I want to know the distribution of my data points, so first I plotted the histogram of my data. My histogram looks like the following:

Second, in order to fit them to a distribution, here's the code I wrote:

size = 20000
x = scipy.arange(size)
# fit
param = scipy.stats.gamma.fit(y)
pdf_fitted = scipy.stats.gamma.pdf(x, *param[:-2], loc = param[-2], scale = param[-1]) * size
plt.plot(pdf_fitted, color = 'r')

# plot the histogram
plt.hist(y)

plt.xlim(0, 0.3)
plt.show()

The result is:

What am I doing wrong?

Recommended answer

Your data does not appear to be gamma-distributed, but assuming it is, you could fit it like this:

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

gamma = stats.gamma
a, loc, scale = 3, 0, 2
size = 20000
y = gamma.rvs(a, loc, scale, size=size)

x = np.linspace(0, y.max(), 100)
# fit
param = gamma.fit(y, floc=0)
pdf_fitted = gamma.pdf(x, *param)
plt.plot(x, pdf_fitted, color='r')

# plot the histogram
# density=True replaces the normed=True argument, which was removed in Matplotlib 3.1
plt.hist(y, density=True, bins=30)

plt.show()

  • The area under the pdf (over the entire domain) equals 1. The area under the histogram equals 1 if you use density=True (spelled normed=True in Matplotlib versions before 3.1).
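
As a rough numerical check of the two area statements, the sketch below uses np.histogram with density=True in place of plt.hist, and a hand-written trapezoid rule for the pdf. The seed, the integration grid, and the variable names are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import scipy.stats as stats

# Synthetic gamma data with the same parameters as the answer (seed is an assumption)
y = stats.gamma.rvs(3, loc=0, scale=2, size=20000, random_state=0)

# density=True rescales the bar heights so that sum(height * bin_width) is 1
heights, edges = np.histogram(y, bins=30, density=True)
hist_area = np.sum(heights * np.diff(edges))

# Trapezoid-rule approximation of the area under the pdf on a wide grid;
# the upper limit of 100 is an assumed cutoff far into the tail
grid = np.linspace(0, 100, 20001)
pdf = stats.gamma.pdf(grid, 3, loc=0, scale=2)
pdf_area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid))

print(hist_area, pdf_area)
```

Both values should come out very close to 1, which is why the normalized histogram and the fitted pdf can share one vertical axis.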

In the question's code, x has length size (i.e. 20000), and pdf_fitted has the same shape as x. If we call plot and specify only the y-values, e.g. plt.plot(pdf_fitted), then the values are plotted against their indices, over the x-range [0, size]. That is much too large an x-range. Since the histogram is going to use an x-range of [min(y), max(y)], we must choose x to span a similar range: x = np.linspace(0, y.max()), and call plot with both the x- and y-values specified, e.g. plt.plot(x, pdf_fitted).
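
The difference between the two plot calls can be inspected directly by reading back the x-data that Matplotlib stores on each line. This is only a small sketch: the Agg backend, the seed, and the names implicit_line and explicit_line are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

y = stats.gamma.rvs(3, loc=0, scale=2, size=20000, random_state=0)
x = np.linspace(0, y.max(), 100)
pdf_fitted = stats.gamma.pdf(x, *stats.gamma.fit(y, floc=0))

fig, ax = plt.subplots()
(implicit_line,) = ax.plot(pdf_fitted)     # y-values only: x defaults to the indices 0..99
(explicit_line,) = ax.plot(x, pdf_fitted)  # x spans the range of the data

implicit_x = np.asarray(implicit_line.get_xdata())
explicit_x = np.asarray(explicit_line.get_xdata())
print("implicit x-range:", implicit_x.min(), implicit_x.max())
print("explicit x-range:", explicit_x.min(), explicit_x.max())
plt.close(fig)
```

With only y-values the curve is drawn against array indices, so it lines up with the histogram only when x is passed explicitly.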

As Warren Weckesser points out in the comments, for most applications you know the gamma distribution's domain begins at 0. If that is the case, use floc=0 to hold the loc parameter to 0. Without floc=0, gamma.fit will try to find the best-fit value for the loc parameter too, which given the vagaries of data will generally not be exactly zero.
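
To see the effect of floc=0, one can fit the same sample twice and compare the returned parameters. The sketch below assumes synthetic data with a=3, loc=0, scale=2 and a fixed seed; the exact free-fit numbers will vary with the data.

```python
import scipy.stats as stats

# Synthetic sample whose true loc is 0 (seed chosen arbitrarily)
y = stats.gamma.rvs(3, loc=0, scale=2, size=20000, random_state=0)

# Free fit: shape, loc and scale are all estimated from the data
a_free, loc_free, scale_free = stats.gamma.fit(y)

# Constrained fit: loc is held at 0, only shape and scale are estimated
a_fixed, loc_fixed, scale_fixed = stats.gamma.fit(y, floc=0)

print("free fit:  a=%.3f loc=%.4f scale=%.3f" % (a_free, loc_free, scale_free))
print("floc=0:    a=%.3f loc=%.4f scale=%.3f" % (a_fixed, loc_fixed, scale_fixed))
```

The constrained fit returns loc as exactly 0, while the free fit typically returns a small nonzero loc and slightly shifted shape and scale estimates.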
