平方误差和(SSE)的Python分布拟合 [英] Python Distribution Fitting with Sum of Square Error (SSE)

查看:91
本文介绍了平方误差和(SSE)的Python分布拟合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图找到适合我的数据的最佳分布曲线,该数据由

I am trying to find an optimal distribution curve fit to my data consisting of

y-axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 
          0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 
          0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

y轴是在x轴时间仓中发生的事件的概率:

y-axis are probabilities of an event occurring in x-axis time bins:

x-axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 
          12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 
          22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 
          32.0, 33.0, 34.0]

我正在按照中提供的示例在python中进行此操作通过Scipy(Python)向理论上的经验分布?

具体来说,我正在尝试重新创建名为平方误差和的分布拟合(SSE)"的部分,您将在其中遍历不同的分布以找到适合数据的部分.

Specifically I am attempting to recreate the part called 'Distribution Fitting with Sum of Square Error (SSE)', where you run through the different distributions to find the right fit to the data.

如何修改该示例以使其对我的数据输入有效?回答

How can I modify that example in order to make this work on my data inputs? answered

根据Bill的响应更新版本,但现在尝试根据数据绘制拟合曲线并发现有问题:

Update version based on Bill's response, but now trying to plot the fitted curve against the data and seeing something off:

%matplotlib inline
import matplotlib.pyplot as plt
import scipy
import scipy.stats
import numpy as np
from scipy.stats import gamma, lognorm, loglaplace
from scipy.optimize import curve_fit

x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')

def f(x, a, loc, scale):
    return gamma.pdf(x, a, loc, scale)

result, pcov = curve_fit(f, x_axis, y_axis)

# get curve shape, location, scale
shape = result[:-2]
loc = result[-2]
scale = result[-1]

# construct the curve
x = np.linspace(0, 36, 100)
y = f(x, *result)

plt.bar(x_axis, y_axis, width, alpha=0.75)
plt.plot(x, y, c='g')

推荐答案

您的情况与您引用的问题中的情况不同.您既有数据点的纵坐标又有横坐标,而不是通常的i.i.d.样本.我建议您使用 scipy curve_fit .这是一个示例.

Your situation is not the same as that in the one treated in the question you cited. You have both the ordinates and the abscissae of the data points, rather than the usual i.i.d. sample. I would suggest that you use scipy curve_fit. Here's a sample.

x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

## y_axis values must be normalised
sum_ys = sum(y_axis)
y_axis = [_/sum_ys for _ in y_axis]
print (sum(y_axis))

from scipy.stats import gamma, norm
from scipy.optimize import curve_fit

def gamma_f(x, a, loc, scale):
    return gamma.pdf(x, a, loc, scale)

def norm_f(x, loc, scale):
    return norm.pdf(x, loc, scale)

fitting = norm_f

result = curve_fit(fitting, x_axis, y_axis)
print (result)

import matplotlib.pyplot as plt

plt.plot(x_axis, y_axis, 'ro')
plt.plot(x_axis, [fitting(_, *result[0]) for _ in x_axis], 'b-')
plt.axis([0,35,0,.5])
plt.show()

此版本显示了如何对数据进行正常拟合的一个图.(伽马拟合度很差.)法线只需要两个参数.通常,您只需要输出结果的第一部分,即参数,形状,位置和比例的估计值.

This version shows how to do one plot, for the normal fit to the data. (The gamma provides a poor fit.) Only two parameters are needed for the normal. In general you would need only the first part of the output results, the estimates of the parameters, shape, location and scale.

(array([  2.3352639 ,  -3.08105104,  10.15024823]), array([[   5954.86532869,  -27818.92220973,  -19675.22421994],
       [ -27818.92220973,  133161.76500251,   90741.43608615],
       [ -19675.22421994,   90741.43608615,   66054.79087992]]))

请注意,伽玛分布的pdf也可以在scipy中获得,我认为您需要的其他pdf文件也省去了.

Notice that the pdf of the gamma distribution is also available in scipy, as are the others that you need, I think, saving you the work of coding them.

我在第一个代码中省略的最重要的事情是需要对y值进行归一化,也就是说,使它们的总和等于一个,因为它们应该近似于直方图的高度.

The most important thing I omitted from the first code was the need to normalise the y-values, that is, to make them sum to one, since they should approximate histogram heights.

这篇关于平方误差和(SSE)的Python分布拟合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆