How can I use scipy optimization to find the minimum chi-squared for 3 parameters and a list of data points?

Question

I have a histogram of sorted random numbers with a Gaussian overlaid on it. The histogram shows the observed values per bin (this is a base case that I will later apply to a much larger dataset), and the Gaussian is an attempt to fit the data. Clearly, this Gaussian is not the best fit to the histogram. The code below is the formula for the Gaussian.

from numpy import exp

normc, mu, sigma = 30.845, 50.5, 7  # normalization constant, mean, standard deviation
gauss = lambda x: normc * exp(-(x - mu)**2 / (2 * sigma**2))

I calculated the expected value per bin (the area under the curve over that bin) and counted the observed values per bin. There are several ways to define the 'best' fit; I am interested in the fit that minimizes chi-squared. In this formula for chi-squared, the expected value is the area under the curve per bin and the observed value is the number of sorted data values that fall in each bin. So I want to vary normc, mu, and sigma near their given values to find the combination that produces the smallest chi-squared, since those are the parameters I can plug into the code above to overlay the best-fit Gaussian on my histogram. I am trying to use the scipy module to minimize my chi-squared, as done in this example. Since I need to vary the parameters, I will use the function gauss (defined above) to plot the Gaussian overlay, and will define a new function to find the minimum chi-squared.
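The chi-squared formula referred to above is not reproduced in the text. Assuming it is Pearson's form, it can be written as follows, where the symbols O_i (observed count in bin i), E_i (expected count in bin i), a_i and b_i (edges of bin i), and N (number of bins) are notation introduced here for illustration:

```latex
\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = \int_{a_i}^{b_i} \mathrm{normc}\,
      \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx
```

Under this reading, chi-squared is a single number that depends on normc, mu, and sigma only through the expected counts E_i; the observed counts stay fixed.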

import numpy as np

def gaussmin(var, data):
    # var[0] = normc
    # var[1] = mu
    # var[2] = sigma
    # data is the sorted random numbers, represents unbinned observed values
    data = np.asarray(data)
    # Evaluate the Gaussian at every data point at once. A `return` inside a
    # for loop exits on the first iteration, so only data[0] would be used.
    return var[0] * np.exp(-(data - var[1])**2 / (2 * var[2]**2))

After this, I am stuck. How can I vary the parameters to find the normc, mu, and sigma that produce the best fit? My last attempt at a solution is below:

import scipy.optimize as opt

var = [normc, mu, sigma]
# note: opt.minimize expects chi2 to be a callable of the parameter vector
result = opt.minimize(chi2, var)
# chi2 is the chisquare value obtained via scipy
# chisquare input (a, b)
# where a is the number of occurrences per bin, b is the expected value per bin
# b depends on normc, mu, sigma
print(result)
# data is a list; can I keep it constant and vary only the parameters in var?

There are plenty of examples online for scalar functions, but I cannot find any for variable functions.

PS - I can post my full code so far, but it's a bit lengthy. If you would like to see it, just ask and I can post it here or provide a Google Drive link.

Recommended answer

A Gaussian distribution is completely characterized by its mean and variance (or standard deviation). Under the hypothesis that your data are normally distributed, the best fit is obtained by using x-bar (the sample mean) as the mean and s-squared (the sample variance) as the variance. Before doing so, though, I'd check whether normality is plausible using, e.g., a q-q plot.
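As a rough illustration, the sketch below combines this advice with the chi-squared minimization asked about in the question. It is not the asker's code: the synthetic data, the bin count, the Nelder-Mead method, and the helper names expected_counts and chi2 are all assumptions made for the example. The sample mean and standard deviation serve as the starting point, scipy.stats.probplot gives a numeric stand-in for the q-q plot, and the fixed histogram data are passed to scipy.optimize.minimize through its args parameter so that only the three parameters are varied.

```python
import numpy as np
from scipy import optimize, stats


def expected_counts(params, edges):
    """Expected count per bin: area under normc * exp(-(x - mu)^2 / (2 sigma^2))."""
    normc, mu, sigma = params
    # Closed-form area over each bin [a, b] via the normal CDF:
    # normc * sigma * sqrt(2 pi) * (Phi((b - mu) / sigma) - Phi((a - mu) / sigma))
    cdf = stats.norm.cdf(edges, loc=mu, scale=sigma)
    return normc * sigma * np.sqrt(2.0 * np.pi) * np.diff(cdf)


def chi2(params, observed, edges):
    """Pearson chi-squared between observed and expected bin counts."""
    normc, mu, sigma = params
    if normc <= 0 or sigma <= 0:
        return np.inf  # keep the search inside the meaningful region
    # Tiny floor avoids division by zero in bins far out in the tails
    expected = np.maximum(expected_counts(params, edges), 1e-12)
    return np.sum((observed - expected) ** 2 / expected)


# Synthetic stand-in for the asker's sorted random numbers (an assumption)
rng = np.random.default_rng(0)
data = np.sort(rng.normal(50.0, 7.0, size=2000))
observed, edges = np.histogram(data, bins=20)

# Numeric version of a q-q check: r near 1 suggests normality is plausible
(_, _), (_, _, r) = stats.probplot(data, dist="norm")

# Starting point from the sample mean and sample standard deviation.
# The total area under the curve should equal the number of data points,
# which fixes the initial normalization constant.
mu0 = data.mean()
sigma0 = data.std(ddof=1)
normc0 = data.size / (sigma0 * np.sqrt(2.0 * np.pi))
x0 = np.array([normc0, mu0, sigma0])

# observed and edges stay constant; only the entries of x0 are varied
result = optimize.minimize(chi2, x0, args=(observed, edges), method="Nelder-Mead")
normc_fit, mu_fit, sigma_fit = result.x

print("q-q correlation r:", r)
print("start:", x0, "chi2:", chi2(x0, observed, edges))
print("fit:  ", result.x, "chi2:", result.fun)
```

In a run like this the minimizer should move only slightly away from the sample-based starting values, which is consistent with the answer's point that the sample mean and variance already give a good fit when the data are plausibly normal.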
