How to fit a double Gaussian distribution in Python?

Question

I am trying to fit a double Gaussian distribution to my data (link) using Python. The raw data is of the form:

For the given data, I would like to obtain two Gaussian profiles for the peaks seen in the figure. I tried the following code (source):

from sklearn import mixture
import matplotlib.pyplot
import matplotlib.mlab
import numpy as np
from pylab import *
data = np.genfromtxt('gaussian_fit.dat', skiprows = 1)
x = data[:, 0]
y = data[:, 1]
clf = mixture.GMM(n_components=2, covariance_type='full')
clf.fit((y, x))
m1, m2 = clf.means_
w1, w2 = clf.weights_
c1, c2 = clf.covars_
fig = plt.figure(figsize = (5, 5))
plt.subplot(111)
plotgauss1 = lambda x: plot(x,w1*matplotlib.mlab.normpdf(x,m1,np.sqrt(c1))[0], linewidth=3)
plotgauss2 = lambda x: plot(x,w2*matplotlib.mlab.normpdf(x,m2,np.sqrt(c2))[0], linewidth=3)
fig.savefig('gaussian_fit.pdf')

But I am not able to get the desired output. So, how can a double Gaussian distribution be obtained in Python?

Update

I was able to fit a single Gaussian distribution with the following code:

import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
import numpy as np

data = np.genfromtxt('gaussian_fit.dat', skiprows = 1)
x = data[:, 0]
y = data[:, 1]
n = len(x)
mean = sum(x*y)/n
sigma = sum(y*(x-mean)**2)/n


def gaus(x,a,x0,sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))


popt,pcov = curve_fit(gaus, x, y ,p0 = [1, mean, sigma])


fig = plt.figure(figsize = (5, 5))
plt.subplot(111)
plt.plot(x, y, label='Raw')
plt.plot(x, gaus(x, *popt), 'o', markersize = 4, label='Gaussian fit')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
fig.savefig('gaussian_fit.pdf')

Solution

You can't use scikit-learn for this, because you are not dealing with a set of samples whose distribution you want to estimate. You could of course transform your curve to a PDF, sample it, and then try to fit it using a Gaussian mixture model, but that seems like overkill to me.
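For completeness, here is a minimal sketch of that resampling route, in case you do want to stay within scikit-learn. It assumes the same gaussian_fit.dat file, clips the curve to non-negative values before treating it as a discrete PDF, and uses GaussianMixture (the mixture.GMM class from the question has since been removed from scikit-learn); the sample count of 10000 is an arbitrary choice.

import numpy as np
from sklearn.mixture import GaussianMixture

# Load the curve and treat it as an unnormalized PDF over the x grid.
data = np.genfromtxt('gaussian_fit.dat', skip_header=1)
x, y = data[:, 0], data[:, 1]
w = np.clip(y, 0.0, None)   # weights must be non-negative
w /= w.sum()                # normalize to a discrete probability distribution

# Draw synthetic samples from that distribution.
samples = np.random.choice(x, size=10000, p=w).reshape(-1, 1)

# Fit a two-component Gaussian mixture to the synthetic samples.
gmm = GaussianMixture(n_components=2, random_state=0).fit(samples)
print(gmm.means_.ravel())                 # estimated peak positions
print(np.sqrt(gmm.covariances_.ravel()))  # estimated peak widths
print(gmm.weights_)                       # mixing weights of the two components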

Here's a solution using simple least-squares curve fitting. To get it to work I had to remove the background, i.e. ignore all data points with y < 5, and also provide a good starting vector for leastsq, which can be estimated from a plot of the data.

Finding the Starting Vector

The parameter vector found by the least-squares method is the vector

params = [c1, mu1, sigma1, c2, mu2, sigma2]

Here, c1 and c2 are scaling factors for the two Gaussians, i.e. their heights; mu1 and mu2 are the means, i.e. the horizontal positions of the peaks; and sigma1 and sigma2 are the standard deviations that determine the widths of the Gaussians. To find a starting vector I just looked at a plot of the data and estimated the heights of the two peaks (= c1 and c2, respectively) and their horizontal positions (= mu1 and mu2, respectively). sigma1 and sigma2 were simply set to 1.0.
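If you prefer not to read those numbers off the plot, one hypothetical way to automate the initial guess is scipy.signal.find_peaks; the height threshold of 5 (matching the background cut used below) and the choice of the two tallest peaks are assumptions for this sketch, not part of the original fit.

import numpy as np
from scipy.signal import find_peaks

data = np.genfromtxt('gaussian_fit.dat', skip_header=1)
x, y = data[:, 0], data[:, 1]

# Locate peaks above the assumed background level and keep the two tallest.
peaks, _ = find_peaks(y, height=5.0)
two_tallest = peaks[np.argsort(y[peaks])[-2:]]

# Assemble the starting vector [c1, mu1, sigma1, c2, mu2, sigma2],
# with both sigmas set to 1.0 as described above.
p0 = []
for i in two_tallest:
    p0 += [y[i], x[i], 1.0]
print(p0)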

Code

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

# Load the two-column data: positions x and measured values y.
data = np.genfromtxt('gaussian_fit.dat', skip_header=1)
x = data[:, 0]
y = data[:, 1]

def double_gaussian(x, params):
    # Sum of two Gaussians, params = [c1, mu1, sigma1, c2, mu2, sigma2].
    (c1, mu1, sigma1, c2, mu2, sigma2) = params
    res =   c1 * np.exp( - (x - mu1)**2.0 / (2.0 * sigma1**2.0) ) \
          + c2 * np.exp( - (x - mu2)**2.0 / (2.0 * sigma2**2.0) )
    return res

def double_gaussian_fit(params):
    # Residuals between the model and the background-subtracted data.
    fit = double_gaussian(x, params)
    return (fit - y_proc)

# Remove the background: ignore all data points with y < 5.
y_proc = np.copy(y)
y_proc[y_proc < 5] = 0.0

# Least-squares fit. Starting values found by inspecting a plot of the data.
fit = leastsq(double_gaussian_fit, [13.0, -13.0, 1.0, 60.0, 3.0, 1.0])
plt.plot(x, y, c='b')
plt.plot(x, double_gaussian(x, fit[0]), c='r')
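If you also want the fitted parameters themselves, or the two components drawn separately, a small follow-up along these lines should work; it reuses the variable names from the code above, and zeroing one component's height is just a convenient way to plot the other on its own.

# Unpack the optimized parameter vector returned by leastsq.
c1, mu1, sigma1, c2, mu2, sigma2 = fit[0]
print("Peak 1: height %.2f, centre %.2f, sigma %.2f" % (c1, mu1, sigma1))
print("Peak 2: height %.2f, centre %.2f, sigma %.2f" % (c2, mu2, sigma2))

# Plot each Gaussian component on its own by zeroing the other one's height.
plt.plot(x, double_gaussian(x, [c1, mu1, sigma1, 0.0, 0.0, 1.0]), 'g--')
plt.plot(x, double_gaussian(x, [0.0, 0.0, 1.0, c2, mu2, sigma2]), 'm--')
plt.show()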
