Fit a non-linear function to data/observations with pyMCMC/pyMC


Question

I am trying to fit some data with a Gaussian (and more complex) function(s). I have created a small example below.

My first question is, am I doing it right?

My second question is, how do I add an error in the x-direction, i.e. in the x-position of the observations/data?

It is very hard to find nice guides on how to do this kind of regression in pyMC. Perhaps that is because it is easier to use a least-squares or similar approach; however, I have many parameters in the end and need to see how well we can constrain them and compare different models, so pyMC seemed like a good choice for that.

import pymc
import numpy as np
import matplotlib.pyplot as plt; plt.ion()

x = np.arange(5,400,10)*1e3

# Parameters for gaussian
amp_true = 0.2
size_true = 1.8
ps_true = 0.1

# Gaussian function
gauss = lambda x,amp,size,ps: amp*np.exp(-1*(np.pi**2/(3600.*180.)*size*x)**2/(4.*np.log(2.)))+ps
f_true = gauss(x=x,amp=amp_true, size=size_true, ps=ps_true )

# add noise to the data points
noise = np.random.normal(size=len(x)) * .02 
f = f_true + noise 
f_error = np.ones_like(f_true)*0.05*f.max()

# define the model/function to be fitted.
def model(x, f): 
    amp = pymc.Uniform('amp', 0.05, 0.4, value= 0.15)
    size = pymc.Uniform('size', 0.5, 2.5, value= 1.0)
    ps = pymc.Normal('ps', 0.13, 40, value=0.15)

    @pymc.deterministic(plot=False)
    def gauss(x=x, amp=amp, size=size, ps=ps):
        e = -1*(np.pi**2*size*x/(3600.*180.))**2/(4.*np.log(2.))
        return amp*np.exp(e)+ps
    y = pymc.Normal('y', mu=gauss, tau=1.0/f_error**2, value=f, observed=True)
    return locals()

MDL = pymc.MCMC(model(x,f))
MDL.sample(1e4)

# extract and plot results
y_min = MDL.stats()['gauss']['quantiles'][2.5]
y_max = MDL.stats()['gauss']['quantiles'][97.5]
y_fit = MDL.stats()['gauss']['mean']
plt.plot(x,f_true,'b', marker='None', ls='-', lw=1, label='True')
plt.errorbar(x,f,yerr=f_error, color='r', marker='.', ls='None', label='Observed')
plt.plot(x,y_fit,'k', marker='+', ls='None', ms=5, mew=2, label='Fit')
plt.fill_between(x, y_min, y_max, color='0.5', alpha=0.5)
plt.legend()

I realize that I might have to run more iterations, and use burn-in and thinning in the end. The figure plotting the data and the fit is shown below.

The pymc.Matplot.plot(MDL) figures look like this, showing nicely peaked distributions. This is good, right?
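
For a numeric view of how well each parameter is constrained, the same posterior summaries used for the plot above can be printed; this is just a small sketch reusing the 'mean' and 'quantiles' entries of MDL.stats() that already appear in the script:

# Print posterior mean and central 95% interval for each fitted parameter.
for name in ['amp', 'size', 'ps']:
    s = MDL.stats()[name]
    print('%s: mean = %.3f, 95%% interval = (%.3f, %.3f)'
          % (name, s['mean'], s['quantiles'][2.5], s['quantiles'][97.5]))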

Answer

My first question is, am I doing it right?

Yes! You need to include a burn-in period, which you know. I like to throw out the first half of my samples. You don't need to do any thinning, but sometimes it will make your post-MCMC work faster to process and smaller to store.
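
A minimal sketch of what that looks like for the model above; the iter/burn/thin values here are only placeholders, not tuned recommendations:

# Discard the first half of the chain as burn-in and keep every 5th sample.
# PyMC 2's signature is sample(iter, burn, thin), as also used further below.
MDL = pymc.MCMC(model(x, f))
MDL.sample(iter=20000, burn=10000, thin=5)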

The only other thing I advise is to set a random seed, so that your results are "reproducible": np.random.seed(12345) will do the trick.

Oh, and if I was really giving too much advice, I'd say import seaborn to make the matplotlib results a little more beautiful.
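
Applied to the script in the question, both suggestions are just a couple of lines near the top (note: older seaborn versions restyle matplotlib automatically on import; newer ones need an explicit call such as seaborn.set_theme()):

# reproducibility: fix the seed before generating the synthetic noise and sampling
np.random.seed(12345)

# cosmetics only: seaborn restyles matplotlib figures (on import in older versions)
import seaborn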

My second question is, how do I add an error in the x-direction, i.e. in the x-position of the observations/data?

One way is to include a latent variable for each error. This works in your example, but will not be feasible if you have many more observations. I'll give a little example to get you started down this road:

import pymc as pm  # the snippet below uses the "pm" alias; x, f, f_error come from the script above

# add noise to observed x values
x_obs = pm.rnormal(mu=x, tau=(1e4)**-2)

# define the model/function to be fitted.
def model(x_obs, f): 
    amp = pm.Uniform('amp', 0.05, 0.4, value= 0.15)
    size = pm.Uniform('size', 0.5, 2.5, value= 1.0)
    ps = pm.Normal('ps', 0.13, 40, value=0.15)

    x_pred = pm.Normal('x', mu=x_obs, tau=(1e4)**-2) # this allows error in x_obs

    @pm.deterministic(plot=False)
    def gauss(x=x_pred, amp=amp, size=size, ps=ps):
        e = -1*(np.pi**2*size*x/(3600.*180.))**2/(4.*np.log(2.))
        return amp*np.exp(e)+ps
    y = pm.Normal('y', mu=gauss, tau=1.0/f_error**2, value=f, observed=True)
    return locals()

MDL = pm.MCMC(model(x_obs, f))
MDL.use_step_method(pm.AdaptiveMetropolis, MDL.x_pred) # use AdaptiveMetropolis to "learn" how to step
MDL.sample(200000, 100000, 10)  # run chain longer since there are more dimensions
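
The fit from this second model can then be extracted and plotted just like before; a sketch reusing the node names defined above ('gauss' for the deterministic and 'x' for the latent positions):

# Posterior summaries of the fitted curve and the inferred x positions
y_fit = MDL.stats()['gauss']['mean']
y_min = MDL.stats()['gauss']['quantiles'][2.5]
y_max = MDL.stats()['gauss']['quantiles'][97.5]
x_fit = MDL.stats()['x']['mean']  # posterior mean of the latent "true" x values

plt.errorbar(x_obs, f, yerr=f_error, color='r', marker='.', ls='None', label='Observed')
plt.plot(x_fit, y_fit, 'k', marker='+', ls='None', ms=5, mew=2, label='Fit')
plt.fill_between(x_fit, y_min, y_max, color='0.5', alpha=0.5)
plt.legend()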

It looks like it may be hard to get good answers if you have noise in x and y:

Here is a notebook that collects all of this together.
