使用 pymc 用 MCMC 拟合两个正态分布(直方图)? [英] Fit two normal distributions (histograms) with MCMC using pymc?

查看:56
本文介绍了使用 pymc 用 MCMC 拟合两个正态分布(直方图)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试拟合使用 CCD 上的光谱仪检测到的线轮廓.为了便于考虑,我提供了一个演示,如果解决了,它与我实际想要解决的非常相似.

I am trying to fit line profiles as detected with a spectrograph on a CCD. For ease of consideration, I have included a demonstration that, if solved, is very similar to the one I actually want to solve.

我看过这个:https://stats.stackexchange.com/questions/46626/pymc 中的两个正态分布拟合模型以及其他各种问题和答案,但他们所做的事情与我想做的事情根本不同.

I've looked at this: https://stats.stackexchange.com/questions/46626/fitting-model-for-two-normal-distributions-in-pymc and various other questions and answers, but they are doing something fundamentally different than what I want to do.

import pymc as mc
import numpy as np
import pylab as pl
def GaussFunc(x, amplitude, centroid, sigma):
    return amplitude * np.exp(-0.5 * ((x - centroid) / sigma)**2)

wavelength = np.arange(5000, 5050, 0.02)

# Profile 1
centroid_one = 5025.0
sigma_one = 2.2
height_one = 0.8
profile1 = GaussFunc(wavelength, height_one, centroid_one, sigma_one, )

# Profile 2
centroid_two = 5027.0
sigma_two = 1.2
height_two = 0.5
profile2 = GaussFunc(wavelength, height_two, centroid_two, sigma_two, )

# Measured values
noise = np.random.normal(0.0, 0.02, len(wavelength))
combined = profile1 + profile2 + noise

# If you want to plot what this looks like
pl.plot(wavelength, combined, label="Measured")
pl.plot(wavelength, profile1, color='red', linestyle='dashed', label="1")
pl.plot(wavelength, profile2, color='green', linestyle='dashed', label="2")
pl.title("Feature One and Two")
pl.legend()

我的问题: PyMC(或某些变体)能否给我上面使用的两个分量的均值、幅度和西格玛?

My question: Can PyMC (or some variant) give me the mean, amplitude, and sigma for the two components used above?

请注意,我真正适合我的实际问题的函数不是高斯函数——所以请提供使用通用函数的示例(如我的示例中的 GaussFunc),而不是内置"pymc.Normal() 类型函数.

Please note that the functions that I will actually fit on my real problem are NOT Gaussians -- so please provide the example using a generic function (like GaussFunc in my example), and not a "built-in" pymc.Normal() type function.

此外,我知道模型选择是另一个问题:因此,对于当前的噪声,1 个组件(配置文件)可能是统计上合理的全部.但我想看看对于 1、2、3 等组件的最佳解决方案是什么.

Also, I understand model selection is another issue: so with the current noise, 1 component (profile) might be all that is statistically justified. But I'd like to see what the best solution for 1, 2, 3, etc. components would be.

我也不接受使用 PyMC 的想法——如果 scikit-learn、astroML 或其他一些软件包看起来很完美,请告诉我!

I'm also not wed to the idea of using PyMC -- if scikit-learn, astroML, or some other package seems perfect, please let me know!

我在很多方面都失败了,但我认为正确的一件事是:

I failed a number of ways, but one of the things that I think was on the right track was the following:

sigma_mc_one = mc.Uniform('sig', 0.01, 6.5)
height_mc_one = mc.Uniform('height', 0.1, 2.5)
centroid_mc_one = mc.Uniform('cen', 5015., 5040.)

但我无法构建一个有效的 mc.model.

But I could not construct a mc.model that worked.

推荐答案

不是最简洁的 PyMC 代码,但我做出这个决定是为了帮助读者.这应该会运行,并给出(非常)准确的结果.

Not the most concise PyMC code, but I made that decision to help the reader. This should run, and give (really) accurate results.

我决定使用具有自由范围的统一先验,因为我真的不知道我们在建模什么.但可能有人对质心位置有一个想法,并且可以在那里使用更好的先验.

I made the decision to use Uniform priors, with liberal ranges, because I really have no idea what we are modelling. But probably one has an idea about the centroid locations, and can use a better priors there.

### Suggested one runs the above code first.
### Unknowns we are interested in


est_centroid_one = mc.Uniform("est_centroid_one", 5000, 5050 )
est_centroid_two = mc.Uniform("est_centroid_two", 5000, 5050 )

est_sigma_one = mc.Uniform( "est_sigma_one", 0, 5 )
est_sigma_two = mc.Uniform( "est_sigma_two", 0, 5 )

est_height_one = mc.Uniform( "est_height_one", 0, 5 ) 
est_height_two = mc.Uniform( "est_height_two", 0, 5 ) 

#std deviation of the noise, converted to precision by tau = 1/sigma**2
precision= 1./mc.Uniform("std", 0, 1)**2

#Set up the model's relationships.

@mc.deterministic( trace = False) 
def est_profile_1(x = wavelength, centroid = est_centroid_one, sigma = est_sigma_one, height= est_height_one):
    return GaussFunc( x, height, centroid, sigma )


@mc.deterministic( trace = False) 
def est_profile_2(x = wavelength, centroid = est_centroid_two, sigma = est_sigma_two, height= est_height_two):
    return GaussFunc( x, height, centroid, sigma )


@mc.deterministic( trace = False )
def mean( profile_1 = est_profile_1, profile_2 = est_profile_2 ):
    return profile_1 + profile_2


observations = mc.Normal("obs", mean, precision, value = combined, observed = True)


model = mc.Model([est_centroid_one, 
              est_centroid_two, 
                est_height_one,
                est_height_two,
                est_sigma_one,
                est_sigma_two,
                precision])

#always a good idea to MAP it prior to MCMC, so as to start with good initial values
map_ = mc.MAP( model )
map_.fit()

mcmc = mc.MCMC( model )
mcmc.sample( 50000,40000 ) #try running for longer if not happy with convergence.

重要

请记住,该算法与标签无关,因此结果可能会显示 profile1 具有 profile2 的所有特征,反之亦然.

Important

Keep in mind the algorithm is agnostic to labeling, so the results might show profile1 with all the characteristics from profile2 and vice versa.

这篇关于使用 pymc 用 MCMC 拟合两个正态分布(直方图)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆