Negative Binomial Mixture in PyMC


Question


I am trying to fit a Negative binomial mixture with PyMC. It seems I do something wrong, because the predictive doesn't look at all similar to the input data. The problem is probably in the prior of the Negative binomial parameters. Any suggestions?

    from sklearn.cluster import KMeans
    import numpy as np
    import pymc as mc

    n = 3  # number of components in the mixture
    ndata = len(data)  # data is assumed to be a 1-D array of counts
    n_samples = 10000  # total MCMC iterations (value assumed)

    dd = mc.Dirichlet('dd', theta=(1,)*n)
    category = mc.Categorical('category', p=dd, size=ndata)

    kme = KMeans(n)  # not needed, but helps convergence
    kme.fit(data[:, np.newaxis])
    alphas = mc.TruncatedNormal('alphas', kme.cluster_centers_[:, 0], 0.1, a=0., b=100000, size=n)
    means = mc.TruncatedNormal('means', kme.cluster_centers_[:, 0], 0.1, a=0., b=100000, size=n)

    @mc.deterministic
    def mean(category=category, means=means):
        return means[category]

    @mc.deterministic
    def alpha(category=category, alphas=alphas):
        return alphas[category]

    obs = mc.NegativeBinomial('obs', mean, alpha, value=data, observed=True)

    predictive = mc.NegativeBinomial('predictive', mean, alpha)

    model = mc.Model({'dd': dd,
                      'category': category,
                      'alphas': alphas,
                      'means': means,
                      'predictive': predictive,
                      'obs': obs})

    mcmc = mc.MCMC(model)
    mcmc.sample(iter=n_samples, burn=int(n_samples * 0.7))

Answer


You have correctly implemented a Bayesian estimation of a mixture of three distributions, but the MCMC model gives wrong-looking values.


The problem is that category is not converging quickly enough, and the parameters in means, alphas, and dd run away from the good values before category decides which points belong to which distribution.

For concreteness, here is synthetic test data for the model (the per-component sample count s is not given in the original; 300 is an assumed value):

s = 300  # samples per component (assumed)
data = np.atleast_2d(list(mc.rnegative_binomial(100., 10., size=s)) +
                     list(mc.rnegative_binomial(200., 1000., size=s)) +
                     list(mc.rnegative_binomial(300., 1000., size=s))).T
nsamples = 10000


You can see that the posterior for category is wrong by visualizing it:

import matplotlib.pyplot as plt

G = [data[np.nonzero(np.round(mcmc.trace("category")[:].mean(axis=0)) == i)]
     for i in range(3)]
plt.hist(G, bins=30, stacked=True)
plt.show()
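
If category had converged correctly, the three colors in the stacked histogram would line up with the three modes of the data; here they do not.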


Expectation-maximization is the classic approach for stabilizing the latent variables, but you can also use the results of the quick-and-dirty k-means fit to provide initial values for the MCMC:

category = mc.Categorical('category', p=dd, size=ndata, value=kme.labels_)


Then the estimates converge to reasonable-looking values.
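
As a quick sanity check (a minimal sketch; it assumes the model was re-run with the k-means-initialized category above), compare the posterior means of the parameters against the values used to generate the data:

# Posterior means over the retained samples; with the synthetic data above,
# the component means should land near 100, 200, and 300.
print(mcmc.trace('means')[:].mean(axis=0))
print(mcmc.trace('alphas')[:].mean(axis=0))
print(mcmc.trace('dd')[:].mean(axis=0))  # note: PyMC2's Dirichlet stores k-1 of the k weights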


For your prior on alpha, you can just use the same distribution for all of them:

alphas = mc.Gamma('alphas', alpha=1, beta=.0001 ,size=n)
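
In PyMC2's parameterization this Gamma prior has mean alpha/beta = 10,000, so it is close to flat over plausible dispersion values and, unlike the TruncatedNormal above, does not tie each component's alpha to the k-means cluster centers.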


This problem is not specific to the negative binomial distribution; Dirichlet mixtures of normal distributions fail in the same way. It results from having a high-dimensional categorical distribution that MCMC is not efficient at optimizing.
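
To illustrate, here is a minimal sketch of the analogous normal mixture (hypothetical code, reusing dd and category from the model above; the prior values are placeholders). The structure, and hence the failure mode, is identical:

# Vague priors on the per-component parameters (placeholder values).
mus = mc.Normal('mus', 0., 1e-6, size=n)
taus = mc.Gamma('taus', alpha=1, beta=.0001, size=n)

@mc.deterministic
def mu(category=category, mus=mus):
    return mus[category]

@mc.deterministic
def tau(category=category, taus=taus):
    return taus[category]

obs = mc.Normal('obs', mu, tau, value=data, observed=True)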
