Convergence issues on hierarchical probit model with NUTS and Metropolis

Problem description

I am trying to extend the hierarchical model from Gelman and Hill reproduced in PyMC3 here to binary outcome data by adding a probit transformation and making the outcome follow a Bernoulli distribution. Right now I'm using toy data, so I know the true values. Alpha should be .1, beta should be .5.

The model runs fine with a NUTS sampler before the extension. Once I add it, the estimates slowly increase and keep increasing until the model stalls anywhere between 10 and 200 iterations. Here's an image from when it made it all the way to 120 (a relatively long run).

Before the extension, Metropolis needs a good 200,000 iterations to get a good fix on the true parameter values, but it does eventually. After the extension it stalls somewhere between 30k and 50k iterations. Unlike with NUTS, it completely crashes when you try to stop it post-stall, so I don't have a picture. Stopping it earlier gives an estimate for beta roughly centered over zero, but with a wide spread.

The code is pasted below.

I'm not sure if it's a sampling problem or a specification problem. Is there a better way to specify a probit? Any tips on other samplers to try? I've stripped my model down as far as I can for the test, and narrowed it down to breaking once I add the probit extension, but I'm at a loss for what to do next.

#Generate Data
#imports used throughout the code below
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

n=100
#Determine how many observations per group
evts=np.random.randint(10,100,n)
#Determine which groups will be receive treatment
x_g=np.random.binomial(1,.3,n)
#pre-create distribution of betas for groups 
mu = np.random.normal(.5,.2,n)
#preallocate space in a dataframe
i = np.zeros(evts.sum())
y_obs = pd.DataFrame({'y':i.copy(),
                      'x':i.copy(),
                      'grp':i.copy()},
                     index = range(evts.sum()))
#populate dataframe with simulated data
i=0
for grp in range(100):
    #index of observations for a given group
    ind = list(range(i,(i+evts[grp])))
    i += evts[grp]
    #generate outcomes using
    #different dgp depending on treatment
    if x_g[grp] ==1:
        #shortcut to make sure 1>p>0
        p_i = max((.1 + mu[grp]),0.01)
        p_i = min(p_i,1)
        out = np.random.binomial(1,p_i,evts[grp])
    else:
        out = np.random.binomial(1,.1,evts[grp])
    #Assign to dataframe
    y_obs.loc[ind,'y'] = out
    y_obs.loc[ind,'x'] = x_g[grp]
    y_obs.loc[ind,'grp'] = grp
y_obs = y_obs.astype(int)
print('starting model')
with pm.Model() as test_model:
    #hyperpriors
    mu_a=pm.Normal('mu_a',mu=0, sd=100**2)
    sig_a = pm.Uniform('sig_a',lower=0,upper=100)
    mu_b=pm.Normal('mu_b',mu=0, sd=100**2)
    sig_b = pm.Uniform('sig_b',lower=0,upper=100)
    #priors
    a = pm.Normal('a',mu=mu_a,sd = sig_a, shape=n)
    b = pm.Normal('b',mu=mu_b,sd = sig_b, shape=n)

    eps = pm.Uniform('eps',lower=0,upper=100)

    est = a[y_obs.grp] + b[y_obs.grp] * y_obs.x
    #I get correct estimates when I 
    #stop here using commented out line. 
#     y_hat = pm.Normal('y_hat',
#                       mu=est,
#                       sd=eps, 
#                       observed = y_obs.y)

    #Probit transformation:
    y_hat = pm.Normal('y_hat',
                      mu=est,
                      sd=eps, 
                      shape=y_obs.shape[0])

    mu_y = tt.mean(y_hat)
    eps_hat = tt.var(y_hat)
    p_hat = 0.5 * (1 + tt.erf((y_hat-mu_y) / (eps_hat*tt.sqrt(2))))

    y = pm.Bernoulli('y',p=p_hat, observed = y_obs.y)


with test_model:
    #Either:
    mu,sds,elbo = pm.variational.advi(n=100000)
    step = pm.NUTS(scaling=test_model.dict_to_array(sds),
                   is_cov=True)
    test_trace = pm.sample(200, step, start=mu)
    #or
#     step=pm.Metropolis()
#     test_trace = pm.sample(50000)

pm.traceplot(test_trace)#[-5000::3])

NOTE: edited to fix a typo in the line 'step = pm.NUTS(scaling=test_model.dict_to_array(sds),'

I made better simulation data for the probit extension of the model originally posted (the original data generation was flawed). Now ADVI gives better estimates, so it's starting in around the right place, but NUTS still stalls very quickly (at around ten iterations). Metropolis fails outright: I did a first round of 5000 iterations and got an error when trying to plot the trace.

New data generation:

n=100
evts=np.random.randint(10,100,n)
x_g=np.random.binomial(1,.3,n)
i = np.zeros(evts.sum())
mu = np.random.normal(.5,.2,n)
mu0 = np.random.normal(.1,.05,n)
y_obs = pd.DataFrame({'y':i.copy(),'x':i.copy(),'grp':i.copy()},index = range(evts.sum()))
i=0
for grp in range(100):
    ind = list(range(i,(i+evts[grp])))
    i += evts[grp]
    if x_g[grp] ==1:
        est = mu0[grp] + mu[grp]
    else:
        est=mu0[grp]
    p_hat = tt.nnet.sigmoid(est).eval()
    y_obs.loc[ind,'y_hat'] = est
    y_obs.loc[ind,'y'] = np.random.binomial(1,p_hat,len(ind))
    y_obs.loc[ind,'x'] = x_g[grp]
    y_obs.loc[ind,'grp'] = grp
y_obs['grp']=y_obs.grp.astype(np.int64)

Error from Metropolis when pymc3 tries to plot a density:

ValueError: v cannot be empty

Answer

Maybe I misunderstand what you are trying to do, but shouldn't this model work:

with pm.Model() as test_model:
    #hyperpriors
    mu_a = pm.Flat('mu_a')
    sig_a = pm.HalfCauchy('sig_a', beta=2.5)

    mu_b = pm.Flat('mu_b')
    sig_b = pm.HalfCauchy('sig_b', beta=2.5)

    #priors
    a_raw = pm.Normal('a_raw', mu=0, sd=1, shape=n)
    a = pm.Deterministic('a', mu_a + sig_a * a_raw)

    b_raw = pm.Normal('b_raw', mu=0, sd=1, shape=n)
    b = pm.Deterministic('b', mu_b + sig_b * b_raw)

    est = a[y_obs.grp.values] + b[y_obs.grp.values] * y_obs.x

    y = pm.Bernoulli('y', p=tt.nnet.sigmoid(est), observed = y_obs.y)

This is a logit, not a probit model. If you want a probit for some reason, you can replace tt.nnet.sigmoid with a standard probit function.
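
For instance, a minimal sketch of that replacement (assuming the same est linear predictor and y_obs data as in the model above) uses the standard normal CDF, which theano can express with tt.erf:

    #probit link: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    #clipping keeps p strictly inside (0, 1) so the Bernoulli log-likelihood stays finite
    p = 0.5 * (1 + tt.erf(est / np.sqrt(2.)))
    p = tt.clip(p, 1e-6, 1 - 1e-6)
    y = pm.Bernoulli('y', p=p, observed=y_obs.y)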

This still has a somewhat hard time with your dataset, but I think that is because of a mistake in the data generation: you assume a constant a of 0.1 for all groups, but in the model you allow the a values to differ by group. The sampler has trouble with very small values of sig_a (the true value is 0, after all...).

Edit: Some more explanations: The change to use standard normal a_raw and b_raw and then transform them into a Normal(mu=mu_a, sd=sig_a) using pm.Deterministic does not change the posterior, but it makes things easier for the sampler. This is called a non-centered parametrization. For a more in-depth description of the topic see e.g. http://mc-stan.org/documentation/case-studies/divergences_and_bias.html, which should also help you understand why very small variances can be problematic.
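
For comparison, a centered parametrization of the same group-level priors would look like the sketch below; it defines the same posterior, but NUTS tends to produce divergent transitions with it once sig_a or sig_b gets small:

    #centered parametrization: sample the group effects directly
    #(this is what the a_raw / b_raw construction above replaces)
    a = pm.Normal('a', mu=mu_a, sd=sig_a, shape=n)
    b = pm.Normal('b', mu=mu_b, sd=sig_b, shape=n)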

Edit: Generate new data:

n = 100
evts=np.random.randint(10,100,n)
x_g=np.random.binomial(1,.3,n)
i = np.zeros(evts.sum())
mu = np.random.normal(.5,.2,n)
mu0 = np.random.normal(.1,.05,n)
y_obs = pd.DataFrame({'y':i.copy(),'x':i.copy(),'grp':i.copy()},
                     index = range(evts.sum()))
i = 0
for grp in range(100):
    ind = list(range(i,(i+evts[grp])))
    i += evts[grp]
    if x_g[grp] ==1:
        est = mu0[grp] + mu[grp]
    else:
        est=mu0[grp]
    p_hat = tt.nnet.sigmoid(est).eval()
    y_obs.loc[ind,'y_hat'] = est
    y_obs.loc[ind,'y'] = np.random.binomial(1,p_hat,len(ind))
    y_obs.loc[ind,'x'] = x_g[grp]
    y_obs.loc[ind,'grp'] = grp
y_obs['grp']=y_obs.grp.astype(np.int64)

Using

with test_model:
    trace = pm.sample(2000, tune=1000, njobs=4)

it finishes after about three minutes:

Auto-assigning NUTS sampler...
Initializing NUTS using advi...
  8%|▊         | 15977/200000 [00:15<02:51, 1070.66it/s]Median ELBO converged.
Finished [100%]: Average ELBO = -4,458.8

100%|██████████| 2000/2000 [02:48<00:00,  9.99it/s]

There are no diverging transitions:

test_trace[1000:].diverging.sum()

All using pymc3 and theano master. (Both are about ready for a new release.)
