Calculating WAIC for models with multiple likelihood functions with pymc3

Problem description

I am trying to predict the outcome of soccer games based on the number of goals scored, and I use the following model:

import pymc3 as pm
import theano.tensor as tt

# mu, tau, n (number of teams), h_t / a_t (home / away team index of each
# match) and the observed goal arrays are defined elsewhere.
with pm.Model() as model:
    # global model parameters
    h = pm.Normal('h', mu=mu, tau=tau)          # enters only the home rate
    sd_a = pm.Gamma('sd_a', .1, .1)
    sd_d = pm.Gamma('sd_d', .1, .1)
    alpha = pm.Normal('alpha', mu=mu, tau=tau)  # common intercept

    # team-specific model parameters
    a_s = pm.Normal("a_s", mu=0, sd=sd_a, shape=n)
    d_s = pm.Normal("d_s", mu=0, sd=sd_d, shape=n)

    # center the team effects so that they sum to zero
    atts = pm.Deterministic('atts', a_s - tt.mean(a_s))
    defs = pm.Deterministic('defs', d_s - tt.mean(d_s))
    h_theta = tt.exp(alpha + h + atts[h_t] + defs[a_t])
    a_theta = tt.exp(alpha + atts[a_t] + defs[h_t])

    # likelihood of observed data
    h_goals = pm.Poisson('h_goals', mu=h_theta, observed=observed_h_goals)
    a_goals = pm.Poisson('a_goals', mu=a_theta, observed=observed_a_goals)

When I sample the model, the trace plots look fine.

Afterwards, when I try to calculate the WAIC:

waic = pm.waic(trace, model)

I get the following error:


----> 1 waic = pm.waic(trace, model)

~\Anaconda3\envs\env\lib\site-packages\pymc3\stats\__init__.py in wrapped(*args, **kwargs)
     22 )
     23 kwargs[new] = kwargs.pop(old)
---> 24 return func(*args, **kwargs)
     25
     26 return wrapped

~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats.py in waic(data, pointwise, scale)
   1176 """
   1177 inference_data = convert_to_inference_data(data)
-> 1178 log_likelihood = _get_log_likelihood(inference_data)
   1179 scale = rcParams["stats.ic_scale"] if scale is None else scale.lower()
   1180

~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats_utils.py in get_log_likelihood(idata, var_name)
    403 var_names.remove("lp")
    404 if len(var_names) > 1:
--> 405 raise TypeError(
    406 "Found several log likelihood arrays {}, var_name cannot be None".format(var_names)
    407 )

TypeError: Found several log likelihood arrays ['h_goals', 'a_goals'], var_name cannot be None

Is there any way to calculate WAIC and compare models when I have two likelihood functions in pymc3? (1: the goals scored by the home team; 2: the goals scored by the away team)

Solution

It is possible, but it requires defining what you are interested in predicting: it can be the result of the match, or it could be the number of goals scored by either team (not the aggregate; each match would then provide 2 results to predict).

A complete and detailed answer is available on the PyMC Discourse.

Here I transcribe, as a summary, the case where the quantity of interest is the result of the match. ArviZ will automatically retrieve 2 pointwise log likelihood arrays, which we have to combine somehow (e.g. add, concatenate, group by...) to get a single array. The tricky part is knowing which operation corresponds to each quantity, which has to be assessed on a per-model basis. The snippets below use the variable names home_points and away_points; for the model in the question, the corresponding arrays are named h_goals and a_goals, as the error message shows. In this particular example, the predictive accuracy of a match result can be calculated in the following way:

import arviz as az

# give both observed variables the same named dimension
dims = {
    "home_points": ["match"],
    "away_points": ["match"],
}
idata = az.from_pymc3(trace, dims=dims, model=model)

Setting the match dim is important: it tells xarray how to align the pointwise log likelihood arrays. Otherwise they would not be broadcast and aligned in the desired way.
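To see why the shared dimension name matters, here is a small sketch using plain xarray, with random numbers standing in for real log likelihood values. The names h_goals_dim_0 and a_goals_dim_0 are assumptions that mimic automatically generated per-variable dimension names, and the array sizes (4 draws, 3 matches) are arbitrary.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
home = rng.normal(size=(4, 3))  # 4 draws x 3 matches (made-up values)
away = rng.normal(size=(4, 3))

# Each variable has its own dimension name: xarray treats them as different
# axes and broadcasts, pairing every home match with every away match.
h_default = xr.DataArray(home, dims=["draw", "h_goals_dim_0"])
a_default = xr.DataArray(away, dims=["draw", "a_goals_dim_0"])
outer = h_default + a_default      # shape (4, 3, 3)

# Both variables share the "match" dimension: the sum is element-wise,
# one value per draw and match.
h_named = xr.DataArray(home, dims=["draw", "match"])
a_named = xr.DataArray(away, dims=["draw", "match"])
aligned = h_named + a_named        # shape (4, 3)

print(outer.dims, outer.shape)
print(aligned.dims, aligned.shape)
```

Only the second sum gives one log likelihood value per match, which is what the match-result calculation needs.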

idata.sample_stats["log_likelihood"] = (
    idata.log_likelihood.home_points + idata.log_likelihood.away_points
)
az.waic(idata)
# Output
# Computed from 3000 by 60 log-likelihood matrix
#
#           Estimate       SE
# elpd_waic  -551.28    37.96
# p_waic       46.16        -
#
# There has been a warning during the calculation. Please check the results.

Note that ArviZ>=0.7.0 is required.
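To make the "add or concatenate" choice more concrete, here is a minimal sketch that computes WAIC by hand from a pointwise log likelihood matrix. It uses only the standard library and made-up numbers; waic_from_pointwise is a hypothetical helper written for illustration, not part of ArviZ or PyMC3, and its variance term divides by the number of draws (some references divide by draws minus one). For real work, az.waic is the tool to use.

```python
import math
from statistics import pvariance


def waic_from_pointwise(log_lik):
    """Return (elpd_waic, p_waic) on the log scale.

    log_lik is a list of posterior draws; each draw is a list with one
    log likelihood value per observation unit.
    """
    n_draws = len(log_lik)
    elpd, p_waic = 0.0, 0.0
    for column in zip(*log_lik):  # one column per observation unit
        # log of the posterior-mean likelihood, via a stable log-sum-exp
        top = max(column)
        lppd_i = top + math.log(sum(math.exp(v - top) for v in column) / n_draws)
        # penalty term: variance of the log likelihood across draws
        var_i = pvariance(column)
        elpd += lppd_i - var_i
        p_waic += var_i
    return elpd, p_waic


# Made-up log likelihood values: 3 posterior draws, 2 matches.
home = [[-1.2, -0.8], [-1.0, -1.1], [-1.4, -0.9]]
away = [[-0.7, -1.5], [-0.9, -1.3], [-0.6, -1.6]]

# Unit = match result: add element-wise -> 3 draws x 2 matches.
per_match = [[h + a for h, a in zip(h_row, a_row)]
             for h_row, a_row in zip(home, away)]

# Unit = one team's goals in one match: concatenate -> 3 draws x 4 scores.
per_score = [h_row + a_row for h_row, a_row in zip(home, away)]

print(len(per_match[0]), waic_from_pointwise(per_match))
print(len(per_score[0]), waic_from_pointwise(per_score))
```

The two matrices have a different number of columns and, in general, give different WAIC values, because the variance of a sum also depends on how the home and away terms vary together across draws. This is why the quantity to predict has to be chosen before combining the arrays.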
