后分布P斯坦的最高密度区间(HDI) [英] Highest Density Interval (HDI) for Posterior Distribution Pystan

查看:395
本文介绍了后分布P斯坦的最高密度区间(HDI)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我看到在Pystan中,HDI功能可用于在后分布周围提供95%的可信区间.但是,他们说这仅适用于单峰分布.如果我的模型可能具有多峰分布(最多4个峰),是否有办法在Pystan中找到HDI?谢谢!

I am seeing that in Pystan, an HDI function can be used to provide a 95% credible interval surrounding the posterior distribution. However, they say it will only work for unimodal distributions. If my model may have a multimodal distribution (up to 4 peaks), is there a way I can find the HDI in Pystan? Thanks!

推荐答案

我认为这不是Stan/PyStan的特定问题.根据定义,最高密度区间是单个区间,因此不适用于表征多峰分布. Rob Hyndman, 计算和绘制最高密度区域 ,将概念扩展到多峰分布,并且已在R中的

I wouldn't consider this a Stan/PyStan-specific issue. The Highest Density Interval is by definition a single interval and therefore inappropriate for characterizing multimodal distributions. There is a nice work by Rob Hyndman, Computing and Graphing Highest Density Regions, that extends the concept to multimodal distributions, and this has been implemented in R under the hdrcde package.

对于Python,有有关此内容的讨论PyMC Discourse网站,建议使用Osvaldo Martin在其用Python进行贝叶斯分析"一书中编写的函数(hpd_grid).该功能的来源位于 hpd.py文件中,对于95%的区域,将使用

As for Python, there's a discussion of this on the PyMC Discourse site, where it is recommended to use a function (hpd_grid) that Osvaldo Martin wrote for his "Bayesian Analysis with Python" book. The source for that function is in the hpd.py file, and for a 95% region would be used like

from hpd import hpd_grid

intervals, x, y, modes = hpd_grid(samples, alpha=0.05)

其中samples是您的参数之一的后验样本,intervals是表示最高密度区域的元组列表.

where samples are the posterior samples of one of your parameters, and intervals are a list of tuples representing the regions of highest density.

这是一个使用一些虚假的多模态数据的示例图.

Here's an example plot using some fake multimodal data.

import numpy as np
from matplotlib import pyplot as plt
from hpd import hpd_grid

# include two modes
samples = np.random.normal(loc=[-4,4], size=(1000, 2)).flatten()

# compute high density regions
hpd_mu, x_mu, y_mu, modes_mu = hpd_grid(samples)

plt.figure(figsize=(8,6))

# raw data
plt.hist(samples), density=True, bins=29, alpha=0.5)

# estimated distribution
plt.plot(x_mu, y_mu)

# high density intervals
for (x0, x1) in hpd_mu:
    plt.hlines(y=0, xmin=x0, xmax=x1, linewidth=5)
    plt.axvline(x=x0, color='grey', linestyle='--', linewidth=1)
    plt.axvline(x=x1, color='grey', linestyle='--', linewidth=1)

# modes
for xm in modes_mu:
    plt.axvline(x=xm, color='r')

plt.show()

应注意,在正确建模的参数上,多峰后验分布通常很少见,但在非聚合MCMC采样中确实非常频繁地出现,特别是在使用多链时(这是最佳做法).如果人们期望多模态先验,那么通常会导致某种形式的混合模型,从而消除多模态.如果不希望采用多模式,但后继者仍会展示它,那就是对结果表示怀疑的危险信号.

It should be noted that multimodal posterior distributions on properly modeled parameters are typically rare, but do come up extremely frequently in non-converged MCMC sampling, especially when multiple chains are used (which is the best practice). If one expects multimodality a priori, then usually that leads to some form of mixture model which would eliminate the multimodality. If one doesn't expect multimodality, but the posteriors exhibit it anyway, then that's a red flag to be skeptical of the results.

这篇关于后分布P斯坦的最高密度区间(HDI)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆