Learning Discrete HMM parameters in PyMC


Problem Description


I am trying to learn the parameters of a simple discrete HMM using PyMC. I am modeling the rainy-sunny model from the Wikipedia article on hidden Markov models.

I am using the following priors.

theta_start_state ~ Beta(20, 10)
theta_transition_rainy ~ Beta(8, 2)
theta_transition_sunny ~ Beta(2, 8)
theta_emission_rainy ~ Dirichlet(3, 4, 3)
theta_emission_sunny ~ Dirichlet(10, 6, 4)

Initially, I use this setup to create a training set as follows.

import numpy as np
import pymc as pm

## Some not so informative priors!
# Prior on start state
theta_start_state = pm.Beta('theta_start_state',12,8)

# Prior on transition from rainy
theta_transition_rainy = pm.Beta('transition_rainy',8,2)

# Prior on transition from sunny
theta_transition_sunny = pm.Beta('transition_sunny',2,8)

# Prior on emission from rainy
theta_emission_rainy = pm.Dirichlet('emission_rainy',[3,4,3])

# Prior on emission from sunny
theta_emission_sunny = pm.Dirichlet('emission_sunny',[10,6,4])

# Start state
x_train_0 = pm.Categorical('x_0',[theta_start_state, 1-theta_start_state])

N = 100

# Create a train set for hidden states
x_train = np.empty(N, dtype=object)

# Creating a train set of observations
y_train = np.empty(N, dtype=object)

x_train[0] = x_train_0

for i in xrange(1, N):
    if x_train[i-1].value==0:
        x_train[i] = pm.Categorical('x_train_%d'%i,[theta_transition_rainy, 1- theta_transition_rainy])
    else:
        x_train[i] = pm.Categorical('x_train_%d'%i,[theta_transition_sunny, 1- theta_transition_sunny])


for i in xrange(0,N):
    if x_train[i].value == 0:
        # Rain
        y_train[i] = pm.Categorical('y_train_%d' %i, theta_emission_rainy)
    else:
        y_train[i] = pm.Categorical('y_train_%d' %i, theta_emission_sunny)

However, I am not able to understand how to learn these parameters using PyMC. I made a start as follows.

@pm.observed
def y(x=x_train, value =y_train):    
    N = len(x)    
    out = np.empty(N, dtype=object)
    for i in xrange(0,N):
        if x[i].value == 0:
            # Rain
            out[i] = pm.Categorical('y_%d' %i, theta_emission_rainy)
        else:
            out[i] = pm.Categorical('y_%d' %i, theta_emission_sunny)
    return out

The complete notebook containing this code can be found here.

Aside: The gist containing HMM code for a Gaussian is really hard to understand! (not documented)

Update

Based on the answers below, I tried changing my code as follows:

@pm.stochastic(observed=True)
def y(value=y_train, hidden_states = x_train):
    def logp(value, hidden_states):
        logprob = 0 
        for i in xrange(0,len(hidden_states)):
            if hidden_states[i].value == 0:
                # Rain
                logprob = logprob + pm.categorical_like(value[i], theta_emission_rainy)
            else:
                # Sunny
                logprob = logprob + pm.categorical_like(value[i], theta_emission_sunny)
        return logprob 

The next step would be to create a model and then run the MCMC algorithm. However, the edited code above does not work either; it gives a ZeroProbability error.

I am not sure if I have interpreted the answers correctly.

Solution

Just some thoughts on this:

  • Sunny and Rainy are mutually exclusive and exhaustive hidden states. Why don't you encode them as a single categorical weather variable which can take one of two values (coding for sunny, rainy) ?
  • In your likelihood function, you seem to observe Rainy / Sunny. The way I see it in your model graph, these seem to be the hidden variables, not the observed ones (those would be "walk", "shop" and "clean").

In your likelihood function, you need to sum (for all time steps t) the log-probability of the observed values (of walk, shop and clean respectively) given the current (sampled) values of rainy/sunny (i.e., Weather) at the same time step t.
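A minimal sketch of that likelihood in PyMC2 might look like the following. It assumes the stochastics from the question (x_train and the two emission Dirichlets) are in scope, that the observed activities are available as a plain integer array called y_obs here (0 = walk, 1 = shop, 2 = clean), and that PyMC substitutes the current values of the hidden-state container when logp is evaluated; the name y_obs and the manual completion of the length-(k-1) Dirichlet vectors are illustrative choices, not part of the original code.

import numpy as np
import pymc as pm

@pm.stochastic(observed=True)
def y(value=y_obs, hidden_states=x_train,
      p_rainy=theta_emission_rainy, p_sunny=theta_emission_sunny):
    """Sum, over time steps, the categorical log-likelihood of the observed
    activity given the currently sampled weather at that step."""
    logprob = 0.0
    for t in xrange(len(value)):
        # pick the emission distribution that matches the sampled weather at step t
        p = p_rainy if hidden_states[t] == 0 else p_sunny
        # PyMC2 Dirichlets carry only the first k-1 probabilities; append the last one
        p_full = np.append(p, 1.0 - np.sum(p))
        logprob += pm.categorical_like(value[t], p_full)
    return logprob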

Learning

If you want to learn parameters for the model, you might want to consider switching to PyMC3, which would be better suited for automatically computing gradients for your logp function. But in this case (since you chose conjugate priors) this is not really necessary. If you don't know what conjugate priors are, or are in need of an overview, Wikipedia's list of conjugate priors is a great place to start.
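For reference, the conjugate updates are available in closed form; a quick sketch (the helper functions and their names are purely illustrative):

import numpy as np

# Beta(a, b) prior on a Bernoulli probability: after observing `successes`
# out of `trials`, the posterior is Beta(a + successes, b + trials - successes).
def update_beta(a, b, successes, trials):
    return a + successes, b + (trials - successes)

# Dirichlet(alpha) prior on categorical probabilities: after observing per-category
# counts, the posterior is Dirichlet(alpha + counts), component-wise.
def update_dirichlet(alpha, counts):
    return np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)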

Depending on what you want to do, you have a few choices here. If you want to sample from the posterior distribution of all parameters, just specify your MCMC model as you did and press the inference button; after that, plot and summarize the marginal distributions of the parameters you're interested in, and you are done.
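In PyMC2, "pressing the inference button" looks roughly like this. The sketch assumes the priors, hidden states and observed likelihood from the question are defined in the current scope, uses the trace name 'transition_rainy' from the question's code, and the iteration counts are arbitrary.

import pymc as pm

model = pm.Model([theta_start_state, theta_transition_rainy, theta_transition_sunny,
                  theta_emission_rainy, theta_emission_sunny, x_train, y])
mcmc = pm.MCMC(model)
mcmc.sample(iter=20000, burn=5000, thin=5)

# marginal posterior samples of one parameter of interest
trace = mcmc.trace('transition_rainy')[:]
print trace.mean(), trace.std()

# trace, autocorrelation and histogram plots for all traced variables
pm.Matplot.plot(mcmc)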

If you are not interested in the marginal posterior distributions, but rather in finding the joint MAP parameters, you might consider using Expectation Maximization (EM) learning or Simulated Annealing. Both should work reasonably well within the MCMC framework.

For EM Learning simply repeat these steps until convergence:

  • E (Expectation) Step: Run the MCMC chain for a while and collect samples
  • M (Maximization) Step: Update the hyperparameters of your Beta and Dirichlet priors as if these samples had been actual observations. Look up the Beta and Dirichlet distributions if you don't know how to do that.

I would use a small learning rate factor so you don't fall into the first local optimum (now we're approaching Simulated Annealing): Instead of treating the N samples you generated from the MCMC chain as actual observations, treat them as K observations (for a value K << N) by scaling the updates to the hyperparameters down by a learning rate factor of K/N.
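A rough sketch of one such damped update, for the rainy-to-rainy transition parameter only. Everything here is illustrative: `mcmc` is assumed to be the sampler from the sketch above, the hidden-state traces are assumed to follow the 'x_train_%d' naming from the question, and Beta(8, 2) is the prior used there.

import numpy as np

T = 100   # chain length, as in the question

# E-step: stack the sampled hidden-state paths into an (n_samples, T-1) array
states = np.column_stack([mcmc.trace('x_train_%d' % t)[:] for t in xrange(1, T)])
n_samples = states.shape[0]

K = 50                       # pretend we only saw K observations (K << n_samples)
lr = float(K) / n_samples    # the K/N learning-rate factor described above

# M-step: count rainy->rainy ("successes") and rainy->sunny ("failures") transitions
prev, curr = states[:, :-1], states[:, 1:]
stay = np.sum((prev == 0) & (curr == 0))
leave = np.sum((prev == 0) & (curr == 1))

# damped conjugate update of the Beta(8, 2) hyperparameters for theta_transition_rainy
a_rainy, b_rainy = 8.0 + lr * stay, 2.0 + lr * leave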
