查找HMM中的前k个维特比路径 [英] Finding the top - k viterbi paths in HMM

查看:121
本文介绍了查找HMM中的前k个维特比路径的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要编写一种算法来查找HMM中的前k个维特比路径(使用常规维特比算法来找到最佳路径)。

I need to write an algorithm that finds the top-k viterbi paths in a HMM (using the regular viterbi algorithm to find the best path).

我可能需要为每个状态N保存一个大小为k的列表V_t,N,其中包含以状态N结尾的前K个路径,但是我不确定如何跟踪该列表。.
any想法?谢谢

I think I probably need to save a list V_t,N of size k for each state N that contains the top-K paths that end in state N, but I am not so sure how to keep track of that list.. any ideas? Thanks

推荐答案

我们可以轻而易举地解决此问题。通过查看hmm的网格结构最容易看到:

We can solve this with some care. It is easiest to see by looking at the trellis structure of hmm:

在此示例中,隐藏状态为00、01、10、11,将这四个状态表示为S。未显示,但假定它们为0,1。

In this example the hidden states are 00, 01, 10, 11, denote the set of these four as S. The observations are not shown, but assume they are 0,1.

假设我们有正确的过渡矩阵:

Suppose that we have the correct Transition Matrix:

transition[4][4]

排放概率:

emissions[4][2]

和初始概率:

p[2]

所以每一列都代表隐藏状态,Viterbi的目标是根据观察结果计算最可能的隐藏状态序列。现在令alpha(i,t)=隐藏状态序列处于状态i(i是00、01、10、11中的一个)的最大概率,在时间t处的观测值是o_t(o_t是1的0,1)。让第一个观测值表示为o_1。可以有效地计算为:

So every column represents the hidden states, and the goal of Viterbi is to compute the most likely hidden state sequence given the observations. Now let alpha(i, t) = the largest probability that the hidden state sequence is in state i (i is one of 00, 01, 10, 11), at time t where the observation at time t is o_t (o_t is one of 0, 1). Let the first observation be denoted o_1. It can be computed efficiently as:

alpha(i, 1) = p[i] * emissions[i][o_1]
alpha(i, t) = emissions[i][o_t] * max_{k in states} (alpha(k, t-1) * transition[k][i]) 

为了找到最佳路径,我们在alpha(i,t)步骤中保持指针向后,直到最大化上面的最大功能。最后,我们只检查状态中的所有alpha(i,T)和i,然后找出哪个最大,然后跟着返回最可能的状态序列。

In order to find the best path, we keep pointers backwards in the alpha(i, t) step, to the state which maximized the max function in above. Finally we just examine all of the alpha(i, T) and for i in states, and find which one is largest, then follow it back to get the most likely state sequence.

现在,我们需要扩展它以存储前k个路径。目前,在每个alpha(i,t)中,我们仅存储一个父级。但是,假设我们存储了前k个前身。因此,每个alpha(i,t)不仅对应于最有可能发生变化的值及其从其过渡的节点,而且还对应于其可能从其过渡的前k个节点及其值的排序顺序的列表。

Now we need to extend this to store top k-paths. Currently at each alpha(i,t) we only store one parent. However suppose we stored the top k predecessors. So each alpha(i, t) corresponds not only to a most likely value and the node which it transitioned from, but a list of the top k nodes it could have transitioned from and their values in sorted order.

这很容易做到,而不是做max,并且只取一个前节点,我们取前k个前节点并存储它们。现在,对于基本情况,没有前面的节点,因此alpha(i,1)仍然只是一个值。当我们到达任意列(例如t)并想要找到在该列中的节点(i)处结束的前k个路径时,我们必须找到前k个前辈来自它们的前几个路径。

This is easy to do, in that instead of doing max, and take only one preceding node we take the top k preceding nodes and store them. Now for the base case there is no preceding node so alpha(i, 1), still is only a single value. When we arrive at an arbitrary column (say t) and want to find the top-k paths ending at a node (i) in that column, we must find the top k predecessors to come from and the top paths to take from them.

这就像我们有以下问题一样,一个矩阵m,大小为4 x k,其中一行代表前一个状态,m [state]代表到此为止的路径的前k个概率。因此,m的每一行都按从大到小的顺序进行排序,现在问题就可以找到:

This is as if we have the following problem, a matrix, m, of size 4 by k, where a row represents a preceding state and m[state] represents the top k probabilities for paths ending there. Thus each row of m is sorted by largest to smallest, the problem now becomes find:

Best_K_Values(t, i) = Top K over all i,preceding_state,k (emissions[i][o_t] * m[preceding_state][k] * transition[preceding_state][i])

现在这看起来令人生畏,但需要一些时间来理解它,我们可以使用O(4 log k)或O(numStates log k)中的堆来解决排序矩阵问题中的前k个)。

Now this looks daunting but take some time to understand it, we can solve the top k from sorted matrix problem using a heap in O(4 log k) or O(numStates log k) in general.

请参阅此细微变化已排序矩阵中的第K个最小元素,请注意,在本例中,列未进行排序,但是这里的想法仍然适用。

See this slight variation Kth smallest element in sorted matrix, just note that in our case the columns aren't sorted, but the idea there still applies.

如果您仍然在阅读,然后请注意,此方法的总体复杂度为O((numStates log k)* numStates * t)= O(numStates ^ 2 * t * log k)(我相信这是正确的复杂性)。

If you are still reading, then note that the overall complexity of this method is O((numStates log k) * numStates * t) = O(numStates^2 * t * log k) (I believe that's correct complexity).

这可能很难理解,但是如果您有问题,请告诉我有任何疑问,或者我做错了什么。

This may be hard to follow, but please let me know if you have any questions, or I have done something incorrectly.

这篇关于查找HMM中的前k个维特比路径的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆