Expectation Maximization coin toss examples

Problem Description

I've been self-studying the Expectation Maximization lately, and grabbed myself some simple examples in the process:

http://cs.dartmouth.edu/~cs104/CS104_11.04.22.pdf There are 3 coins 0, 1 and 2 with P0, P1 and P2 probability landing on Head when tossed. Toss coin 0, if the result is Head, toss coin 1 three times else toss coin 2 three times. The observed data produced by coin 1 and 2 is like this: HHH, TTT, HHH, TTT, HHH. The hidden data is coin 0's result. Estimate P0, P1 and P2.

http://ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf There are two coins A and B with PA and PB being the probability landing on Head when tossed. Each round, select one coin at random and toss it 10 times then record the results. The observed data is the toss results provided by these two coins. However, we don't know which coin was selected for a particular round. Estimate PA and PB.

While I can get the calculations, I can't relate the ways they are solved to the original EM theory. Specifically, during the M-Step of both examples, I don't see how they're maximizing anything. It just seems they are recalculating the parameters and somehow, the new parameters are better than the old ones. Moreover, the two E-Steps don't even look similar to each other, not to mention the original theory's E-Step.

So how exactly do these examples work?

Solution

The second PDF won't download for me, but I also visited the Wikipedia page http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm which has more information. http://melodi.ee.washington.edu/people/bilmes/mypapers/em.pdf (which claims to be a gentle introduction) might be worth a look too.

The whole point of the EM algorithm is to find parameters which maximize the likelihood of the observed data. This is the only bullet point on page 8 of the first PDF, the equation for capital Theta subscript ML.
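
Written out in the spirit of that bullet point (my transcription, so the symbols may differ slightly from the PDF's), the target is Theta_ML = argmax over Theta of P(X | Theta), where X is the observed data and Theta is the full set of coin probabilities.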

The EM algorithm comes in handy where there is hidden data which would make the problem easy if you knew it. In the three coins example this is the result of tossing coin 0. If you knew that outcome you could (of course) produce an estimate of the probability of coin 0 turning up heads. You would also know whether coin 1 or coin 2 was tossed three times in the next stage, which would let you estimate the probabilities of coin 1 and coin 2 turning up heads. These estimates would be justified by saying that they maximize the likelihood of the data, which here includes not only the results you are given but also the hidden data that you are not: the results from coin 0. For a coin that gets A heads and B tails, the maximum likelihood estimate of the probability of heads is A/(A+B) - it might be worth working this out in detail, because it is the building block for the M step.
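
As a minimal sketch of that building block (my own code, not taken from the PDFs), the complete-data estimate is just counting:

    # Complete-data maximum likelihood estimate for a single coin:
    # a coin seen to give A heads and B tails has estimated head
    # probability A / (A + B).
    def mle_heads(heads, tails):
        return heads / (heads + tails)

    # If the hidden coin 0 results were known to be, say, H T H T H,
    # you would estimate P0 as mle_heads(3, 2) = 0.6, and estimate P1 and P2
    # from the heads and tails in the triples coin 1 and coin 2 each produced.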

In the EM algorithm you say that although you don't know the hidden data, you come in with probability estimates which allow you to write down a probability distribution for it. For each possible value of the hidden data you could find the parameter values which would optimize the log likelihood of the data including the hidden data, and this almost always turns out to mean calculating some sort of weighted average (if it doesn't the EM step may be too difficult to be practical).

What the EM algorithm asks you to do is to find the parameters maximizing the weighted sum of log likelihoods over all possible hidden data values, where the weight for each hidden value is its probability given the observations, computed using the parameters at the start of the EM step. This is what almost everybody, including the Wikipedia article, calls the Q-function. The proof behind the EM algorithm, given in the Wikipedia article, says that if you change the parameters so as to increase the Q-function (which is only a means to an end), you will also have changed them so as to increase the likelihood of the observed data (which you do care about). What you tend to find in practice is that you can maximize the Q-function using a variation of what you would do if you knew the hidden data, but using the probabilities of the hidden data, given the estimates at the start of the EM step, to weight the observations in some way.
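
For the three coins example those weights are just the posterior probability of coin 0's result given each observed triple and the current parameter estimates. A minimal sketch in Python (the function name and notation are mine, not the PDF's):

    # E-step weight for one observed triple x ("HHH", "TTT", ...):
    # the posterior probability that coin 0 came up heads, given x and
    # the current estimates p0, p1, p2.
    def posterior_head(x, p0, p1, p2):
        h = x.count("H")
        t = len(x) - h
        like_head = p0 * p1 ** h * (1 - p1) ** t          # coin 0 = H, triple from coin 1
        like_tail = (1 - p0) * p2 ** h * (1 - p2) ** t    # coin 0 = T, triple from coin 2
        return like_head / (like_head + like_tail)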

In your example it means totting up the number of heads and tails produced by each coin. In the PDF they work out P(Y=H|X=HHH) = 0.6967. This means that you use weight 0.6967 for the case Y=H: you increment the count for Y=H by 0.6967 and the count for X=H in coin 1 by 3*0.6967, and you increment the count for Y=T by 0.3033 and the count for X=H in coin 2 by 3*0.3033. If you have a detailed justification for why A/(A+B) is the maximum likelihood estimate of a coin's head probability in the standard case, you should be ready to turn it into a justification for why this weighted updating scheme maximizes the Q-function.
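
In code (again a sketch with my own names, reusing posterior_head from above), one full EM update over all the observed triples looks like this:

    # One EM iteration for the three coins example: accumulate weighted
    # (expected) counts in the E-step, then apply the A/(A+B) rule to those
    # weighted counts in the M-step.
    def em_step(data, p0, p1, p2):
        w_head = 0.0              # expected number of rounds with coin 0 = H
        h1 = t1 = h2 = t2 = 0.0   # expected head/tail counts for coins 1 and 2
        for x in data:
            w = posterior_head(x, p0, p1, p2)   # the weight, e.g. the 0.6967 above
            h = x.count("H")
            t = len(x) - h
            w_head += w
            h1 += w * h
            t1 += w * t
            h2 += (1 - w) * h
            t2 += (1 - w) * t
        n = len(data)
        return w_head / n, h1 / (h1 + t1), h2 / (h2 + t2)   # new p0, p1, p2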

Finally, the log likelihood of the observed data (the thing you are maximizing) gives you a very useful check. It should increase with every EM step, at least until you get so close to convergence that rounding error comes in, in which case you may have a very small decrease, signalling convergence. If it decreases dramatically, you have a bug in your program or your maths.
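
A small driver (starting values here are arbitrary guesses of mine) that makes this check explicit, using the functions sketched above:

    import math

    # Observed-data log likelihood: sum over triples of log P(x | p0, p1, p2),
    # with the hidden coin 0 result marginalized out.  EM should never decrease it.
    def log_likelihood(data, p0, p1, p2):
        total = 0.0
        for x in data:
            h = x.count("H")
            t = len(x) - h
            total += math.log(p0 * p1 ** h * (1 - p1) ** t
                              + (1 - p0) * p2 ** h * (1 - p2) ** t)
        return total

    data = ["HHH", "TTT", "HHH", "TTT", "HHH"]
    p0, p1, p2 = 0.6, 0.7, 0.4          # arbitrary starting guesses
    for step in range(20):
        print(step, round(log_likelihood(data, p0, p1, p2), 6))
        p0, p1, p2 = em_step(data, p0, p1, p2)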
