Why do we need the hyperparameters beta and alpha in LDA?


Question

I'm trying to understand the technical part of Latent Dirichlet Allocation (LDA), but I have a few questions on my mind:

First: Why do we need to add alpha and beta every time we sample from the equation below? What if we removed alpha and beta from the equation? Would it still be possible to get a result?
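The equation image from the original question is not reproduced here; most likely it was the standard per-word update in collapsed Gibbs sampling for LDA, which matches the alpha/beta discussion in the answer below:

```latex
P(z_i = k \mid z_{-i}, w) \;\propto\; \left(n_{d,k}^{-i} + \alpha\right)
  \frac{n_{k,w_i}^{-i} + \beta}{n_k^{-i} + V\beta}
```

Here n_{d,k}^{-i} is the number of words in document d assigned to topic k, n_{k,w_i}^{-i} is the number of times word w_i is assigned to topic k, n_k^{-i} is the total number of words assigned to topic k (all counts excluding the current position i), and V is the vocabulary size.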

Second: In LDA, we randomly assign a topic to every word in the document. Then, we try to optimize the topic assignments by observing the data. Which part of the equation above relates to posterior inference?

Answer

If you look at the inference derivation on Wiki, alpha and beta are introduced simply because theta and phi are each drawn from a Dirichlet distribution uniquely determined by them. The reason for choosing the Dirichlet distribution as the prior (e.g. P(phi|beta)) is mainly to make the math tractable by exploiting the nice form of a conjugate prior (here, the Dirichlet and categorical distributions; the categorical distribution is a special case of the multinomial distribution where n is set to one, i.e. only one trial). The Dirichlet prior also lets us "inject" our belief that the doc-topic and topic-word distributions are concentrated on a few topics and words per document or topic (if we set the hyperparameters low). If you remove alpha and beta, I am not sure how it would work.
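To see that "injected" sparsity concretely, here is a minimal NumPy sketch (not from the original answer; the topic count K and the alpha values are arbitrary illustration choices) that draws doc-topic distributions for different hyperparameter values:

```python
# Minimal sketch: how the Dirichlet concentration parameter controls sparsity.
# K and the alpha values are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
K = 10  # hypothetical number of topics

for alpha in (0.1, 1.0, 10.0):
    # Draw one doc-topic distribution theta ~ Dirichlet(alpha, ..., alpha).
    theta = rng.dirichlet(np.full(K, alpha))
    print(f"alpha={alpha:5.1f}  max weight={theta.max():.2f}  "
          f"topics above 5%: {(theta > 0.05).sum()}")
```

With alpha = 0.1 nearly all of the probability mass tends to land on one or two topics, while alpha = 10 spreads it almost uniformly, which is exactly the belief the prior encodes.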

Posterior inference is replaced with joint-probability inference; at least in Gibbs sampling, you need the joint probability while picking one dimension at a time to "transform the state", as the Metropolis-Hastings paradigm does. The formula you put here is essentially derived from the joint probability P(w, z). I would refer you to the book Monte Carlo Statistical Methods (by Robert) to fully understand why the inference works.
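To make the "pick one dimension at a time" idea concrete, here is a toy collapsed Gibbs sampler (a sketch, not the answerer's code; the corpus, K, alpha, and beta are made up) whose inner sampling step is the full conditional derived from the joint P(w, z):

```python
# Toy collapsed Gibbs sampler for LDA. Corpus, K, alpha, beta are illustrative.
import numpy as np

rng = np.random.default_rng(0)

docs = [[0, 1, 2, 0], [2, 3, 3, 4], [0, 4, 4, 1]]  # word ids per document
V, K = 5, 2              # vocabulary size, number of topics
alpha, beta = 0.1, 0.01  # low values encode a belief in sparse distributions

ndk = np.zeros((len(docs), K))  # doc-topic counts
nkw = np.zeros((K, V))          # topic-word counts
nk = np.zeros(K)                # words per topic
z = []                          # topic assignment for every word position
for d, doc in enumerate(docs):  # random initialization
    z.append([])
    for w in doc:
        k = rng.integers(K)
        z[d].append(k)
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(200):            # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the current assignment from the counts ...
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # ... and resample this one dimension from its full conditional,
            # which is proportional to the joint P(w, z) with everything else
            # held fixed. Alpha and beta keep every topic's probability
            # strictly positive even when its counts are zero.
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

print("doc-topic counts after sampling:\n", ndk)
```

Note how removing alpha and beta would make p zero for any topic with zero counts, so a topic that ever emptied out could never be sampled again; that is one concrete answer to the first question.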
