Maximum Likelihood Estimate pseudocode

Problem description

I need to code a Maximum Likelihood Estimator to estimate the mean and variance of some toy data. I have a vector with 100 samples, created with numpy.random.randn(100). The data should have a Gaussian distribution with zero mean and unit variance.

I checked Wikipedia and some extra sources, but I am a little bit confused since I don't have a statistics background.

Is there any pseudo code for a maximum likelihood estimator? I get the intuition of MLE but I cannot figure out where to start coding.

Wiki says to take the argmax of the log-likelihood. What I understand is: I need to calculate the log-likelihood using different parameters and then take the parameters which gave the maximum probability. What I don't get is: where will I find the parameters in the first place? If I randomly try different means & variances to get a high probability, when should I stop trying?

Recommended answer

If you do maximum likelihood calculations, the first step you need to take is the following: assume a distribution that depends on some parameters. Since you generated your data (you even know your parameters), you "tell" your program to assume a Gaussian distribution. However, you don't tell your program your parameters (0 and 1); you leave them unknown a priori and compute them afterwards.

Now, you have your sample vector (let's call it x; its elements are x[0] to x[99]) and you have to process it. To do so, you have to compute the following (f denotes the probability density function of the Gaussian distribution):

f(x[0]) * ... * f(x[99])
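As a concrete illustration (an addition, not part of the original answer), here is a minimal Python sketch that evaluates exactly this product for one candidate pair of parameters, using scipy.stats.norm for f:

import numpy as np
from scipy.stats import norm

x = np.random.randn(100)  # the toy data from the question

def likelihood(x, mu, sigma):
    # Product of the Gaussian density over all samples:
    # f(x[0]) * ... * f(x[99]) for the candidate parameters (mu, sigma).
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

print(likelihood(x, 0.0, 1.0))  # a candidate equal to the true parameters
print(likelihood(x, 0.5, 2.0))  # a poorer candidate, typically a smaller value

Note that a product of 100 densities quickly becomes a very small number, which is one more reason to switch to the logarithm, as the answer does below.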

As you can see in the link I gave, f takes two parameters (the Greek letters µ and σ). You now have to calculate values for µ and σ such that f(x[0]) * ... * f(x[99]) takes the maximum possible value.

When you've done that, µ is your maximum likelihood value for the mean, and σ is your maximum likelihood value for the standard deviation.

Note that I don't explicitly tell you how to compute the values for µ and σ, since this is a quite mathematical procedure I don't have at hand (and probably I would not understand it); I just tell you the technique to get the values, which can be applied to any other distribution as well.

Since you want to maximize the original term, you can "simply" maximize the logarithm of the original term instead - this saves you from dealing with all these products and turns the original term into a sum of summands.
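In code, this just means summing log-densities instead of multiplying densities; a small sketch (again an addition, reusing the x from above):

import numpy as np
from scipy.stats import norm

def log_likelihood(x, mu, sigma):
    # log(f(x[0]) * ... * f(x[99])) = log f(x[0]) + ... + log f(x[99])
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))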

If you really want to calculate it, you can do some simplifications that lead to the following term (hope I didn't mess up anything):
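The term itself does not seem to have survived here; for a Gaussian with n samples (n = 100 in this question) it is the standard simplified log-likelihood:

\ln L(\mu, \sigma) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i - \mu)^{2}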

Now, you have to find values for µ and σ such that the above beast is maximal. Doing that is a very nontrivial task called nonlinear optimization.
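One generic way to attack it numerically (an addition; the original answer names no particular tool) is to hand the negative log-likelihood to an off-the-shelf optimizer such as scipy.optimize.minimize, which minimizes, hence the sign flip:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.random.randn(100)

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:  # the density is undefined for non-positive sigma
        return np.inf
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# Start from a deliberately wrong guess and let the optimizer search.
result = minimize(neg_log_likelihood, x0=[0.5, 2.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)  # should land close to 0 and 1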

One simplification you could try is the following: Fix one parameter and try to calculate the other. This saves you from dealing with two variables at the same time.
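For instance (a sketch of that idea, not code from the answer): fix σ, scan a grid of candidate means, and keep the one with the highest log-likelihood. This also answers the "when should I stop trying?" question from above: you stop once the predefined grid has been scanned.

import numpy as np
from scipy.stats import norm

x = np.random.randn(100)

sigma_fixed = 1.0                      # hold sigma fixed ...
mu_grid = np.linspace(-2.0, 2.0, 401)  # ... and scan candidate means
ll = [np.sum(norm.logpdf(x, loc=m, scale=sigma_fixed)) for m in mu_grid]
mu_best = mu_grid[int(np.argmax(ll))]
print(mu_best)                         # ends up close to np.mean(x)

For the Gaussian specifically, the maximum over µ is the sample mean, so this search converges on np.mean(x); the analogous one-dimensional search over σ (with µ fixed) converges on the square root of the mean squared deviation.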
