Best way to write a Python function that integrates a gaussian?

Question

In attempting to use scipy's quad method to integrate a gaussian (let's say there's a gaussian method named gauss), I was having problems passing the needed parameters to gauss and leaving quad to do the integration over the correct variable. Does anyone have a good example of how to use quad with a multidimensional function?

But this led me to a larger question about the best way to integrate a gaussian in general. I didn't find a gaussian integration routine in scipy (to my surprise). My plan was to write a simple gaussian function and pass it to quad (or maybe now a fixed-width integrator). What would you do?

Fixed-width meaning something like trapz that uses a fixed dx to calculate areas under a curve.

What I've come to so far is a method make_gauss that returns a lambda function that can then go into quad. This way I can make a normal function with the average and variance I need before integrating.

import numpy
from scipy.integrate import quad

def make_gauss(N, sigma, mu):
    return (lambda x: N/(sigma * (2*numpy.pi)**.5) *
            numpy.e ** (-(x-mu)**2/(2 * sigma**2)))

quad(make_gauss(N=10, sigma=2, mu=0), -numpy.inf, numpy.inf)

When I tried passing a general gaussian function (that needs to be called with x, N, mu, and sigma) and filling in some of the values using quad like

quad(gen_gauss, -inf, inf, (10,2,0))

the parameters 10, 2, and 0 did NOT necessarily match N=10, sigma=2, mu=0, which prompted the more extended definition.

The erf(z) in scipy.special would require me to define exactly what t is initially, but it is nice to know it is there.

Recommended answer

Okay, you appear to be pretty confused about several things. Let's start at the beginning: you mentioned a "multidimensional function", but then went on to discuss the usual one-variable Gaussian curve. This is not a multidimensional function: when you integrate it, you only integrate over one variable (x). The distinction is important to make, because there is a monster called a "multivariate Gaussian distribution" which is a true multidimensional function and, if integrated, requires integrating over two or more variables (which uses the expensive Monte Carlo technique I mentioned before). But you seem to just be talking about the regular one-variable Gaussian, which is much easier to work with, integrate, and all that.

The one-variable Gaussian distribution has two parameters, sigma and mu, and is a function of a single variable we'll denote x. You also appear to be carrying around a normalization parameter n (which is useful in a couple of applications). Normalization parameters are usually not included in calculations, since you can just tack them back on at the end (remember, integration is a linear operator: int(n*f(x), x) = n*int(f(x), x) ). But we can carry it around if you like; the notation I like for a normal distribution is then

N(x | mu, sigma, n) := (n/(sigma*sqrt(2*pi))) * exp((-(x-mu)^2)/(2*sigma^2))

(read that as "the normal distribution of x given sigma, mu, and n is given by...") So far, so good; this matches the function you've got. Notice that the only true variable here is x: the other three parameters are fixed for any particular Gaussian.

Now for a mathematical fact: it is provably true that all Gaussian curves have the same shape; they are just shifted and rescaled. So we can work with N(x|0,1,1), called the "standard normal distribution", and just translate our results back to the general Gaussian curve. So if you have the integral of N(x|0,1,1), you can trivially calculate the integral of any Gaussian. This integral appears so frequently that it has a special name: the error function erf. Because of some old conventions, it's not exactly erf; there are a couple of additive and multiplicative factors also being carried around.

If Phi(z) = integral(N(x|0,1,1), -inf, z); that is, Phi(z) is the integral of the standard normal distribution from minus infinity up to z, then it's true by the definition of the error function that

Phi(z) = 0.5 + 0.5 * erf(z / sqrt(2)).

Likewise, if Phi(z | mu, sigma, n) = integral( N(x | mu, sigma, n), -inf, z); that is, Phi(z | mu, sigma, n) is the integral of the normal distribution given parameters mu, sigma, and n from minus infinity up to z, then it's true by the definition of the error function that

Phi(z | mu, sigma, n) = (n/2) * (1 + erf((z - mu) / (sigma * sqrt(2)))).

Take a look at the Wikipedia article on the normal CDF if you want more detail or a proof of this fact.
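
As a minimal sketch of the closed form above, here is how it might look using math.erf from the standard library (scipy.special.erf gives the same values for scalars). The names gauss_cdf and gauss_integral are made up for illustration and are not part of scipy.

```python
import math

def gauss_cdf(z, mu=0.0, sigma=1.0, n=1.0):
    """Integral of N(x | mu, sigma, n) from minus infinity up to z."""
    return 0.5 * n * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

def gauss_integral(a, b, mu=0.0, sigma=1.0, n=1.0):
    """Integral of N(x | mu, sigma, n) from a to b, as a difference of two CDF values."""
    return gauss_cdf(b, mu, sigma, n) - gauss_cdf(a, mu, sigma, n)

# Area within one sigma of the mean for n=10, sigma=2, mu=0: about 6.83
print(gauss_integral(-2.0, 2.0, mu=0.0, sigma=2.0, n=10.0))
```

No numerical integration is involved here, so the result is as accurate as erf itself.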

Okay, that should be enough background explanation. Back to your (edited) post. You say "The erf(z) in scipy.special would require me to define exactly what t is initially". I have no idea what you mean by this; where does t (time?) enter into this at all? Hopefully the explanation above has demystified the error function a bit and it's clearer now as to why the error function is the right function for the job.

Your Python code is OK, but I would prefer a closure over a lambda:

import math

def make_gauss(N, sigma, mu):
    # Constants are computed once here, not on every call to f.
    k = N / (sigma * math.sqrt(2*math.pi))
    s = -1.0 / (2 * sigma * sigma)
    def f(x):
        return k * math.exp(s * (x - mu)*(x - mu))
    return f

Using a closure enables precomputation of constants k and s, so the returned function will need to do less work each time it's called (which can be important if you're integrating it, which means it'll be called many times). Also, I have avoided any use of the exponentiation operator **, which is slower than just writing the squaring out, and hoisted the divide out of the inner loop and replaced it with a multiply. I haven't looked at all at their implementation in Python, but from my last time tuning an inner loop for pure speed using raw x87 assembly, I seem to remember that adds, subtracts, or multiplies take about 4 CPU cycles each, divides about 36, and exponentiation about 200. That was a couple years ago, so take those numbers with a grain of salt; still, it illustrates their relative complexity. As well, calculating exp(x) the brute-force way is a very bad idea; there are tricks you can take when writing a good implementation of exp(x) that make it significantly faster and more accurate than a general a**b style exponentiation.
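
To make the "called many times" point concrete, here is a rough sketch that feeds the closure to a hand-written fixed-width trapezoid rule. The trapezoid helper and the choice of plus or minus 8 sigma as a stand-in for the infinite range are assumptions for illustration only; this is not how quad works internally.

```python
import math

def make_gauss(N, sigma, mu):
    # Same closure as above: k and s are computed once, outside f.
    k = N / (sigma * math.sqrt(2 * math.pi))
    s = -1.0 / (2 * sigma * sigma)
    def f(x):
        return k * math.exp(s * (x - mu) * (x - mu))
    return f

def trapezoid(f, a, b, steps):
    """Fixed-width trapezoid rule; evaluates f exactly steps + 1 times."""
    dx = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * dx)
    return total * dx

gauss = make_gauss(N=10, sigma=2, mu=0)
# The tails beyond 8 sigma are negligible, so [-16, 16] approximates the real line.
area = trapezoid(gauss, -16.0, 16.0, 2000)
print(area)  # very close to the normalization factor, 10
```

With 2000 steps the integrand is evaluated 2001 times, which is why trimming per-call work in f can pay off.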

I've never used the numpy version of the constants pi and e; I've always stuck with the plain old math module's versions. I don't know why you might prefer either one.

I'm not sure what you're going for with the quad() call. quad(gen_gauss, -inf, inf, (10,2,0)) ought to integrate a renormalized Gaussian from minus infinity to plus infinity, and should always spit out 10 (your normalization factor), since the Gaussian integrates to 1 over the real line. Any answer far from 10 (I wouldn't expect exactly 10 since quad() is only an approximation, after all) means something is screwed up somewhere... hard to say what's screwed up without knowing the actual return value and possibly the inner workings of quad().
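
For reference, here is a sketch of how that call could be set up. The question never showed gen_gauss, so the definition and its parameter order (x, N, sigma, mu) are assumptions. The point to note is that quad integrates over the first argument of the function and passes the args tuple positionally after it, so the tuple must follow the function's own parameter order.

```python
import numpy as np
from scipy.integrate import quad

def gen_gauss(x, N, sigma, mu):
    # Hypothetical general Gaussian: x comes first because quad integrates over it.
    return N / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# args is unpacked positionally after x, so this means N=10, sigma=2, mu=0.
value, abs_err = quad(gen_gauss, -np.inf, np.inf, args=(10, 2, 0))
print(value, abs_err)  # value should be close to 10
```

If gen_gauss were declared with a different order, say (x, N, mu, sigma), the same tuple would silently set mu=2 and sigma=0, which is one way the parameters could fail to match.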

Hopefully that has cleared up some of the confusion, and explained why the error function is the right answer to your problem, as well as how to do it all yourself if you're curious. If any of my explanation wasn't clear, I suggest taking a quick look at Wikipedia first; if you still have questions, don't hesitate to ask.
