对数伽玛函数的快速算法 [英] Fast algorithm for log gamma function

查看:519
本文介绍了对数伽玛函数的快速算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试编写一种快速算法来计算 log伽玛函数.目前,我的实现似乎还很幼稚,只是迭代了1000万次来计算gamma函数的对数(我也使用numba来优化代码).

I am trying to write a fast algorithm to compute the log gamma function. Currently my implementation seems naive, and just iterates 10 million times to compute the log of the gamma function (I am also using numba to optimise the code).

import numpy as np
from numba import njit
EULER_MAS = 0.577215664901532 # euler mascheroni constant
HARMONC_10MIL = 16.695311365860007 # sum of 1/k from 1 to 10,000,000

@njit(fastmath=True)
def gammaln(z):
"""Compute log of gamma function for some real positive float z"""
    out = -EULER_MAS*z - np.log(z) + z*HARMONC_10MIL
    n = 10000000 # number of iters
    for k in range(1,n+1,4):
        # loop unrolling
        v1 = np.log(1 + z/k)
        v2 = np.log(1 + z/(k+1))
        v3 = np.log(1 + z/(k+2))
        v4 = np.log(1 + z/(k+3))
        out -= v1 + v2 + v3 + v4

    return out

我将代码计时为 scipy.special .gammaln 的实现和我的速度实际上要慢100,000倍.所以我在做些非常错误或非常幼稚的事情(可能两者兼而有之).尽管与scipy相比,我的回答至少在最差的小数点后4位内是正确的.

I timed my code against the scipy.special.gammaln implementation and mine is literally 100,000's times slower. So I am doing something very wrong or very naive (probably both). Although my answers are at least correct to within 4 decimal places at worst when compared to scipy.

我试图阅读实现scipy的gammaln函数的_ufunc代码,但是我不理解_gammaln函数所写的cython代码.

I tried to read the _ufunc code implementing scipy's gammaln function, however I don't understand the cython code that the _gammaln function is written in.

有没有更快,更优化的方法来计算对数伽马函数?我怎么能理解scipy的实现,以便将其与我的结合起来?

Is there a faster and more optimised way I can calculate the log gamma function? How can I understand scipy's implementation so I can incorporate it with mine?

推荐答案

函数的运行时将随着迭代次数线性扩展(直至某些恒定开销).因此,减少迭代次数是加快算法的关键.尽管事先计算HARMONIC_10MIL是一个聪明的主意,但实际上在截断该序列时会导致准确性降低.仅计算该系列的一部分即可提供更高的准确性.

The runtime of your function will scale linearly (up to some constant overhead) with the number of iterations. So getting the number of iterations down is key to speeding up the algorithm. Whilst computing the HARMONIC_10MIL beforehand is a smart idea, it actually leads to worse accuracy when you truncate the series; computing only part of the series turns out to give higher accuracy.

下面的代码是上面发布的代码的修改版本(尽管使用cython而不是numba).

The code below is a modified version of the code posted above (although using cython instead of numba).

from libc.math cimport log, log1p
cimport cython
cdef:
    float EULER_MAS = 0.577215664901532 # euler mascheroni constant

@cython.cdivision(True)
def gammaln(float z, int n=1000):
    """Compute log of gamma function for some real positive float z"""
    cdef:
        float out = -EULER_MAS*z - log(z)
        int k
        float t
    for k in range(1, n):
        t = z / k
        out += t - log1p(t)

    return out

如下图所示,即使经过100次逼近也可以获得近似值.

It is able to obtain a close approximation even after 100 approximations as shown in the figure below.

在100次迭代中,其运行时间与scipy.special.gammaln处于相同的数量级:

At 100 iterations, its runtime is of the same order of magnitude as scipy.special.gammaln:

%timeit special.gammaln(5)
# 932 ns ± 19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit gammaln(5, 100)
# 1.25 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

剩下的问题当然是要使用多少次迭代.可以将函数log1p(t)扩展为小t的泰勒级数(与大k的限制有关).特别是

The remaining question is of course how many iterations to use. The function log1p(t) can be expanded as a Taylor series for small t (which is relevant in the limit of large k). In particular,

log1p(t) = t - t ** 2 / 2 + ...

使得对于大k,总和的自变量变为

such that, for large k, the argument of the sum becomes

t - log1p(t) = t ** 2 / 2 + ...

因此,在t中,直到2阶的总和的自变量为零,如果t足够小,则可以忽略不计.换句话说,迭代次数应至少与z一样大,最好至少大一个数量级.

Consequently, the argument of the sum is zero up to second order in t which is negligible if t is sufficiently small. In other words, the number of iterations should be at least as large as z, preferably at least an order of magnitude larger.

但是,如果可能的话,我会坚持使用scipy经过良好测试的实现.

However, I'd stick with scipy's well-tested implementation if at all possible.

这篇关于对数伽玛函数的快速算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆