Speed up Metropolis-Hastings in Python

Question

I have some code that samples a posterior distribution using MCMC, specifically Metropolis Hastings. I use scipy to generate random samples:

import numpy as np
from scipy import stats

def get_samples(n):
    """
    Generate and return a randomly sampled posterior.

    For simplicity, Prior is fixed as Beta(a=2,b=5), Likelihood is fixed as Normal(0,2)

    :type n: int
    :param n: number of iterations

    :rtype: numpy.ndarray
    """
    x_t = stats.uniform(0,1).rvs() # initial value
    posterior = np.zeros((n,))
    for t in range(n):
        x_prime = stats.norm(loc=x_t).rvs() # candidate
        p1 = stats.beta(a=2,b=5).pdf(x_prime)*stats.norm(loc=0,scale=2).pdf(x_prime) # prior * likelihood 
        p2 = stats.beta(a=2,b=5).pdf(x_t)*stats.norm(loc=0,scale=2).pdf(x_t) # prior * likelihood 
        alpha = p1/p2 # ratio
        u = stats.uniform(0,1).rvs() # random uniform
        if u <= alpha:
            x_t = x_prime # accept
            posterior[t] = x_t
        elif u > alpha:
            x_t = x_t # reject
    posterior = posterior[np.where(posterior > 0)] # get rid of initial zeros that don't contribute to distribution
    return posterior

Generally, I try to avoid using explicit for loops in Python - I would try to generate everything using pure numpy. For this algorithm, however, a for loop with if statements is unavoidable, so the code is quite slow. When I profile my code, it spends most of its time within the for loop (obviously), and more specifically, the slowest parts are generating the random numbers and the density evaluations stats.beta().pdf() and stats.norm().pdf().
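(For reference, a profile like this can be reproduced with the standard library's cProfile; a minimal invocation, sorting by cumulative time so the per-iteration scipy calls appear near the top of the report:)

import cProfile

# profile a single run of get_samples, sorted by cumulative time
cProfile.run('get_samples(1000)', sort='cumtime')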

Sometimes I use numba to speed up my code for matrix operations. While numba is compatible with some numpy operations, generating random numbers is not one of them. Numba has a cuda rng, but this is limited to normal and uniform distributions.

My question is, is there a way to significantly speed up the code above using some sort of random sampling of various distributions compatible with numba?

We don't have to limit ourselves to numba, but it is the only easy-to-use optimizer that I know of. More generally, I am looking for ways to speed up random sampling from various distributions (beta, gamma, Poisson) within a for loop in Python.

Answer

There are lots of optimisations you can make to this code before you start thinking about numba et al. (I managed to get a 25x speed-up on this code just by being smart about the algorithm's implementation.)

Firstly, there's an error in your implementation of the Metropolis-Hastings algorithm. You need to keep every iteration of the scheme, regardless of whether your chain moves or not. That is, you need to remove posterior = posterior[np.where(posterior > 0)] from your code and instead set posterior[t] = x_t at the end of every loop iteration.

Secondly, this example seems odd. Typically, with these kinds of inference problems we're looking to infer the parameters of a distribution given some observations. Here, though, the parameters of the distribution are known and instead you're sampling observations? Regardless, I'm happy to roll with your example and show you how to speed it up.

To get started, remove anything which is not dependent on the value of t from the main for loop. Start by removing the generation of the random walk innovation from the for loop:

    x_t = stats.uniform(0,1).rvs()          # initial value
    innov = stats.norm(loc=0).rvs(size=n)   # draw all n random-walk innovations up front
    for t in range(n):
        x_prime = x_t + innov[t]            # candidate

Of course, it is also possible to move the random generation of u out of the for loop:

    x_t = stats.uniform(0,1).rvs()
    innov = stats.norm(loc=0).rvs(size=n)

    u = np.random.uniform(size=n)
    for t in range(n):
        x_prime = x_t + innov[t]
        ...
        if u[t] <= alpha:

Another issue is that you're computing the current posterior p2 in every loop, which isn't necessary. In each loop you need to calculate the proposed posterior p1, and when the proposal is accepted you can update p2 to equal p1:

    x_t = stats.uniform(0,1).rvs()
    innov = stats.norm(loc=0).rvs(size=n)

    u = np.random.uniform(size=n)

    p2 = stats.beta(a=2,b=5).pdf(x_t)*stats.norm(loc=0,scale=2).pdf(x_t)
    for t in range(n):
        x_prime = x_t + innov[t]

        p1 = stats.beta(a=2,b=5).pdf(x_prime)*stats.norm(loc=0,scale=2).pdf(x_prime)
        ...
        if u[t] <= alpha:
            x_t = x_prime # accept
            p2 = p1

        posterior[t] = x_t

A very minor improvement comes from importing the scipy stats functions directly into the namespace, which also lets you call them as plain functions (norm.pdf(x, loc=0, scale=2)) instead of constructing a frozen distribution object (stats.norm(loc=0, scale=2)) on every iteration:

from scipy.stats import norm, beta

Another very minor improvement is noticing that the elif statement in your code doesn't do anything (x_t = x_t is a no-op) and so can be removed.

Putting this all together and using more sensible variable names, I came up with:

from scipy.stats import norm, beta
import numpy as np

def my_get_samples(n, sigma=1):

    x_cur = np.random.uniform()              # initial value
    innov = norm.rvs(size=n, scale=sigma)    # all random-walk steps, drawn up front
    u = np.random.uniform(size=n)            # all acceptance draws, drawn up front

    post_cur = beta.pdf(x_cur, a=2, b=5) * norm.pdf(x_cur, loc=0, scale=2)

    posterior = np.zeros(n)
    for t in range(n):
        x_prop = x_cur + innov[t]

        post_prop = beta.pdf(x_prop, a=2, b=5) * norm.pdf(x_prop, loc=0, scale=2)
        alpha = post_prop / post_cur         # acceptance ratio
        if u[t] <= alpha:
            x_cur = x_prop                   # accept and move
            post_cur = post_prop             # reuse instead of recomputing

        posterior[t] = x_cur                 # record the state every iteration

    return posterior

Now, for a speed comparison:

%timeit get_samples(1000)
3.19 s ± 5.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit my_get_samples(1000)
127 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That's a 25x speed-up!

It's worth noting that brute speed isn't everything when it comes to MCMC algorithms. Really, what you're interested in is the number of independent(ish) draws you can make from the posterior per second. Typically, this is assessed with the ESS (effective sample size). You can improve the efficiency of your algorithm (and hence increase your effective samples drawn per second) by tuning your random walk.
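As a rough illustration, here is a minimal sketch of one common ESS estimator, built from the chain's autocorrelations; this helper is my own addition rather than part of the original answer, and in practice a package such as arviz (arviz.ess) computes this far more robustly:

import numpy as np

def effective_sample_size(chain):
    """Crude ESS estimate: n / (1 + 2 * sum of positive autocorrelations)."""
    n = len(chain)
    x = chain - chain.mean()
    acf = np.correlate(x, x, mode='full')[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0
    for k in range(1, n):
        if acf[k] < 0:       # truncate at the first negative autocorrelation
            break
        tau += 2.0 * acf[k]
    return n / tau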

To do so, it is typical to make an initial trial run, i.e. samples = my_get_samples(1000). From this output, calculate sigma = 2.38**2 * np.var(samples). This value should then be used to tune the random walk in your scheme as innov = norm.rvs(size=n, scale=sigma). The seemingly arbitrary occurrence of 2.38^2 has its origin in:

Weak convergence and optimal scaling of random walk Metropolis algorithms (1997). A. Gelman, W. R. Gilks, and G. O. Roberts.
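
In code, that tuning recipe looks something like this (a short sketch; the pilot length of 1000 is just the figure used above):

# pilot run to estimate the posterior variance ...
pilot = my_get_samples(1000)
sigma = 2.38**2 * np.var(pilot)   # optimal scaling of Gelman, Gilks and Roberts (1997)

# ... then the tuned run
samples = my_get_samples(10000, sigma=sigma)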

To illustrate the improvement tuning can make, let's do two runs of the algorithm, one tuned and the other untuned, both using 10000 iterations:

import matplotlib.pyplot as plt

x = my_get_samples(10000)
y = my_get_samples(10000, sigma=0.12)

fig, ax = plt.subplots(1, 2)
ax[0].hist(x, density=True, bins=25, label='Untuned algorithm', color='C0')
ax[1].hist(y, density=True, bins=25, label='Tuned algorithm', color='C1')
ax[0].set_ylabel('density')
ax[0].set_xlabel('x'), ax[1].set_xlabel('x')
fig.legend()

You can immediately see the improvements tuning has made to our algorithm's efficiency. Remember, both runs were made for the same number of iterations.

If your algorithm is taking a very long time to converge, or if your samples have large amounts of autocorrelation, I'd consider looking into Cython to squeeze out further speed optimisations.
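
Since the question specifically mentioned numba: scipy.stats is not available inside numba's nopython mode, but the densities here are simple enough to write out by hand with math functions that numba does support. Below is a minimal sketch of that idea (working in log space, which also sidesteps underflow; the hand-coded log-pdfs are my own reconstruction, not part of the scipy version above):

import math
import numpy as np
from numba import njit

@njit
def log_target(x):
    # log Beta(a=2, b=5) pdf + log Normal(0, 2) pdf, written with
    # plain math functions so numba can compile them
    if x <= 0.0 or x >= 1.0:
        return -np.inf                  # Beta density is zero outside (0, 1)
    log_beta = (math.lgamma(7.0) - math.lgamma(2.0) - math.lgamma(5.0)
                + math.log(x) + 4.0 * math.log(1.0 - x))
    log_norm = -0.5 * math.log(8.0 * math.pi) - x * x / 8.0
    return log_beta + log_norm

@njit
def numba_get_samples(n, sigma=1.0):
    x_cur = np.random.uniform(0.0, 1.0)
    log_post_cur = log_target(x_cur)
    posterior = np.zeros(n)
    for t in range(n):
        x_prop = x_cur + sigma * np.random.normal(0.0, 1.0)
        log_post_prop = log_target(x_prop)
        # accept with probability min(1, exp(log_post_prop - log_post_cur))
        if math.log(np.random.uniform(0.0, 1.0)) <= log_post_prop - log_post_cur:
            x_cur = x_prop
            log_post_cur = log_post_prop
        posterior[t] = x_cur
    return posterior

Bear in mind that the first call pays a one-off JIT compilation cost, so benchmark a second call.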

I'd also recommend checking out the PyStan project. It takes a bit of getting used to, but its NUTS HMC algorithm will likely outperform any Metropolis-Hastings algorithm you can write by hand.
