Numerically stable way to multiply log probability matrices in numpy


Question


I need to take the matrix product of two NumPy matrices (or other 2-D arrays) containing log probabilities. The naive way, np.log(np.dot(np.exp(a), np.exp(b))), is not preferred for obvious reasons: exponentiating large negative log probabilities underflows to zero in floating point, so the log of the product comes back as -inf.
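
To make the failure concrete, here is a minimal sketch (the entries are hypothetical, but log probabilities of this magnitude are routine for long sequences):

import numpy as np

a = np.full((2, 2), -1000.0)   # log probabilities; exp(-1000) == 0.0 in float64
b = np.full((2, 2), -1000.0)

# The naive product is all zeros, so np.log yields -inf everywhere
# (with a divide-by-zero RuntimeWarning).
naive = np.log(np.dot(np.exp(a), np.exp(b)))
print(naive)   # [[-inf -inf] [-inf -inf]]
# The exact value of every entry is log(2 * exp(-2000)) = -2000 + log(2).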

Using

from scipy.special import logsumexp  # was scipy.misc.logsumexp in older SciPy
res = np.zeros((a.shape[0], b.shape[1]))
for n in range(b.shape[1]):
    # broadcast b[:,n] over the rows of a, then reduce over columns
    res[:, n] = logsumexp(a + b[:, n], axis=1)

works, but runs about 100 times slower than np.log(np.dot(np.exp(a), np.exp(b))).

Using

logsumexp((np.tile(a, (b.shape[1], 1)) + np.repeat(b.T, a.shape[0], axis=0)).reshape(b.shape[1], a.shape[0], a.shape[1]), 2).T

or other combinations of tile and reshape also work but run even slower than the loop above due to the prohibitively large amounts of memory required for realistically sized input matrices.
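
(For the 1000×2000 and 2000×1500 matrices used in the benchmark below, the tiled intermediate is a 1500×1000×2000 array of float64, i.e. 3×10⁹ entries or roughly 24 GB, before logsumexp even starts.)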

I am currently considering writing a NumPy extension in C to compute this, but of course I'd rather avoid that. Is there an established way to do this, or does anybody know of a less memory intensive way of performing this computation?

EDIT: Thanks to larsmans for this solution (see below for derivation):

def logdot(a, b):
    # pull the global max out of each factor to prevent overflow in exp
    max_a, max_b = np.max(a), np.max(b)
    exp_a, exp_b = a - max_a, b - max_b
    np.exp(exp_a, out=exp_a)   # exponentiate in place to save temporaries
    np.exp(exp_b, out=exp_b)
    c = np.dot(exp_a, exp_b)
    np.log(c, out=c)
    c += max_a + max_b         # add the shifts back in log space
    return c

A quick comparison of this method to the one posted above (logdot_old), using IPython's %timeit magic, yields the following:

In  [1] a = np.log(np.random.rand(1000,2000))

In  [2] b = np.log(np.random.rand(2000,1500))

In  [3] x = logdot(a, b)

In  [4] y = logdot_old(a, b) # this takes a while

In  [5] np.any(np.abs(x-y) > 1e-14)
Out [5] False

In  [6] %timeit logdot_old(a, b)
1 loops, best of 3: 1min 18s per loop

In  [7] %timeit logdot(a, b)
1 loops, best of 3: 264 ms per loop

Obviously larsmans' method obliterates mine!

Solution

logsumexp works by evaluating the right-hand side of the equation

log(∑ exp[a]) = max(a) + log(∑ exp[a - max(a)])

I.e., it pulls out the max before starting to sum, to prevent overflow in exp. The same can be applied before doing vector dot products:

log(exp[a] ⋅ exp[b])
 = log(∑ exp[a] × exp[b])
 = log(∑ exp[a + b])
 = max(a + b) + log(∑ exp[a + b - max(a + b)])     { this is logsumexp(a + b) }

but by taking a different turn in the derivation, we obtain

log(∑ exp[a] × exp[b])
 = max(a) + max(b) + log(∑ exp[a - max(a)] × exp[b - max(b)])
 = max(a) + max(b) + log(exp[a - max(a)] ⋅ exp[b - max(b)])

The final form has a vector dot product in its innards. It also extends readily to matrix multiplication, so we get the algorithm

def logdotexp(A, B):
    max_A = np.max(A)
    max_B = np.max(B)
    C = np.dot(np.exp(A - max_A), np.exp(B - max_B))
    np.log(C, out=C)
    C += max_A + max_B
    return C

This creates two A-sized temporaries and two B-sized ones, but one of each can be eliminated by

exp_A = A - max_A
np.exp(exp_A, out=exp_A)

and similarly for B. (If the input matrices may be modified by the function, all the temporaries can be eliminated.)
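
Combining the two in-place tricks gives a minimal sketch of a variant that destroys its inputs in exchange for allocating nothing beyond the result (the name logdotexp_destructive is hypothetical, and it assumes A and B are float arrays the caller no longer needs):

import numpy as np

def logdotexp_destructive(A, B):
    # Overwrites A and B, so only the C-sized result is allocated.
    max_A = np.max(A)
    max_B = np.max(B)
    A -= max_A                 # shift in log space ...
    B -= max_B
    np.exp(A, out=A)           # ... then exponentiate in place
    np.exp(B, out=B)
    C = np.dot(A, B)
    np.log(C, out=C)
    C += max_A + max_B         # undo the shifts
    return C

Up to floating point, the result matches logdotexp(A, B); the only cost is that A and B come back holding shifted probabilities rather than log probabilities.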

