Numerically stable way to multiply log probability matrices in numpy
Problem description
I need to take the matrix product of two NumPy matrices (or other 2d arrays) containing log probabilities. The naive way np.log(np.dot(np.exp(a), np.exp(b))) is not preferred for obvious reasons: exponentiating the log probabilities underflows or overflows before the products are summed.
Using

    import numpy as np
    from scipy.special import logsumexp  # scipy.misc.logsumexp in older SciPy versions

    res = np.zeros((a.shape[0], b.shape[1]))
    for n in range(b.shape[1]):
        # broadcast b[:,n] over rows of a, sum columns
        res[:, n] = logsumexp(a + b[:, n].T, axis=1)

works but runs about 100 times slower than np.log(np.dot(np.exp(a), np.exp(b))).
Using

    logsumexp((np.tile(a, (b.shape[1], 1)) + np.repeat(b.T, a.shape[0], axis=0)).reshape(b.shape[1], a.shape[0], a.shape[1]), axis=2).T

or other combinations of tile and reshape also works, but runs even slower than the loop above due to the prohibitively large amount of memory required for realistically sized input matrices.
I am currently considering writing a NumPy extension in C to compute this, but of course I'd rather avoid that. Is there an established way to do this, or does anybody know of a less memory intensive way of performing this computation?
EDIT: Thanks to larsmans for this solution (see below for derivation):
    def logdot(a, b):
        max_a, max_b = np.max(a), np.max(b)
        exp_a, exp_b = a - max_a, b - max_b
        np.exp(exp_a, out=exp_a)
        np.exp(exp_b, out=exp_b)
        c = np.dot(exp_a, exp_b)
        np.log(c, out=c)
        c += max_a + max_b
        return c
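As a quick sanity check, here is a minimal sketch (the array sizes and seed are arbitrary choices) that compares logdot against the naive computation on moderate-magnitude inputs, where the naive version is still numerically safe:

```python
import numpy as np

def logdot(a, b):
    # larsmans' method, repeated from above
    max_a, max_b = np.max(a), np.max(b)
    exp_a, exp_b = a - max_a, b - max_b
    np.exp(exp_a, out=exp_a)
    np.exp(exp_b, out=exp_b)
    c = np.dot(exp_a, exp_b)
    np.log(c, out=c)
    c += max_a + max_b
    return c

rng = np.random.default_rng(0)
a = np.log(rng.random((5, 7)))
b = np.log(rng.random((7, 3)))

# with well-scaled inputs the naive product does not underflow,
# so both methods should agree to machine precision
naive = np.log(np.dot(np.exp(a), np.exp(b)))
assert np.allclose(logdot(a, b), naive)
```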
A quick comparison of this method to the method posted above (logdot_old) using IPython's magic %timeit function yields the following:
    In [1]: a = np.log(np.random.rand(1000, 2000))
    In [2]: b = np.log(np.random.rand(2000, 1500))
    In [3]: x = logdot(a, b)
    In [4]: y = logdot_old(a, b)  # this takes a while
    In [5]: np.any(np.abs(x - y) > 1e-14)
    Out[5]: False
    In [6]: %timeit logdot_old(a, b)
    1 loops, best of 3: 1min 18s per loop
    In [7]: %timeit logdot(a, b)
    1 loops, best of 3: 264 ms per loop
Obviously larsmans' method obliterates mine!
logsumexp works by evaluating the right-hand side of the equation
log(∑ exp[a]) = max(a) + log(∑ exp[a - max(a)])
I.e., it pulls out the max before starting to sum, to prevent overflow in exp. The same can be applied before doing vector dot products:
log(exp[a] ⋅ exp[b])
= log(∑ exp[a] × exp[b])
= log(∑ exp[a + b])
= max(a + b) + log(∑ exp[a + b - max(a + b)]) { this is logsumexp(a + b) }
but by taking a different turn in the derivation, we obtain
log(∑ exp[a] × exp[b])
= max(a) + max(b) + log(∑ exp[a - max(a)] × exp[b - max(b)])
= max(a) + max(b) + log(exp[a - max(a)] ⋅ exp[b - max(b)])
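A tiny numeric check of this rearrangement, as a sketch in plain NumPy (the vector values are arbitrary): both sides of the last line should agree with the directly computed log of the dot product of the exponentials.

```python
import numpy as np

a = np.array([-3.0, -1.5, -2.2])
b = np.array([-0.7, -2.9, -1.1])

# left-hand side: log(sum(exp[a] * exp[b])), computed directly
lhs = np.log(np.sum(np.exp(a) * np.exp(b)))

# right-hand side: max(a) + max(b) + log(exp[a - max(a)] . exp[b - max(b)])
rhs = a.max() + b.max() + np.log(np.dot(np.exp(a - a.max()), np.exp(b - b.max())))

assert np.isclose(lhs, rhs)
```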
The final form has a vector dot product in its innards. It also extends readily to matrix multiplication, so we get the algorithm
    def logdotexp(A, B):
        max_A = np.max(A)
        max_B = np.max(B)
        C = np.dot(np.exp(A - max_A), np.exp(B - max_B))
        np.log(C, out=C)
        C += max_A + max_B
        return C
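To see why this matters, here is a hedged sketch (the array sizes and the -1000 shift are arbitrary choices for illustration) where the naive computation underflows to -inf on every entry while logdotexp stays finite:

```python
import numpy as np

def logdotexp(A, B):
    # as defined above
    max_A = np.max(A)
    max_B = np.max(B)
    C = np.dot(np.exp(A - max_A), np.exp(B - max_B))
    np.log(C, out=C)
    C += max_A + max_B
    return C

rng = np.random.default_rng(1)
# very small log probabilities: every entry is below -1000
A = np.log(rng.random((4, 6))) - 1000.0
B = np.log(rng.random((6, 2))) - 1000.0

# exp(x) underflows to 0.0 for x < ~-745, so the naive product is all zeros
with np.errstate(divide="ignore"):
    naive = np.log(np.dot(np.exp(A), np.exp(B)))

stable = logdotexp(A, B)

assert np.all(np.isinf(naive))      # every naive entry is -inf
assert np.all(np.isfinite(stable))  # logdotexp keeps full precision
```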
This creates two A-sized temporaries and two B-sized ones, but one of each can be eliminated by

    exp_A = A - max_A
    np.exp(exp_A, out=exp_A)

and similarly for B. (If the input matrices may be modified by the function, all the temporaries can be eliminated.)