Matrix inversion (3,3) python - hard coded vs numpy.linalg.inv


Problem Description

For a large number of matrices I need to compute a distance measure defined as:
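(The formula image did not survive in this copy; reconstructed from the reference implementation dist_np in the answer further down, the measure for a pair of matrices M, N is)

d(M, N) = \tfrac{1}{2}\,\operatorname{tr}\left(M^{-1} N + N^{-1} M\right)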

Although I do know that matrix inversion is strongly discouraged, I do not see a way around it. Therefore I tried to improve the performance by hard coding the matrix inversion, as all matrices are of size (3,3).

I expected it to be at least a tiny improvement, yet it is not.

Why is numpy.linalg.inv faster/more performant than this hard-coded matrix inversion?

Further, what alternatives do I have to improve this bottleneck?

import numpy as np

def inversion(m):
    m1, m2, m3, m4, m5, m6, m7, m8, m9 = m.flatten()
    # determinant via the rule of Sarrus
    determinant = m1*m5*m9 + m4*m8*m3 + m7*m2*m6 - m1*m6*m8 - m3*m5*m7 - m2*m4*m9
    # adjugate (transposed cofactor matrix) divided by the determinant
    return np.array([[m5*m9-m6*m8, m3*m8-m2*m9, m2*m6-m3*m5],
                     [m6*m7-m4*m9, m1*m9-m3*m7, m3*m4-m1*m6],
                     [m4*m8-m5*m7, m2*m7-m1*m8, m1*m5-m2*m4]])/determinant
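As a quick sanity check (a minimal sketch, assuming a non-singular random matrix), the hard-coded inversion can be compared against np.linalg.inv:

import numpy as np

a = np.random.random((3, 3))
assert np.allclose(inversion(a), np.linalg.inv(a))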

For a timing comparison with a random 3*3 matrix:

%timeit np.linalg.inv(a)

100000 loops, best of 3: 12.5 µs per loop

%timeit inversion(a)

100000 loops, best of 3: 13.9 µs per loop

Closely related yet not at all a duplicate is this post on code-review, which explains the background and the whole function.

As @Divakar suggested in the comments, using m.ravel() instead of m.flatten() improves the inversion function a little, so that the timing comparison now yields:

numpy - 100000 loops, best of 3: 12.6 µs per loop

hard coded - 100000 loops, best of 3: 12.8 µs per loop

Although the gap is closing, the hard-coded version is still slower. How so?
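For context on the ravel() suggestion (a general NumPy fact, illustrated with a small check): flatten() always returns a copy, while ravel() returns a view whenever the array is contiguous, which is where the small saving comes from.

import numpy as np

a = np.random.random((3, 3))
assert not np.shares_memory(a, a.flatten())  # flatten() always copies
assert np.shares_memory(a, a.ravel())        # ravel() returns a view for a contiguous array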

Answer

Here is a humble optimisation, saving 9 multiplications and 3 subtractions: instead of computing the determinant separately, it is obtained as the dot product of the adjugate's first row with the first column of m, which is exactly the cofactor expansion along that column, so those cofactors are not computed twice:

def inversion(m):
    m1, m2, m3, m4, m5, m6, m7, m8, m9 = m.ravel()
    # adjugate (transposed cofactor matrix) of m
    inv = np.array([[m5*m9-m6*m8, m3*m8-m2*m9, m2*m6-m3*m5],
                    [m6*m7-m4*m9, m1*m9-m3*m7, m3*m4-m1*m6],
                    [m4*m8-m5*m7, m2*m7-m1*m8, m1*m5-m2*m4]])
    # det(m) = first row of the adjugate dotted with the first column of m
    return inv / np.dot(inv[0], m[:, 0])

You can squeeze out a few more ops (another 24 multiplications if I'm counting correctly) by doing the entire trace in one go:

def det(m):
   m1, m2, m3, m4, m5, m6, m7, m8, m9 = m.ravel()
   return np.dot(m[:, 0], [m5*m9-m6*m8, m3*m8-m2*m9, m2*m6-m3*m5])
   # or try m1*(m5*m9-m6*m8) + m4*(m3*m8-m2*m9) + m7*(m2*m6-m3*m5)
   # probably the fastest would be to inline the two calls to det
   # I'm not doing it here because of readability but you should try it

def dist(m, n):
   m1, m2, m3, m4, m5, m6, m7, m8, m9 = m.ravel()
   n1, n2, n3, n4, n5, n6, n7, n8, n9 = n.ravel()
   return 0.5 * np.dot(
       m.ravel()/det(m) + n.ravel()/det(n),
       [m5*n9-m6*n8, m6*n7-m4*n9, m4*n8-m5*n7, n3*m8-n2*m9, n1*m9-n3*m7,
        n2*m7-n1*m8, m2*n6-m3*n5, m3*n4-m1*n6, m1*n5-m2*n4])
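As a minimal sanity check (assuming random, well-conditioned inputs), det and dist can be compared against np.linalg.det and the trace formula they implement:

import numpy as np

a, b = np.random.random((2, 3, 3))
assert np.isclose(det(a), np.linalg.det(a))
assert np.isclose(dist(a, b),
                  0.5 * np.trace(np.linalg.inv(a) @ b + np.linalg.inv(b) @ a))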

OK, here is the inlined version:

import numpy as np
from timeit import timeit

def dist(m, n):
   m1, m2, m3, m4, m5, m6, m7, m8, m9 = m.ravel()
   n1, n2, n3, n4, n5, n6, n7, n8, n9 = n.ravel()
   return 0.5 * np.dot(
       m.ravel()/(m1*(m5*m9-m6*m8) + m4*(m3*m8-m2*m9) + m7*(m2*m6-m3*m5))
       + n.ravel()/(n1*(n5*n9-n6*n8) + n4*(n3*n8-n2*n9) + n7*(n2*n6-n3*n5)),
       [m5*n9-m6*n8, m6*n7-m4*n9, m4*n8-m5*n7, n3*m8-n2*m9, n1*m9-n3*m7,
        n2*m7-n1*m8, m2*n6-m3*n5, m3*n4-m1*n6, m1*n5-m2*n4])

def dist_np(m, n):
    return 0.5 * np.diag(np.linalg.inv(m)@n + np.linalg.inv(n)@m).sum()

for i in range(3):
    A, B = np.random.random((2,3,3))
    print(dist(A, B), dist_np(A, B))
    print('pp     ', timeit('f(A,B)', number=10000, globals={'f':dist, 'A':A, 'B':B}))
    print('numpy  ', timeit('f(A,B)', number=10000, globals={'f':dist_np, 'A':A, 'B':B}))

Prints:

2.20109953156 2.20109953156
pp      0.13215381593909115
numpy   0.4334693900309503
7.50799877993 7.50799877993
pp      0.13934064202476293
numpy   0.32861811900511384
-0.780284449609 -0.780284449609
pp      0.1258618349675089
numpy   0.3110764700686559

Note that you can make another substantial saving by batch-processing with a vectorised version of the function. The test below computes all 10,000 pairwise distances between two batches of 100 matrices by broadcasting inputs of shape (100, 1, 3, 3) against (1, 100, 3, 3):

def dist(m, n):
    m = np.moveaxis(np.reshape(m, m.shape[:-2] + (-1,)), -1, 0)
    n = np.moveaxis(np.reshape(n, n.shape[:-2] + (-1,)), -1, 0)
    m1, m2, m3, m4, m5, m6, m7, m8, m9 = m
    n1, n2, n3, n4, n5, n6, n7, n8, n9 = n
    return 0.5 * np.einsum("i...,i...->...",
        m/(m1*(m5*m9-m6*m8) + m4*(m3*m8-m2*m9) + m7*(m2*m6-m3*m5))
        + n/(n1*(n5*n9-n6*n8) + n4*(n3*n8-n2*n9) + n7*(n2*n6-n3*n5)),
        [m5*n9-m6*n8, m6*n7-m4*n9, m4*n8-m5*n7, n3*m8-n2*m9, n1*m9-n3*m7,
         n2*m7-n1*m8, m2*n6-m3*n5, m3*n4-m1*n6, m1*n5-m2*n4])

def dist_np(m, n):
    return 0.5 * (np.linalg.inv(m)@n + np.linalg.inv(n)@m)[..., np.arange(3), np.arange(3)].sum(axis=-1)

for i in range(3):
    A = np.random.random((100,1,3,3))
    B = np.random.random((1,100,3,3))
    print(np.allclose(dist(A, B), dist_np(A, B)))
    print('pp     ', timeit('f(A,B)', number=100, globals={'f':dist, 'A':A, 'B':B}))
    print('numpy  ', timeit('f(A,B)', number=100, globals={'f':dist_np, 'A':A, 'B':B}))

Prints:

True
pp      0.14652886800467968
numpy   1.5294789629988372
True
pp      0.1482033939100802
numpy   1.6455406049499288
True
pp      0.1279512889450416
numpy   1.370200254023075
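As a usage illustration (the batch sizes K and L below are arbitrary choices, not from the original answer), the vectorised dist yields the full K×L distance matrix by broadcasting the two leading axes:

import numpy as np

K, L = 50, 80
S = np.random.random((K, 1, 3, 3))   # batch of K matrices
T = np.random.random((1, L, 3, 3))   # batch of L matrices
D = dist(S, T)                       # pairwise distances, shape (K, L)
print(D.shape)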
