How to accelerate slow matrix multiplication in SymPy?


Question

I was writing a tool to solve specific recurrence equations with SymPy, and found that one of the steps that involved a matrix multiplication was taking an extraordinarily long time. For instance, if I try the following in the iPython console,

In [1]: from sympy import *

In [2]: A = Matrix(500, 500, lambda i,j: 2 + abs(i-j) if i-j in [-1, 0, 1] else 0)

In [3]: A[0:7, 0:7]
Out[3]: 
Matrix([
[2, 3, 0, 0, 0, 0, 0],
[3, 2, 3, 0, 0, 0, 0],
[0, 3, 2, 3, 0, 0, 0],
[0, 0, 3, 2, 3, 0, 0],
[0, 0, 0, 3, 2, 3, 0],
[0, 0, 0, 0, 3, 2, 3],
[0, 0, 0, 0, 0, 3, 2]])

In [4]: A * A
...

I've never waited long enough for it to finish.

I appreciate that symbolic manipulation is considerably slower than numerical evaluation, but this seems pretty ridiculous: Octave or other linear-algebra packages can perform this calculation in a fraction of a second.

Does anyone have experience using SymPy's matrix class for rows & columns ~ 1000, with Rational entries?

Solution

Arbitrary-precision as a must, speed as a nice-to-have

There are a couple of Pythonic classes for this purpose that may interest you for arbitrary-precision calculations:

  • decimal.Decimal()

  • fractions.Fraction( numerator = 0, denominator = 1 )
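A minimal sketch (not from the answer itself, standard library only) of what these two classes guarantee, and why they matter for exact work:

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 50          # Decimal precision is adjustable per context

# Fraction is exact rational arithmetic: no rounding ever happens
assert Fraction(1, 3) + Fraction(1, 6) == Fraction(1, 2)

# Decimal avoids binary-float artifacts such as 0.1 + 0.2 != 0.3
assert Decimal('0.1') + Decimal('0.2') == Decimal('0.3')

# Decimal division rounds only at the chosen context precision
third = Decimal(1) / Decimal(3)   # 50 significant digits of 0.333...
print(third)
```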

These are indispensable for precise calculations, be it in large-scale astronomy, DEP simulations or other fields where precision must not degrade over long spans of time / events / recursion and similar threats to standard number representations.

Personally I have worked with decimal.Decimal() ( for up to 5,000,000-digit precision in crypto-sequence randomness analyses ), but I was not focused on carrying them "inside" a numpy matrix.

numpy can hold these types in its matrix data structures ( dtype = decimal.Decimal et al. ), however testing is needed to verify its vectorised function operations and the overall speed once it handles instances of these lovely Pythonic classes.
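A small sketch of that dtype = object route (the 2x2 contents here are illustrative, not the answer's test data): an object-dtype array carries Fraction entries through a matrix product exactly, since numpy falls back to the objects' own arithmetic.

```python
import numpy as np
from fractions import Fraction

# dtype=object stores arbitrary Python objects; arithmetic dispatches
# to Fraction's own exact operators, so no precision is lost
M = np.array([[Fraction(1, 3), Fraction(1, 2)],
              [Fraction(2, 5), Fraction(3, 7)]], dtype=object)

P = M.dot(M)      # exact rational matrix product
print(P[0, 0])    # (1/3)*(1/3) + (1/2)*(2/5) = 1/9 + 1/5 = 14/45
```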

Performance

As an initial observation of numpy speed on a standard ( "dense" sounds funny at this scale ) 2x2 matrix of decimal.Decimal-s:

>>> aClk.start();m*m;aClk.stop()
array([[Decimal('0.01524157875323883675019051999'),
        Decimal('5.502209507697009702374335655')],
       [Decimal('1.524157875323883675019051999'),
        Decimal('11.94939027587381419881628113')]], dtype=object)
5732L # 5.7 msec_______________________________________________________________

While the same on dtype = numpy.float96 took

>>> aClk.start();f*f;aClk.stop()
array([[ 0.042788046,  0.74206772],
       [ 0.10081096,  0.46544855]], dtype=float96)
2979L # 2.9 msec_______________________________________________________________

For a 500 x 500 fully-populated dtype = fractions.Fraction

>>> aClk.start();M*M;aClk.stop()
array([[Fraction(9, 64), Fraction(1, 4), Fraction(64, 25), ...,
        Fraction(64, 81), Fraction(16, 81), Fraction(36, 1)],
        ..,
       [Fraction(1, 1), Fraction(9, 4), Fraction(4, 1), ...,
        Fraction(1, 4), Fraction(25, 36), Fraction(1, 1)]], dtype=object)
2692088L # 2.7 sec_<<<_Fraction_______________________________vs. 19 msec float96

For a 500 x 500 fully-populated dense dtype = decimal.Decimal

>>> aClk.start();D*D;aClk.stop()
array([[Decimal('0.140625'), Decimal('0.25'), Decimal('2.56'), ...,
        Decimal('0.7901234567901234567901234568'),
        Decimal('0.1975308641975308641975308642'), Decimal('36')],
       [Decimal('3.24'), Decimal('0.25'), Decimal('0.25'), ...,
        Decimal('0.02040816326530612244897959185'), Decimal('0.04'),
        Decimal('0.1111111111111111111111111111')],
       [Decimal('0.1111111111111111111111111111'), Decimal('0.25'),
        Decimal('2.25'), ..., Decimal('0.5102040816326530612244897959'),
        Decimal('0.25'), Decimal('0.0625')],
       ...,
       [Decimal('0'), Decimal('5.444444444444444444444444443'),
        Decimal('16'), ..., Decimal('25'), Decimal('0.81'), Decimal('0.04')],
       [Decimal('1'), Decimal('7.111111111111111111111111113'),
        Decimal('1'), ..., Decimal('0'), Decimal('81'), Decimal('2.25')],
       [Decimal('1'), Decimal('2.25'), Decimal('4'), ..., Decimal('0.25'),
        Decimal('0.6944444444444444444444444444'), Decimal('1')]], dtype=object)
4789338L # 4.8 sec_<<<_Decimal_______________________________vs. 19 msec float96
2692088L # 2.7 sec_<<<_Fraction______________________________vs. 19 msec float96

Can expectations for a 1000x1000 tri-diagonal matrix go under 50 msec?

Since there are only 3,000 non-zero ( sparse-representation ) elements, as opposed to 250,000 cells in the fully-populated 500x500 matrices tested above, there is a huge performance jump in using these Pythonic classes for your arbitrary-precision computations. Fraction has extra headroom once the engine can exploit the numerator/denominator construction for MUL / DIV operations over Decimal; however, the exact performance envelope should be tested in vivo, on the real cases your computing approach uses.
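To connect this back to the question: a sketch (timings will vary by machine; this is not the answerer's benchmark) of the question's own 500x500 tridiagonal matrix rebuilt as a SparseMatrix, where the product that never finished densely completes quickly:

```python
import sympy

n = 500
# same tridiagonal pattern as the question's Matrix(500, 500, ...),
# but stored sparsely: ~3*n entries instead of n*n cells
A = sympy.SparseMatrix(n, n, {(i, j): 2 + abs(i - j)
                              for i in range(n)
                              for j in (i - 1, i, i + 1)
                              if 0 <= j < n})

B = A * A                  # fast, unlike the dense 500x500 product
print(B[0, 0], B[0, 2])    # 2*2 + 3*3 = 13,  3*3 = 9
```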

SymPy syntax for SparseMatrix & tests on 1000x1000 tri-diagonals

A real test on a 1000x1000 SparseMatrix, as proposed by @tmyklebu, took a bit longer to put together due to installation troubles, but may give you some further insight for real-world implementation projects:

>>> F = sympy.SparseMatrix( 1000, 1000, { (0,0): 1} )        # .Fraction()
>>> D = sympy.SparseMatrix( 1000, 1000, { (0,0): 1} )        # .Decimal()

>>> for i in range( 1000 ):                                  # GEN to have F & D hold
...     for j in range( 1000 ):                              #     SAME values,
...         if i-j in [-1,0,1]:                              # but DIFF representations
...            num = int( 100 * numpy.random.random() )      #     
...            den = int( 100 * numpy.random.random() ) + 1  # + 1 to avoid DIV!0
...            F[i,j] = fractions.Fraction( numerator = num, denominator = den )
...            D[i,j] = decimal.Decimal( str( num ) ) / decimal.Decimal( str( den ) )

# called in Zig-Zag(F*F/D*D/F*F/D*D/...) order to avoid memory-access cache artifacts

>>> aClk.start();VOID=F*F;aClk.stop()
770353L                                      # notice the 1st eval took  TRIPLE LONGER
205585L                                      # notice the 2nd+
205364L # 0.205 sec_<<<_Fraction()____________________________vs. 0.331 sec Decimal()


>>> aClk.start();VOID=D*D;aClk.stop()
383137L # 0.383 sec_<<<_Decimal()____________________________vs. 0.770 sec 1st Fraction()
390164L # 0.390 sec_<<<_Decimal()____________________________vs. 0.205 sec 2nd Fraction()
331291L # 0.331 sec_<<<_Decimal()____________________________vs. 0.205 sec 3rd Fraction()

>>> F[0:4,0:4]
Matrix([
[ 1/52,  6/23,     0,     0],
[42/29, 29/12,     1,     0],
[    0, 57/88, 39/62, 13/57],
[    0,     0, 34/83, 26/95]])
>>> D[0:4,0:4]
Matrix([
[0.0192307692307692, 0.260869565217391,                 0,                 0],
[  1.44827586206897,  2.41666666666667,               1.0,                 0],
[                 0, 0.647727272727273, 0.629032258064516, 0.228070175438596],
[                 0,                 0, 0.409638554216867, 0.273684210526316]])
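For reference, aClk above appears to be the answerer's own microsecond stopwatch, not a library object; the same measurement can be sketched with the standard library's time.perf_counter() (the tiny 4x4 matrix here is only a placeholder for the 1000x1000 F above):

```python
import time
import sympy

# placeholder stand-in for the answer's 1000x1000 tridiagonal F
F = sympy.SparseMatrix(4, 4, {(i, j): sympy.Rational(i + 1, j + 2)
                              for i in range(4)
                              for j in (i - 1, i, i + 1)
                              if 0 <= j < 4})

t0 = time.perf_counter()
VOID = F * F
elapsed_us = (time.perf_counter() - t0) * 1e6   # microseconds, like aClk
print(f"{elapsed_us:.0f}")
```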
