How to accelerate slow matrix multiplication in SymPy?
Problem description
I was writing a tool to solve specific recurrence equations with SymPy, and found that one of the steps, involving a matrix multiplication, was taking an extraordinarily long time. For instance, if I try the following in the IPython console,
In [1]: from sympy import *
In [2]: A = Matrix(500, 500, lambda i,j: 2 + abs(i-j) if i-j in [-1, 0, 1] else 0)
In [3]: A[0:7, 0:7]
Out[3]:
Matrix([
[2, 3, 0, 0, 0, 0, 0],
[3, 2, 3, 0, 0, 0, 0],
[0, 3, 2, 3, 0, 0, 0],
[0, 0, 3, 2, 3, 0, 0],
[0, 0, 0, 3, 2, 3, 0],
[0, 0, 0, 0, 3, 2, 3],
[0, 0, 0, 0, 0, 3, 2]])
In [4]: A * A
...
I have never waited long enough for it to finish.

I appreciate that symbolic manipulation is considerably slower than numerical evaluation, but this seems pretty ridiculous. Octave or other linear-algebra packages can perform this calculation in a fraction of a second.

Does anyone have experience using SymPy's matrix classes with rows & columns ~ 1000 and Rational entries?
Arbitrary precision as a must, speed as a nice-to-have
There are a few pythonic classes for such a purpose that may interest your arbitrary-precision calculi:
- decimal.Decimal()
- fractions.Fraction( numerator = 0, denominator = 1 )
These are indispensable for precise calculations, be it for large-scale astronomy, DEP simulations or other fields where precision must not degrade as the computation strategy progresses over long scales of time / events / recursion and similar threats to standard number representations.
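As a minimal sketch of the degradation these classes prevent ( the values below are plain Python / stdlib behaviour, nothing project-specific ):

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# float accumulates representation error over repeated operations
f = sum(0.1 for _ in range(10))           # 0.9999999999999999, not 1.0

# Fraction stays exact: 1/10 summed ten times is exactly 1
q = sum([Fraction(1, 10)] * 10, Fraction(0))
assert q == 1

# Decimal honours a user-chosen precision (28 significant digits by default)
getcontext().prec = 50
d = Decimal(1) / Decimal(7)               # 50 exact digits of 1/7
```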
Personally I have worked with decimal.Decimal ( for up to 5.000.000 digits of precision in crypto-sequence randomness analyses ), but was not focused on having 'em "inside" numpy matrices.

numpy can bear these types in its matrix data-structures ( via the dtype = decimal.Decimal syntax et al ); however, testing is needed to verify its vectorised-function operations and the overall speed once handling instances of these lovely pythonic classes.

Performance

As an initial observation of numpy speed on a standard ( "dense" sounds funny at this scale ) 2x2 matrix of decimal.Decimal-s:

>>> aClk.start();m*m;aClk.stop()
array([[Decimal('0.01524157875323883675019051999'),
Decimal('5.502209507697009702374335655')],
[Decimal('1.524157875323883675019051999'),
Decimal('11.94939027587381419881628113')]], dtype=object)
5732L # 5.7 msec_______________________________________________________________
While the same on dtype = numpy.float96 took:

>>> aClk.start();f*f;aClk.stop()
array([[ 0.042788046, 0.74206772],
[ 0.10081096, 0.46544855]], dtype=float96)
2979L # 2.9 msec_______________________________________________________________
For a 500 x 500 fully-populated dtype = fractions.Fraction:
>>> aClk.start();M*M;aClk.stop()
array([[Fraction(9, 64), Fraction(1, 4), Fraction(64, 25), ...,
Fraction(64, 81), Fraction(16, 81), Fraction(36, 1)],
...,
[Fraction(1, 1), Fraction(9, 4), Fraction(4, 1), ...,
Fraction(1, 4), Fraction(25, 36), Fraction(1, 1)]], dtype=object)
2692088L # 2.7 sec_<<<_Fraction_______________________________vs. 19 msec float96
For a 500 x 500 fully-populated dense dtype = decimal.Decimal:
>>> aClk.start();D*D;aClk.stop()
array([[Decimal('0.140625'), Decimal('0.25'), Decimal('2.56'), ...,
Decimal('0.7901234567901234567901234568'),
Decimal('0.1975308641975308641975308642'), Decimal('36')],
[Decimal('3.24'), Decimal('0.25'), Decimal('0.25'), ...,
Decimal('0.02040816326530612244897959185'), Decimal('0.04'),
Decimal('0.1111111111111111111111111111')],
[Decimal('0.1111111111111111111111111111'), Decimal('0.25'),
Decimal('2.25'), ..., Decimal('0.5102040816326530612244897959'),
Decimal('0.25'), Decimal('0.0625')],
...,
[Decimal('0'), Decimal('5.444444444444444444444444443'),
Decimal('16'), ..., Decimal('25'), Decimal('0.81'), Decimal('0.04')],
[Decimal('1'), Decimal('7.111111111111111111111111113'),
Decimal('1'), ..., Decimal('0'), Decimal('81'), Decimal('2.25')],
[Decimal('1'), Decimal('2.25'), Decimal('4'), ..., Decimal('0.25'),
Decimal('0.6944444444444444444444444444'), Decimal('1')]], dtype=object)
4789338L # 4.8 sec_<<<_Decimal_______________________________vs. 19 msec float96
2692088L # 2.7 sec_<<<_Fraction______________________________vs. 19 msec float96
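For readers who want to reproduce the object-dtype numpy runs above, a minimal sketch ( timing with the stdlib time.perf_counter instead of the aClk stopwatch used in the listings; the 100 x 100 size is only to keep the demo quick ):

```python
import time
import numpy as np
from fractions import Fraction

N = 100                                    # scaled down from the 500 x 500 runs above
rng = np.random.default_rng(0)
num = rng.integers(0, 100, size=(N, N))
den = rng.integers(1, 101, size=(N, N))    # lower bound 1 avoids DIV!0

# dtype=object lets numpy carry arbitrary Python numbers, at Python-level speed
M = np.array([[Fraction(int(n), int(d)) for n, d in zip(rn, rd)]
              for rn, rd in zip(num, den)], dtype=object)

t0 = time.perf_counter()
P = M.dot(M)                               # object-dtype dot falls back to Python ops
dt = time.perf_counter() - t0
```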
May expectations for a 1000 x 1000 tri-diagonal matrix go under 50 msec?

As there are only 3,000 non-zero ( sparse-representation ) elements, as opposed to the 250,000 cells in the fully-populated 500x500 matrices tested above, there is a huge performance jump to be had from using these pythonic classes for your arbitrary-precision computations. Fraction has more headroom once the engine can exploit the numerator / denominator construction for MUL / DIV operations, an advantage over Decimal; however, the exact performance envelopes shall be tested in-vivo, on the real cases your computing approach is using.
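The numerator / denominator advantage can be seen directly ( plain stdlib behaviour, default 28-digit Decimal context ):

```python
from decimal import Decimal
from fractions import Fraction

a, b = Fraction(1, 3), Fraction(3, 7)

# MUL / DIV on Fractions only touch integer numerators / denominators -- exact
assert a * b == Fraction(1, 7)
assert (a / b) * b == a                   # round-trips exactly

# Decimal must round 1/3 to the context precision, so the round-trip drifts
x = Decimal(1) / Decimal(3)
assert x * 3 != Decimal(1)                # 0.999... at 28 digits, not exactly 1
```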
Sympy syntax for SparseMatrix & tests on 1000x1000 tri-diagonals

A real test of SparseMatrix, as proposed by @tmyklebu, took a bit longer to elaborate due to installation troubles; however, it may give you some further insight for real-world implementation projects:

>>> F = sympy.SparseMatrix( 1000, 1000, { (0,0): 1} ) # .Fraction()
>>> D = sympy.SparseMatrix( 1000, 1000, { (0,0): 1} ) # .Decimal()
>>> for i in range( 1000 ): # GEN to have F & D hold
... for j in range( 1000 ): # SAME values,
... if i-j in [-1,0,1]: # but DIFF representations
... num = int( 100 * numpy.random.random() ) #
... den = int( 100 * numpy.random.random() ) + 1 # + 1 to avoid DIV!0
... F[i,j] = fractions.Fraction( numerator = num, denominator = den )
... D[i,j] = decimal.Decimal( str( num ) ) / decimal.Decimal( str( den ) )
# called in Zig-Zag(F*F/D*D/F*F/D*D/...) order to avoid memory-access cache artifacts
>>> aClk.start();VOID=F*F;aClk.stop()
770353L # notice the 1st eval took TRIPLE LONGER
205585L # notice the 2nd+
205364L # 0.205 sec_<<<_Fraction()____________________________vs. 0.331 sec Decimal()
>>> aClk.start();VOID=D*D;aClk.stop()
383137L # 0.383 sec_<<<_Decimal()____________________________vs. 0.770 sec 1st Fraction()
390164L # 0.390 sec_<<<_Decimal()____________________________vs. 0.205 sec 2nd Fraction()
331291L # 0.331 sec_<<<_Decimal()____________________________vs. 0.205 sec 3rd Fraction()
>>> F[0:4,0:4]
Matrix([
[ 1/52, 6/23, 0, 0],
[42/29, 29/12, 1, 0],
[ 0, 57/88, 39/62, 13/57],
[ 0, 0, 34/83, 26/95]])
>>> D[0:4,0:4]
Matrix([
[0.0192307692307692, 0.260869565217391, 0, 0],
[ 1.44827586206897, 2.41666666666667, 1.0, 0],
[ 0, 0.647727272727273, 0.629032258064516, 0.228070175438596],
[ 0, 0, 0.409638554216867, 0.273684210526316]])
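Applied back to the original question, a sketch of the same tridiagonal construction as a SparseMatrix ( scaled down to 100 x 100 so it finishes quickly; sympy stores the integer entries exactly, and Rational entries work the same way ):

```python
import sympy

N = 100
# dict-of-coordinates constructor: only the ~3N non-zero entries get stored
entries = {(i, j): 2 + abs(i - j)
           for i in range(N)
           for j in range(N) if abs(i - j) <= 1}
A = sympy.SparseMatrix(N, N, entries)

B = A * A                                 # exact sparse multiplication
```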