提高高斯-赛德尔(Jacobi)解算器的Numpy速度 [英] Improving Numpy speed for Gauss-Seidel (Jacobi) Solver

查看:97
本文介绍了提高高斯-赛德尔(Jacobi)解算器的Numpy速度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此问题是对最近发布的关于MATLAB两倍于像Numpy一样快.

我目前在MATLAB和Numpy中都实现了一个高斯-塞德尔求解器,该求解器作用于2D轴对称域(圆柱坐标).该代码最初是用MATLAB编写的,然后转移到Python. Matlab代码运行约20 s,而Numpy代码运行约30 s.我想使用Numpy,但是,由于此代码是较大程序的一部分,因此几乎两倍的模拟时间是一个重大缺点.

I currently have a Gauss-Seidel solver implemented in both MATLAB and Numpy which acts on a 2D axisymmetric domain (cylindrical coordinates). The code was originally written in MATLAB and then transferred to Python. The Matlab code runs in ~20 s whereas the Numpy codes takes ~30 s. I would like to use Numpy, however, since this code is part of a larger program, the almost twice as long simulation time is a significant drawback.

该算法仅在矩形网格上(在圆柱坐标中)求解离散的拉普拉斯方程.当网格更新之间的最大差异小于指示的公差时,该操作结束.

The algorithm simply solves the discretized Laplace equation on a rectangular mesh (in cylindrical coordinates). It finishes when the maximum difference between updates on the mesh is less than the indicated tolerance.

Numpy中的代码是:

The code in Numpy is:

import numpy as np
import time

T = np.transpose

# geometry
length = 0.008
width = 0.002

# mesh
nz = 256
nr = 64

# step sizes
dz = length/nz
dr = width/nr

# node position matrices
r = np.tile(np.linspace(0,width,nr+1), (nz+1, 1)).T
ri = r/dr

# equation coefficients
cr = dz**2 / (2*(dr**2 + dz**2))
cz = dr**2 / (2*(dr**2 + dz**2))

# initial/boundary conditions
v = np.zeros((nr+1,nz+1))
v[:,0] = 1100
v[:,-1] = 0
v[31:,29:40] = 1000
v[19:,54:65] = -200

# convergence parameters
tol = 1e-4

# Gauss-Seidel solver
tic = time.time()
max_v_diff = 1;
while (max_v_diff > tol):

    v_old = v.copy()

    # left boundary updates
    v[0,1:nz] = cr*2*v[1,1:nz] + cz*(v[0,0:nz-1] + v[0,2:nz+2])
    # internal updates
    v[1:nr,1:nz] = cr*((1 - 1/(2*ri[1:nr,1:nz]))*v[0:nr-1,1:nz] + (1 + 1/(2*ri[1:nr,1:nz]))*v[2:nr+1,1:nz]) + cz*(v[1:nr,0:nz-1] + v[1:nr,2:nz+1])
    # right boundary updates
    v[nr,1:nz] = cr*2*v[nr-1,1:nz] + cz*(v[nr,0:nz-1] + v[nr,2:nz+1])
    # reapply grid potentials
    v[31:,29:40] = 1000
    v[19:,54:65] = -200

    # check for convergence
    v_diff = v - v_old
    max_v_diff = np.absolute(v_diff).max()

toc = time.time() - tic
print(toc)

这实际上不是我使用的完整算法.完整的算法使用连续的过松弛和棋盘迭代方案来提高速度并消除解算器的方向性,但为简单起见,我提供了此易于理解的版本.在完整版中,Numpy的速度缺点更为明显(在Numpy和MATLAB中,仿真时间分别为17s和9s).

This is actually not the full algorithm which I use. The full algorithm uses successive overrelaxation and a checkerboard iteration scheme to improve speed and remove solver directionality, but for purposes of simplicity I provided this easier to understand version. The speed drawbacks in Numpy are more pronounced for the full version (17s vs. 9s simulation times respectively in Numpy and MATLAB).

我尝试了上一个问题的解决方案,将v更改为列优先顺序数组,但性能没有提高.

I tried the solution from the previous question, changing v to a column-major order array, but there was no performance increase.

有什么建议吗?

供参考的Matlab代码为:

The Matlab code for reference is:

% geometry
length = 0.008;
width = 0.002;

% mesh
nz = 256;
nr = 64;

% step sizes
dz = length/nz;
dr = width/nr;

% node position matrices
r = repmat(linspace(0,width,nr+1)', 1, nz+1);
ri = r./dr;

% equation coefficients
cr = dz^2/(2*(dr^2+dz^2));
cz = dr^2/(2*(dr^2+dz^2));

% initial/boundary conditions
v = zeros(nr+1,nz+1);
v(1:nr+1,1) = 1100;
v(1:nr+1,nz+1) = 0;
v(32:nr+1,30:40) = 1000;
v(20:nr+1,55:65) = -200;

% convergence parameters
tol = 1e-4;
max_v_diff = 1;

% Gauss-Seidel Solver
tic
while (max_v_diff > tol)
    v_old = v;

    % left boundary updates
    v(1,2:nz) = cr.*2.*v(2,2:nz) + cz.*( v(1,1:nz-1) + v(1,3:nz+1) );
    % internal updates
    v(2:nr,2:nz) = cr.*( (1 - 1./(2.*ri(2:nr,2:nz))).*v(1:nr-1,2:nz) + (1 + 1./(2.*ri(2:nr,2:nz))).*v(3:nr+1,2:nz) ) + cz.*( v(2:nr,1:nz-1) + v(2:nr,3:nz+1) );    
    % right boundary updates
    v(nr+1,2:nz) = cr.*2.*v(nr,2:nz) + cz.*( v(nr+1,1:nz-1) + v(nr+1,3:nz+1) );
    % reapply grid potentials
    v(32:nr+1,30:40) = 1000;
    v(20:nr+1,55:65) = -200;

    % check for convergence
    max_v_diff = max(max(abs(v - v_old)));

end
toc

推荐答案

通过执行以下过程,我已经能够将笔记本电脑的运行时间从66秒减少到21秒:

I've been able to reduce the running time in my laptop from 66 to 21 seconds by following this process:

  1. 找到瓶颈.我使用了 line_profiler 来分析了代码/06/timing-and-profiling.html"rel =" nofollow> IPython 控制台可找到花费最多时间的行.事实证明,超过80%的时间都花在了内部更新"这一行中.

  1. Find the bottleneck. I profiled the code using line_profiler from the IPython console to find the lines that took most time. It turned out that over 80% of the time was spent in the line that does "internal updates".

选择一种对其进行优化的方法.有几种工具可以加速numpy中的代码(Cython,numexpr,weave ...).特别地,scipy.weave.blitz非常适合将numpy表达式(例如令人反感的行)编译为快速代码.从理论上讲,该行可以包装在"..."内并作为weave.blitz("...")执行,但计算中将使用要更新的数组,如

Choose a way to optimise it. There are several tools to speed code up in numpy (Cython, numexpr, weave...). In particular, scipy.weave.blitz is well suited to compile numpy expressions, like the offending line, into fast code. In theory, that line could be wrapped inside "..." and executed as weave.blitz("...") but the array that's being updated is used in the computation, so as stated by point #4 in the docs a temporary array must be used to keep the same result:

expr = "temp = cr*((1 - 1/(2*ri[1:nr,1:nz]))*v[0:nr-1,1:nz] + (1 + 1/(2*ri[1:nr,1:nz]))*v[2:nr+1,1:nz]) + cz*(v[1:nr,0:nz-1] + v[1:nr,2:nz+1]); v[1:nr,1:nz] = temp"
temp = np.empty((nr-1, nz-1))
...
while ...
    # internal updates
    weave.blitz(expr)

  • 在检查结果是否正确之后,可以使用weave.blitz(expr, check_size=0)禁用运行时检查.该代码现在可以在34秒内运行.

  • After checking that the results are correct, runtime checks are disabled by using weave.blitz(expr, check_size=0). The code now runs in 34 seconds.

    以Jaime的工作为基础,预先计算表达式中的常数因子AB.该代码将在21秒内运行(更改最少,但现在需要编译器).

    Building up on Jaime's work, precompute the constant factors A and B in the expression. The code runs in 21 seconds (with minimal changes but it now needs a compiler).

    这是代码的核心:

    from scipy import weave
    
    # [...] Set up code till "# Gauss-Seidel solver"
    
    tic = time.time()
    max_v_diff = 1;
    A = cr * (1 - 1/(2*ri[1:nr,1:nz]))
    B = cr * (1 + 1/(2*ri[1:nr,1:nz]))
    expr = "temp = A*v[0:nr-1,1:nz] + B*v[2:nr+1,1:nz] + cz*(v[1:nr,0:nz-1] + v[1:nr,2:nz+1]); v[1:nr,1:nz] = temp"
    temp = np.empty((nr-1, nz-1))
    while (max_v_diff > tol):
        v_old = v.copy()
        # left boundary updates
        v[0,1:nz] = cr*2*v[1,1:nz] + cz*(v[0,0:nz-1] + v[0,2:nz+2])
        # internal updates
        weave.blitz(expr, check_size=0)
        # right boundary updates
        v[nr,1:nz] = cr*2*v[nr-1,1:nz] + cz*(v[nr,0:nz-1] + v[nr,2:nz+1])
        # reapply grid potentials
        v[31:,29:40] = 1000
        v[19:,54:65] = -200
        # check for convergence
        v_diff = v - v_old
        max_v_diff = np.absolute(v_diff).max()
    toc = time.time() - tic
    

    这篇关于提高高斯-赛德尔(Jacobi)解算器的Numpy速度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆