高效地构建FEM/FVM矩阵 [英] Efficiently construct FEM/FVM matrix

查看:62
本文介绍了高效地构建FEM/FVM矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是FEM/FVM方程系统的典型用例,因此可能引起广泛关注.从三角形网格àla

This is a typical use case for FEM/FVM equation systems, so is perhaps of broader interest. From a triangular mesh à la

我想创建一个scipy.sparse.csr_matrix.矩阵行/列表示网格节点处的值.矩阵在主对角线上以及两个节点通过一条边连接的位置上都有条目.

I would like to create a scipy.sparse.csr_matrix. The matrix rows/columns represent values at the nodes of the mesh. The matrix has entries on the main diagonal and wherever two nodes are connected by an edge.

这是一个MWE,它首先建立一个node-> edge-> cells关系,然后建立矩阵:

Here's an MWE that first builds a node->edge->cells relationship and then builds the matrix:

import numpy
import meshzoo
from scipy import sparse

nx = 1600
ny = 1000
verts, cells = meshzoo.rectangle(0.0, 1.61, 0.0, 1.0, nx, ny)

n = len(verts)

nds = cells.T
nodes_edge_cells = numpy.stack([nds[[1, 2]], nds[[2, 0]],nds[[0, 1]]], axis=1)

# assign values to each edge (per cell)
alpha = numpy.random.rand(3, len(cells))
vals = numpy.array([
    [alpha**2, -alpha],
    [-alpha, alpha**2],
    ])

# Build I, J, V entries for COO matrix
I = []
J = []
V = []
#
V.append(vals[0][0])
V.append(vals[0][1])
V.append(vals[1][0])
V.append(vals[1][1])
#
I.append(nodes_edge_cells[0])
I.append(nodes_edge_cells[0])
I.append(nodes_edge_cells[1])
I.append(nodes_edge_cells[1])
#
J.append(nodes_edge_cells[0])
J.append(nodes_edge_cells[1])
J.append(nodes_edge_cells[0])
J.append(nodes_edge_cells[1])
# Create suitable data for coo_matrix
I = numpy.concatenate(I).flat
J = numpy.concatenate(J).flat
V = numpy.concatenate(V).flat

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()

使用

python -m cProfile -o profile.prof main.py
snakeviz profile.prof

一个人可以创建和查看以上内容的个人资料:

one can create and view a profile of the above:

方法tocsr()在此处占据了运行时的大部分份额,但是在构建alpha更复杂时也是如此.因此,我正在寻找加快速度的方法.

The method tocsr() takes the lion share of the runtime here, but this is also true when building alpha is more complex. Consequently, I'm looking for ways to speed this up.

我已经找到的东西:

  • 由于数据的结构,可以预先汇总矩阵对角线上的值,即

  • Due to the structure of the data, the values on the diagonal of the matrix can be summed up in advance, i.e.,

V.append(vals[0, 0, 0] + vals[1, 1, 2])
I.append(nodes_edge_cells[0, 0])  # == nodes_edge_cells[1, 2]
J.append(nodes_edge_cells[0, 0])  # == nodes_edge_cells[1, 2]

这会使IJV变短,从而加快了tocsr的速度.

This makes I, J, V shorter and thus speeds up tocsr.

现在,边缘是每个单元格".我可以使用numpy.unique识别彼此相等的边缘,从而有效地节省了大约IJV的一半.但是,我发现这也需要一些时间. (不足为奇).

Right now, edges are "per cell". I could identify equal edges with each other using numpy.unique, effectively saving about half of I, J, V. However, I found that this too takes some time. (Not surprising.)

另一个我想到的是,如果存在类似于csr_matrix的数据结构,其中主对角线是,则可以用简单的numpy.add.at替换对角线VIJ.分开存放.我知道这存在于其他一些软件包中,但无法在scipy中找到.正确吗?

One other thought that I had was that that I could replace the diagonal V, I, J by a simple numpy.add.at if there was a csr_matrix-like data structure where the main diagonal is kept separately. I know that this exists in some other software packages, but couldn't find it in scipy. Correct?

也许有一种明智的方法可以直接构建CSR?

Perhaps there's a sensible way to construct CSR directly?

推荐答案

最后,结果证明这是COO和CSR sum_duplicates之间的差异(就像怀疑的@hpaulj一样).感谢参与其中的每个人(尤其是@ paul-panzer)的努力,公关正在使tocsr的速度大大提高.

So, in the end this turned out to be the difference between COO's and CSR's sum_duplicates (just like @hpaulj suspected). Thanks to the efforts of everyone involved here (particularly @paul-panzer), a PR is underway to give tocsr a tremendous speedup.

SciPy的tocsr(I, J)上执行lexsort,因此它有助于组织索引,以使(I, J)已经很合理地排序.

SciPy's tocsr does a lexsort on (I, J), so it helps organizing the indices in such a way that (I, J) will come out fairly sorted already.

对于上例中的nx=4ny=2IJ

[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]

首先对cells的每一行进行排序,然后对第一列进行排序,例如

First sorting each row of cells, then the rows by the first column like

cells = numpy.sort(cells, axis=1)
cells = cells[cells[:, 0].argsort()]

产生

[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]

对于原始帖子中的数字,排序将运行时间从大约40秒减少到8秒.

For the number in the original post, sorting cuts down the runtime from about 40 seconds to 8 seconds.

如果首先对节点进行适当的编号,也许可以实现更好的排序.我正在考虑 Cuthill-McKee

Perhaps an even better ordering can be achieved if the nodes are numbered more appropriately in the first place. I'm thinking of Cuthill-McKee and friends.

这篇关于高效地构建FEM/FVM矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆