How to avoid memory allocations in sparse expressions with Eigen


Question

I have an application where the sparsity pattern is constant. Let's say my computations are of the form

sm3 = sm1 + sm2

However, even though I have set the sparsity pattern to be the same in all of those operands, my profiler shows that most of the time is spent allocating and deallocating the result matrix.

Here is my MWE:

#include <eigen3/Eigen/Sparse>
#include <iostream>

int main(int argc, char *argv[])
{
  using namespace Eigen;

  SparseMatrix<double> sm1(2, 2), sm2(2, 2), sm3(2, 2);

  // Populate sm1 and sm2
  sm1.insert(0,0) = 2.0;

  sm2.insert(1,1) = 3.0;

  // Compute the result pattern
  sm3 = sm1 + sm2;

  // Copy the augmented pattern into the operands
  sm1 = sm2 = sm3;

  // This loop triggers a lot of new[] and delete[] calls
  for(int i = 0; i < 1000; i++)
    sm3 = sm2 + sm1;
}

Can those allocations be avoided?

Answer

This is currently not possible because sparse matrices are assumed to alias by default. For instance, if you do:

m3 = m3 + m1;

with the pattern of m1 not entirely included in that of m3, then the expression could not be evaluated directly into m3. In the future, re-use of the destination's memory could be enforced with a syntax like:

m3.noalias() = m1 + m2;

In the meantime, since your matrices are small, you can work around this, and even get higher performance, by enforcing that the patterns of m1 and m2 are the same as the pattern of m3, by adding some explicit zeros. Then, with Eigen 3.3, the sparse addition can be cast as an addition of dense vectors:

m3.coeffs() = m1.coeffs() + m2.coeffs();

即使 m1 m2 很小,因为您摆脱了内存间接访问,并且受益于向量化(不要忘记启用AVX),您将获得非常高的加速比(可能是一个数量级)与,例如 -mavx )。

Even if the intersection between m1 and m2 is small, you will get very high speedups (probably one order of magnitude) because you get rid of the memory indirections, and benefit from vectorization (don't forget to enable AVX with, e.g., -mavx).
