Should Eigen dense matrix * dense vector multiplication be 7 times slower than GSL?


Problem description

The answer to this question of mine made me expect that (for matrices with 1/4 of non-vanishing entries) in Eigen the product Dense Matrix * Dense Vector should outperform Sparse Matrix*Dense Vector.

Not only do I see the opposite, but also both are outperformed by GSL by a factor of 7 and 4 respectively.

Am I using Eigen incorrectly? Am I timing carelessly? I am very startled.

My compilation command is:


ludi@ludi-M17xR4:~/Desktop/tests$ g++ -o eigenfill.x eigenfill.cc -L/usr/local/lib -lgsl -lgslcblas && ./eigenfill.x

My code is:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>     
#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <gsl/gsl_matrix.h>
#include <sys/time.h>
#include <gsl/gsl_blas.h>
#define helix 100
#define rows helix*helix
#define cols rows
#define filling rows/4
#define REPS 10
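// With helix = 100 this is a 10000 x 10000 matrix (1e8 entries); vee() below makes
// every 4th row nonzero, i.e. roughly 2.5e7 nonzeros -- the "1/4 of non-vanishing
// entries" mentioned in the question.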


using namespace Eigen;

/*-- DECLARATIONES --*/
int FillSparseMatrix(SparseMatrix<double> & mat);
int FillDenseMatrices(MatrixXd & Mat, gsl_matrix *testmat);
double vee(int i, int j);
int set_vectors_randomly(gsl_vector *v2, VectorXd &v1);   // v1 by reference so the caller's vector is actually filled

int main()
{
    int rep;
    struct timeval tval_before, tval_after, tval_result;

    gsl_matrix *testmat = gsl_matrix_calloc(rows, cols);
    gsl_vector *v2      = gsl_vector_calloc(cols);
    gsl_vector *prod    = gsl_vector_calloc(cols);

    SparseMatrix<double> mat(rows, cols);    // default is column major
    MatrixXd Mat(rows, cols);                // default is column major
    VectorXd v1(cols), vv1(cols);

    FillSparseMatrix(mat);
    FillDenseMatrices(Mat, testmat);
    printf("\n/*--- --- --- ---*/\n");
    for (rep = 0; rep < REPS; rep++)
    {
        set_vectors_randomly(v2, v1);

        gettimeofday(&tval_before, NULL);
        vv1 = mat*v1;
        gettimeofday(&tval_after, NULL);
        timersub(&tval_after, &tval_before, &tval_result);
        printf("Time for one product, SPARSE EIGEN / secs: %ld.%06ld\n", (long int)tval_result.tv_sec, (long int)tval_result.tv_usec);

        gettimeofday(&tval_before, NULL);
        gsl_blas_dgemv(CblasNoTrans, 1.0, testmat, v2, 0.0, prod);
        gettimeofday(&tval_after, NULL);
        timersub(&tval_after, &tval_before, &tval_result);
        printf("Time for one product, GSL / secs: %ld.%06ld\n", (long int)tval_result.tv_sec, (long int)tval_result.tv_usec);

        gettimeofday(&tval_before, NULL);
        vv1 = Mat*v1;
        gettimeofday(&tval_after, NULL);
        timersub(&tval_after, &tval_before, &tval_result);
        printf("Time for one product, DENSE EIGEN / secs: %ld.%06ld\n", (long int)tval_result.tv_sec, (long int)tval_result.tv_usec);
        printf("/*--- --- --- ---*/\n\n");

        //std::cout << mat << std::endl;
    }
    gsl_matrix_free(testmat);
    printf("--- --- --->DONE\n");
    return(0);
}

/*-- --*/
int FillSparseMatrix(SparseMatrix<double> &mat)
{
    int i, j;
    Eigen::VectorXd Vres;
    mat.reserve(Eigen::VectorXi::Constant(cols, filling));

    printf("Filling Sparse Matrix ...");
    for (i = 0; i < rows; i++)
    {
        if (i % 2500 == 0) { printf("i= %i\n", i); }
        for (j = 0; j < cols; j++)
        {
            if (vee(i,j) != 0) { mat.insert(i,j) = vee(i,j);    /*alternative: mat.coeffRef(i,j) += v_ij;*/ }
        }
    }

    return(0);
}
/*-- --*/

/*-- --*/
int FillDenseMatrices(MatrixXd &Mat, gsl_matrix *testmat)
{
    int i, j;
    Eigen::VectorXd Vres;
    double aux;

    Mat.setZero();   // MatrixXd Mat(rows, cols) leaves entries uninitialized; clear them so untouched entries are really zero
    printf("Filling Dense Matrix ...");
    for (i = 0; i < rows; i++)
    {
        if (i % 2500 == 0) { printf("i= %i\n", i); }
        for (j = 0; j < cols; j++)
        {
            aux = vee(i,j);
            if (aux != 0)
            {
                Mat(i,j) = aux;
                gsl_matrix_set(testmat, i, j, aux);
            }
        }
    }
    return(0);
}
/*-- --*/

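// Note: vee() ignores j, so whenever i % 4 == 0 the whole row i is 1.0 and all
// other rows are zero; about one quarter of the matrix entries are non-vanishing.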
double vee(int i, int j)
{
    double result = 0.0;

    if(i%4 == 0){result =1.0;}

    return result;
}
/*-- --*/
int set_vectors_randomly(gsl_vector *v2, VectorXd &v1)   /* v1 by reference so the caller's Eigen vector is filled too */
{
    printf("Setting vectors randomly anew ...\n");
    for (int j = 0; j < cols; j++)
    {
        double r = drand48();
        v1(j) = r;
        gsl_vector_set(v2, j, r);
    }
    return(0);
}
/*-- --*/


Answer

With Eigen, performance is abysmal when compiling without compiler optimizations. There are several ways to increase performance dramatically:

  • With optimizations turned on (-O2 or -O3 in g++), performance can be two to three orders of magnitude higher.
  • An additional (smaller) speedup can be attained by defining NDEBUG before including the Eigen library. This disables bounds checking, so make sure there are no issues before enabling this feature.
  • Eigen can also take advantage of vectorization (SSE as of 3.2.6 and AVX as of 3.3, PowerPC and ARM as well) leading to further speedups. Just enable the relevant flags in your compiler.
  • Enabling OMP can lead to speedups as well. Again, enable the relevant flags in your compiler and Eigen will do the rest.
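
Putting these together, the compile line from the question could be extended roughly as follows (illustrative only: -march=native assumes the build machine supports the vector extensions you want to use, and -fopenmp enables OpenMP, whose thread count can be capped with the OMP_NUM_THREADS environment variable):

g++ -O3 -DNDEBUG -march=native -fopenmp -o eigenfill.x eigenfill.cc -L/usr/local/lib -lgsl -lgslcblas && ./eigenfill.x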
