Compare blitz++, armadillo, boost::MultiArray

Views: 320  Published: 2016/10/14 23:36:17  Tags: c++ multidimensional-array armadillo boost-multi-array blitz++


Problem Description

I compared blitz++, Armadillo, and boost::MultiArray using the following code (borrowed from an old post):
#include <cstdio>   // printf
#include <iostream>
using namespace std;
#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS 
#include <boost/multi_array.hpp>
#include <blitz/array.h>
#include <armadillo>

int main(int argc, char* argv[])
{
    const int X_SIZE = 1000;
    const int Y_SIZE = 1000;
    const int ITERATIONS = 100;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array


    //------------------Measure boost Loop------------------------------------------
    {
        typedef boost::multi_array<double, 2> ImageArrayType;
        ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    boostMatrix[x][y] = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Boost Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure blitz Loop-------------------------------------------
    {
        blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    blitzArray(x,y) = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Blitz Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure armadillo loop----------------------------------------
    {
        arma::mat matArray( X_SIZE, Y_SIZE );
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                for (int x = 0; x < X_SIZE; ++x)
                {
                    matArray(x,y) = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[arma  Loop]  Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure native loop----------------------------------------
    // Create the native array
    {
        double *nativeMatrix = new double [X_SIZE * Y_SIZE];
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
            {
                nativeMatrix[y] = 1.0001;
            }
        }
        endTime = ::GetTickCount();
        printf("[Native Loop]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
        delete[] nativeMatrix;
    }

    //------------------Measure boost computation-----------------------------------
    {
        typedef boost::multi_array<double, 2> ImageArrayType;
        ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
        for (int x = 0; x < X_SIZE; ++x)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                boostMatrix[x][y] = 1.0001;
            }
        }
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    boostMatrix[x][y] += boostMatrix[x][y] * 0.5;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Boost computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure blitz computation-----------------------------------
    {
        blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
        blitzArray = 1.0001;
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            blitzArray += blitzArray*0.5;
        }
        endTime = ::GetTickCount();
        printf("[Blitz computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure armadillo computation-------------------------------
    {
        arma::mat matArray( X_SIZE, Y_SIZE );
        matArray.fill(1.0001);
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            //matArray.fill(1.0001);
            matArray += matArray*0.5;
        }
        endTime = ::GetTickCount();
        printf("[arma  computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure native computation------------------------------------------
    // Create the native array
    {
        double *nativeMatrix = new double [X_SIZE * Y_SIZE];
        for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
        {
            nativeMatrix[y] = 1.0001;
        }
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
            {
                nativeMatrix[y] += nativeMatrix[y] * 0.5;
            }
        }
        endTime = ::GetTickCount();
        printf("[Native computation]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
        delete[] nativeMatrix;
    }

    return 0;
}
On Windows with VS2010, the results are:
[Boost Loop] Elapsed time:  1.217 seconds
[Blitz Loop] Elapsed time:  0.046 seconds
[arma  Loop]  Elapsed time:  0.078 seconds
[Native Loop]Elapsed time:  0.172 seconds
[Boost computation] Elapsed time:  2.152 seconds
[Blitz computation] Elapsed time:  0.156 seconds
[arma  computation] Elapsed time:  0.078 seconds
[Native computation]Elapsed time:  0.078 seconds
On Windows with the Intel C++ compiler, the results are:
[Boost Loop] Elapsed time:  0.468 seconds
[Blitz Loop] Elapsed time:  0.125 seconds
[arma  Loop]  Elapsed time:  0.046 seconds
[Native Loop]Elapsed time:  0.047 seconds
[Boost computation] Elapsed time:  0.796 seconds
[Blitz computation] Elapsed time:  0.109 seconds
[arma  computation] Elapsed time:  0.078 seconds
[Native computation]Elapsed time:  0.062 seconds
Some things look strange:
(1) With VS2010, the native computation (which includes the loop) is faster than the native loop alone.
(2) The blitz loop behaves very differently under VS2010 and Intel C++.
To compile blitz++ with the Intel C++ compiler, a file called bzconfig.h is required in the blitz/intel/ folder, but there isn't one. I just copied in the one from blitz/ms/bzconfig.h, which may give a non-optimal configuration. Can anyone tell me how to compile blitz++ with the Intel C++ compiler? The manual says to run the bzconfig script to get the right bzconfig.h, but I don't understand what that means.
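
On oddity (1), part of the picture may be the timer itself: GetTickCount typically advances in steps of about 10 to 16 ms, and the reported times all sit close to multiples of roughly 15.6 ms, so readings such as 0.046 s and 0.078 s are only a few ticks apart. The following is a minimal sketch of a finer-grained harness, not part of the original benchmark. It assumes a C++11 compiler with <chrono> (which VS2010 may not provide), and the helper names time_best_of and fill_native are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not from the original post): runs `body` `repeats`
// times and returns the fastest wall-clock time in seconds.
template <typename Body>
double time_best_of(int repeats, Body body)
{
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r)
    {
        const auto start = std::chrono::steady_clock::now();
        body();
        const auto stop = std::chrono::steady_clock::now();
        const double seconds =
            std::chrono::duration<double>(stop - start).count();
        if (seconds < best)
            best = seconds;
    }
    return best;
}

// The "native loop" kernel from the question, applied to an existing buffer.
inline void fill_native(std::vector<double>& data, double value)
{
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = value;
}
```

A call such as `std::vector<double> m(1000 * 1000); double t = time_best_of(5, [&] { fill_native(m, 1.0001); });` would time the fill. Because std::vector zero-initializes its storage, the memory has already been touched before timing starts, and taking the minimum over several repeats reduces the influence of one-off costs in the first pass.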

Thanks a lot!

Some conclusions of my own:
1. Boost multi_array is the slowest.
2. With the Intel C++ compiler, native pointers are very fast.
3. With the Intel C++ compiler, Armadillo can match the performance of native pointers.
4. I also tested Eigen; it is x0% slower than Armadillo in my simple cases.
5. I am curious about blitz++'s behavior with the Intel C++ compiler under a proper configuration.
   Please see my question above.

Solution
As far as I can tell, you are judging the performance of each matrix library by measuring the speed of multiplying a single matrix by a scalar.  Due to its template-based policy, Armadillo will do a very good job at this by breaking down each multiply into parallelizable code for most compilers.
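
The "template-based policy" mentioned here generally refers to delayed evaluation through expression templates, where an expression like `matArray * 0.5` yields a lightweight proxy instead of a temporary matrix. The following toy sketch shows the idea only. The `Vec` type and `Scaled` proxy are hypothetical names, and this is not how Armadillo or Blitz++ are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy proxy for "vector * scalar": it stores a reference and the factor,
// and computes an element only when that element is requested.
struct Scaled
{
    const std::vector<double>& v;
    double factor;
    double operator[](std::size_t i) const { return v[i] * factor; }
};

// Hypothetical toy vector type, for illustration only.
struct Vec
{
    std::vector<double> data;

    Vec(std::size_t n, double value) : data(n, value) {}

    // v += (w * s): a single fused loop, with no temporary vector allocated.
    Vec& operator+=(const Scaled& rhs)
    {
        assert(rhs.v.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] += rhs[i];
        return *this;
    }
};

// Building the expression only records its operands.
inline Scaled operator*(const Vec& v, double factor)
{
    return Scaled{v.data, factor};
}
```

With these definitions, `Vec a(4, 2.0); a += a * 0.5;` leaves every element equal to 3.0 after one pass over the data. A flat loop of this kind is one that compilers can usually vectorize.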

But I suggest you need to rethink your test scope and methodology.  For example, you've left out every BLAS implementation.  The BLAS function you'd need would be dscal.  A vendor-provided implementation for your specific CPU would probably do a good job.
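
The benchmark's update `m += m * 0.5` is algebraically the same as `m *= 1.5`, which is the operation dscal performs. A plain C++ sketch of those semantics follows. `scale_in_place` is a hypothetical stand-in written for illustration; with a CBLAS library linked in, the corresponding call would be `cblas_dscal(n, alpha, x, incx)`.

```cpp
#include <cassert>
#include <cstddef>

// Reference semantics of BLAS level 1 dscal: scale n elements of x,
// spaced incx apart, by alpha. A tuned vendor BLAS would perform the
// same update with CPU-specific code.
inline void scale_in_place(std::size_t n, double alpha,
                           double* x, std::size_t incx)
{
    assert(x != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}
```

For the 1000 x 1000 matrix in the question, one benchmark iteration would correspond to `scale_in_place(1000 * 1000, 1.5, nativeMatrix, 1)`.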

More relevantly, there are many more things any reasonable vector library would need to be able to do: matrix multiplies, dot products, vector lengths, transposes, and so forth, which aren't addressed by your test.  Your test addresses exactly two things: element assignment, which practically speaking is never a bottleneck for vector libraries, and scalar/vector multiplication, which is a BLAS level 1 function provided by every CPU manufacturer.
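
A broader benchmark could time kernels like the ones listed above. The sketch below gives naive plain C++ baselines for a dot product, a vector length, and a matrix multiply. The function names and the row-major layout are assumptions made for illustration, and each library's own equivalents would be timed against them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Dot product of two equally sized vectors.
inline double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Euclidean length of a vector.
inline double length(const std::vector<double>& a)
{
    return std::sqrt(dot(a, a));
}

// Naive row-major matrix product C = A * B, where A is n x k and B is k x m.
inline std::vector<double> multiply(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n, std::size_t k, std::size_t m)
{
    assert(a.size() == n * k && b.size() == k * m);
    std::vector<double> c(n * m, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < m; ++j)
                c[i * m + j] += a[i * k + p] * b[p * m + j];
    return c;
}
```

Unlike element assignment, the matrix product does O(n^3) work on O(n^2) data, so it is the kind of kernel where a tuned BLAS usually separates itself from a hand-written loop.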

There is a discussion of BLAS level 1 vs. compiler-emitted code here.

tl;dr: use Armadillo with native BLAS and LAPACK libraries linked in for your platform.
