Compare blitz++, armadillo, boost::MultiArray


Question

I did a comparison between blitz++, armadillo, boost::MultiArray with the following code (borrowed from an old post)

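// These macros must be defined before the headers they affect are included.
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS
#include <cstdio>     // printf
#include <iostream>
#include <windows.h>  // GetTickCount
#include <boost/multi_array.hpp>
#include <blitz/array.h>
#include <armadillo>
using namespace std;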

int main(int argc, char* argv[])
{
    const int X_SIZE = 1000;
    const int Y_SIZE = 1000;
    const int ITERATIONS = 100;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array


    //------------------Measure boost Loop------------------------------------------
    {
        typedef boost::multi_array<double, 2> ImageArrayType;
        ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    boostMatrix[x][y] = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Boost Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure blitz Loop-------------------------------------------
    {
        blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    blitzArray(x,y) = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Blitz Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure armadillo loop----------------------------------------
    {
        arma::mat matArray( X_SIZE, Y_SIZE );
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                for (int x = 0; x < X_SIZE; ++x)
                {
                    matArray(x,y) = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[arma  Loop]  Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure native loop----------------------------------------
    // Create the native array
    {
        double *nativeMatrix = new double [X_SIZE * Y_SIZE];
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
            {
                nativeMatrix[y] = 1.0001;
            }
        }
        endTime = ::GetTickCount();
        printf("[Native Loop]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
        delete[] nativeMatrix;
    }

    //------------------Measure boost computation-----------------------------------
    {
        typedef boost::multi_array<double, 2> ImageArrayType;
        ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
        for (int x = 0; x < X_SIZE; ++x)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                boostMatrix[x][y] = 1.0001;
            }
        }
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    boostMatrix[x][y] += boostMatrix[x][y] * 0.5;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Boost computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure blitz computation-----------------------------------
    {
        blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
        blitzArray = 1.0001;
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            blitzArray += blitzArray*0.5;
        }
        endTime = ::GetTickCount();
        printf("[Blitz computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure armadillo computation-------------------------------
    {
        arma::mat matArray( X_SIZE, Y_SIZE );
        matArray.fill(1.0001);
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            //matArray.fill(1.0001);
            matArray += matArray*0.5;
        }
        endTime = ::GetTickCount();
        printf("[arma  computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }

    //------------------Measure native computation------------------------------------------
    // Create the native array
    {
        double *nativeMatrix = new double [X_SIZE * Y_SIZE];
        for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
        {
            nativeMatrix[y] = 1.0001;
        }
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
            {
                nativeMatrix[y] += nativeMatrix[y] * 0.5;
            }
        }
        endTime = ::GetTickCount();
        printf("[Native computation]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
        delete[] nativeMatrix;
    }

    return 0;
}

On Windows with VS2010, the results are:

[Boost Loop] Elapsed time:  1.217 seconds
[Blitz Loop] Elapsed time:  0.046 seconds
[arma  Loop]  Elapsed time:  0.078 seconds
[Native Loop]Elapsed time:  0.172 seconds
[Boost computation] Elapsed time:  2.152 seconds
[Blitz computation] Elapsed time:  0.156 seconds
[arma  computation] Elapsed time:  0.078 seconds
[Native computation]Elapsed time:  0.078 seconds

On Windows with the Intel C++ compiler, the results are:

[Boost Loop] Elapsed time:  0.468 seconds
[Blitz Loop] Elapsed time:  0.125 seconds
[arma  Loop]  Elapsed time:  0.046 seconds
[Native Loop]Elapsed time:  0.047 seconds
[Boost computation] Elapsed time:  0.796 seconds
[Blitz computation] Elapsed time:  0.109 seconds
[arma  computation] Elapsed time:  0.078 seconds
[Native computation]Elapsed time:  0.062 seconds

Two things look strange:

(1) With VS2010, the native computation (which includes the loop) is faster than the native loop alone.
(2) The blitz loop behaves very differently under VS2010 and Intel C++.

To compile blitz++ with the Intel C++ compiler, a file called bzconfig.h is required in the blitz/intel/ folder, but there isn't one. I just copied in the one from blitz/ms/bzconfig.h, which may give a non-optimal configuration. Can anyone tell me how to compile blitz++ with the Intel C++ compiler? The manual says to run the bzconfig script to get the right bzconfig.h, but I don't understand what that means.

Thanks a lot!

Some conclusions of my own:

1. Boost multi_array is the slowest.
2. With the Intel C++ compiler, native pointers are very fast.
3. With the Intel C++ compiler, Armadillo can match the performance of native pointers.
4. I also tested Eigen; it is x0% slower than Armadillo in my simple cases.
5. I am curious how blitz++ behaves under the Intel C++ compiler with a proper configuration.
   Please see my question above.
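
One possible contributor to observation (1) is the timer itself: GetTickCount typically has a resolution of roughly 10 to 16 ms, which is a large fraction of several of the intervals reported above. Another possible factor is that the native loop writes freshly allocated memory inside the timed region, while the native computation test touches the memory before timing starts. The sketch below is only an illustration under those assumptions, not the method used in the post: the helper name `time_native_fill` is hypothetical, it times the fill with `std::chrono::steady_clock`, and it uses a `std::vector` so the buffer is already initialized before the clock starts.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): runs `iterations`
// passes of filling a buffer of `n` doubles and returns elapsed seconds.
double time_native_fill(std::size_t n, int iterations, double value = 1.0001)
{
    assert(iterations >= 0);
    if (n == 0)
        return 0.0;  // nothing to fill, nothing to time

    // The vector constructor zero-initializes the buffer, so the memory
    // has been written once before the timed region begins.
    std::vector<double> buffer(n);
    volatile double sink = 0.0;  // observable side effect for the optimizer

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        for (std::size_t k = 0; k < n; ++k)
        {
            buffer[k] = value;
        }
        sink = buffer[n / 2];  // read one element back after each pass
    }
    const auto stop = std::chrono::steady_clock::now();

    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_native_fill(1000 * 1000, 100)` mirrors the sizes used in the native loop test. Reading one element back per pass only makes it less likely that an optimizer discards the stores; it is not a guarantee.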

Solution

As far as I can tell, you are judging the performance of each matrix library by measuring the speed of multiplying a single matrix by a scalar. Due to its template-based policy, Armadillo will do a very good job at this by breaking down each multiply into parallelizable code for most compilers.

But I suggest you need to rethink your test scope and methodology. For example, you've left out every BLAS implementation. The BLAS function you'd need would be dscal. A vendor-provided implementation for your specific CPU would probably do a good job.
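
To make the dscal remark concrete: the benchmark's update `m += m * 0.5` is mathematically a single in-place scaling by 1.5, which is the operation dscal performs. The sketch below is a plain C++ stand-in and not a BLAS call; `scale_in_place` and `benchmark_step` are hypothetical names, and only the argument order (n, alpha, x, incx) is modeled on dscal. With a CBLAS interface available, the equivalent call would typically be `cblas_dscal(n, 1.5, x, 1)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in with the same argument order as BLAS dscal (n, alpha, x, incx):
// multiplies x[i * incx] by alpha for i in [0, n).
// Hypothetical name; a vendor BLAS would supply its own tuned version.
void scale_in_place(std::size_t n, double alpha, double* x, std::size_t incx)
{
    assert(incx > 0);
    for (std::size_t i = 0; i < n; ++i)
    {
        x[i * incx] *= alpha;
    }
}

// The benchmark's update m += m * 0.5, expressed as one scaling by 1.5.
void benchmark_step(std::vector<double>& m)
{
    scale_in_place(m.size(), 1.5, m.data(), 1);
}
```

The two forms agree mathematically, but in floating point `m + m * 0.5` and `m * 1.5` can in principle differ in the last bit for some inputs, so a comparison against a BLAS result should allow a small tolerance.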

More relevantly, there are many more things any reasonable vector library would need to be able to do: matrix multiplies, dot products, vector lengths, transposes, and so forth, which aren't addressed by your test. Your test addresses exactly two things: element assignment, which practically speaking is never a bottleneck for vector libraries, and scalar/vector multiplication, which is a BLAS level 1 function provided by every CPU manufacturer.

There is a discussion of BLAS level 1 vs. compiler-emitted code here.

tl;dr: use Armadillo with the native BLAS and LAPACK libraries for your platform linked in.
