Compare blitz++, armadillo, boost::MultiArray
I compared blitz++, Armadillo, and boost::MultiArray using the following code (borrowed from an old post):
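// These macros must be defined before any header is included to take effect.
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS
#include <cstdio>      // printf
#include <windows.h>   // GetTickCount
#include <boost/multi_array.hpp>
#include <blitz/array.h>
#include <armadillo>
using namespace std;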
int main(int argc, char* argv[])
{
    const int X_SIZE = 1000;
    const int Y_SIZE = 1000;
    const int ITERATIONS = 100;
    unsigned int startTime = 0;
    unsigned int endTime = 0;
    // Create the boost array
    //------------------Measure boost Loop------------------------------------------
    {
        typedef boost::multi_array<double, 2> ImageArrayType;
        ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    boostMatrix[x][y] = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Boost Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure blitz Loop-------------------------------------------
    {
        blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    blitzArray(x,y) = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Blitz Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure armadillo loop----------------------------------------
    {
        arma::mat matArray( X_SIZE, Y_SIZE );
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                for (int x = 0; x < X_SIZE; ++x)
                {
                    matArray(x,y) = 1.0001;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[arma Loop] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure native loop----------------------------------------
    // Create the native array
    {
        double *nativeMatrix = new double [X_SIZE * Y_SIZE];
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
            {
                nativeMatrix[y] = 1.0001;
            }
        }
        endTime = ::GetTickCount();
        printf("[Native Loop]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
        delete[] nativeMatrix;
    }
    //------------------Measure boost computation-----------------------------------
    {
        typedef boost::multi_array<double, 2> ImageArrayType;
        ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);
        for (int x = 0; x < X_SIZE; ++x)
        {
            for (int y = 0; y < Y_SIZE; ++y)
            {
                boostMatrix[x][y] = 1.0001;
            }
        }
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                for (int y = 0; y < Y_SIZE; ++y)
                {
                    boostMatrix[x][y] += boostMatrix[x][y] * 0.5;
                }
            }
        }
        endTime = ::GetTickCount();
        printf("[Boost computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure blitz computation-----------------------------------
    {
        blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
        blitzArray = 1.0001;
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            blitzArray += blitzArray*0.5;
        }
        endTime = ::GetTickCount();
        printf("[Blitz computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure armadillo computation-------------------------------
    {
        arma::mat matArray( X_SIZE, Y_SIZE );
        matArray.fill(1.0001);
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            //matArray.fill(1.0001);
            matArray += matArray*0.5;
        }
        endTime = ::GetTickCount();
        printf("[arma computation] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
    }
    //------------------Measure native computation------------------------------------------
    // Create the native array
    {
        double *nativeMatrix = new double [X_SIZE * Y_SIZE];
        for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
        {
            nativeMatrix[y] = 1.0001;
        }
        startTime = ::GetTickCount();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            for (int y = 0; y < Y_SIZE*X_SIZE; ++y)
            {
                nativeMatrix[y] += nativeMatrix[y] * 0.5;
            }
        }
        endTime = ::GetTickCount();
        printf("[Native computation]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);
        delete[] nativeMatrix;
    }
    return 0;
}
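One detail of the listing that affects the loop timings is the nesting order. boost::multi_array and blitz::Array default to row-major (C-style) storage, while arma::mat is column-major, which is presumably why the Armadillo loop runs y on the outside and x on the inside. The sketch below only illustrates the offset arithmetic behind that choice; the helper names are hypothetical and are not part of any of the three libraries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only: flat offset of element (x, y)
// in an xSize-by-ySize array under each storage order.

// Row-major (default for boost::multi_array and blitz::Array):
// consecutive y values are adjacent in memory, so y belongs in the inner loop.
inline std::size_t rowMajorOffset(std::size_t x, std::size_t y,
                                  std::size_t xSize, std::size_t ySize)
{
    assert(x < xSize && y < ySize);
    (void)xSize;  // only used by the assert
    return x * ySize + y;
}

// Column-major (arma::mat):
// consecutive x values are adjacent in memory, so x belongs in the inner loop.
inline std::size_t colMajorOffset(std::size_t x, std::size_t y,
                                  std::size_t xSize, std::size_t ySize)
{
    assert(x < xSize && y < ySize);
    (void)ySize;  // only used by the assert
    return y * xSize + x;
}
```

With the 1000 x 1000 arrays used here, stepping the wrong index in the inner loop moves 1000 doubles (8000 bytes) per iteration instead of 8 bytes, so each library should be timed with the loop order that matches its layout.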
On Windows with VS2010, the results are:
[Boost Loop] Elapsed time: 1.217 seconds
[Blitz Loop] Elapsed time: 0.046 seconds
[arma Loop] Elapsed time: 0.078 seconds
[Native Loop]Elapsed time: 0.172 seconds
[Boost computation] Elapsed time: 2.152 seconds
[Blitz computation] Elapsed time: 0.156 seconds
[arma computation] Elapsed time: 0.078 seconds
[Native computation]Elapsed time: 0.078 seconds
On Windows with the Intel C++ compiler, the results are:
[Boost Loop] Elapsed time: 0.468 seconds
[Blitz Loop] Elapsed time: 0.125 seconds
[arma Loop] Elapsed time: 0.046 seconds
[Native Loop]Elapsed time: 0.047 seconds
[Boost computation] Elapsed time: 0.796 seconds
[Blitz computation] Elapsed time: 0.109 seconds
[arma computation] Elapsed time: 0.078 seconds
[Native computation]Elapsed time: 0.062 seconds
Two things look strange:
(1) With VS2010, the native computation (which includes the loop) is faster than the plain native loop.
(2) The blitz loop behaves very differently under VS2010 and Intel C++.
To compile blitz++ with the Intel C++ compiler, a file called bzconfig.h is required in the blitz/intel/ folder, but it isn't there. I just copied in the one from blitz/ms/bzconfig.h, which may give a non-optimal configuration. Can anyone tell me how to compile blitz++ with the Intel C++ compiler? The manual says to run the bzconfig script to get the right bzconfig.h, but I don't understand what that means.
Thanks a lot!
Some conclusions of my own:
1. Boost multi array is the slowest.
2. With the Intel C++ compiler, native pointers are very fast.
3. With the Intel C++ compiler, Armadillo can match the performance of native pointers.
4. I also tested Eigen; it is x0% slower than Armadillo in my simple cases.
5. I am still curious how blitz++ behaves with the Intel C++ compiler when properly configured.
Please see my question.
As far as I can tell, you are judging the performance of each matrix library by measuring the speed of multiplying a single matrix by a scalar. Due to its template-based policy, Armadillo will do a very good job at this by breaking down each multiply into parallelizable code for most compilers.
But I suggest you need to rethink your test scope and methodology. For example, you've left out every BLAS implementation. The BLAS function you'd need would be dscal. A vendor-provided implementation for your specific CPU would probably do a good job.
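To make the dscal suggestion concrete: the benchmark's update m += m * 0.5 is the same as scaling every element by 1.5, which is exactly what dscal does. The sketch below is a plain C++ reference version of that operation, not a BLAS binding; `scaleInPlace` and `benchmarkKernel` are hypothetical names chosen for illustration, and it assumes a C++11 compiler. The comment shows the CBLAS call that would replace it when a CBLAS header and library are available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version of what BLAS dscal computes:
// x[i * incx] *= alpha for i in [0, n).
// `scaleInPlace` is a hypothetical name used only in this sketch.
void scaleInPlace(std::size_t n, double alpha, double* x, std::size_t incx)
{
    assert(x != nullptr || n == 0);
    assert(incx > 0);
    for (std::size_t i = 0; i < n; ++i)
    {
        x[i * incx] *= alpha;
    }
}

// The benchmark's "m += m * 0.5" expressed as one scaling by 1.5.
void benchmarkKernel(std::vector<double>& m)
{
    // With a CBLAS implementation linked in, the equivalent call would be:
    //   cblas_dscal(static_cast<int>(m.size()), 1.5, m.data(), 1);
    scaleInPlace(m.size(), 1.5, m.data(), 1);
}
```

Timing the vendor dscal against this loop on the same 1000 x 1000 buffer would show whether the library call actually beats what the compiler generates for such a simple kernel.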
More relevantly, there are many more things any reasonable vector library would need to be able to do: matrix multiplies, dot products, vector lengths, transposes, and so forth, which aren't addressed by your test. Your test addresses exactly two things: element assignment, which practically speaking is never a bottleneck for vector libraries, and scalar/vector multiplication, which is a BLAS level 1 function provided by every CPU manufacturer.
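Widening the test is easier with a small timing helper. GetTickCount typically has a resolution of 10 to 16 ms, and the reported times appear to sit close to multiples of about 15.6 ms, so differences such as 0.062 s versus 0.078 s may be a single clock tick. The sketch below assumes a C++11 compiler (newer than VS2010) and uses std::chrono::steady_clock instead. `bestOfSeconds`, `dotProduct` and `transposeRowMajor` are hypothetical names, and the two kernels are plain reference loops standing in for whichever library call is under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `kernel` `repeats` times and return the fastest
// run in seconds, measured with a monotonic clock.
template <typename Kernel>
double bestOfSeconds(int repeats, Kernel kernel)
{
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r)
    {
        const auto start = std::chrono::steady_clock::now();
        kernel();
        const auto stop = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || s < best)
        {
            best = s;
        }
    }
    return best;
}

// Example kernel 1: dot product of two equally sized vectors.
double dotProduct(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        sum += a[i] * b[i];
    }
    return sum;
}

// Example kernel 2: transpose a rows-by-cols row-major matrix
// into a cols-by-rows row-major matrix.
std::vector<double> transposeRowMajor(const std::vector<double>& m,
                                      std::size_t rows, std::size_t cols)
{
    assert(m.size() == rows * cols);
    std::vector<double> t(m.size());
    for (std::size_t r = 0; r < rows; ++r)
    {
        for (std::size_t c = 0; c < cols; ++c)
        {
            t[c * rows + r] = m[r * cols + c];
        }
    }
    return t;
}
```

A call such as `bestOfSeconds(5, [&] { sink += dotProduct(a, b); })` times one kernel; accumulating into a variable that is printed afterwards keeps the compiler from discarding the work. Taking the fastest of several runs is one common choice for reducing noise, not the only defensible one.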
There is a discussion of BLAS level 1 vs. compiler-emitted code here.
tl;dr: use Armadillo with the native BLAS and LAPACK libraries for your platform linked in.