Benchmarking (python vs. c++ using BLAS) and (numpy)


Question

I would like to write a program that makes extensive use of BLAS and LAPACK linear algebra functionality. Since performance is an issue, I did some benchmarking and would like to know whether the approach I took is legitimate.

I have, so to speak, three contestants and want to test their performance with a simple matrix-matrix multiplication. The contestants are:


  1. Numpy, making use only of the functionality of dot.
  2. Python, calling the BLAS functionalities through a shared object.
  3. C++, calling the BLAS functionalities through a shared object.



Scenario

I implemented a matrix-matrix multiplication for different dimensions i. i runs from 5 to 500 with an increment of 5, and the matrices m1 and m2 are set up like this:

m1 = numpy.random.rand(i,i).astype(numpy.float32)
m2 = numpy.random.rand(i,i).astype(numpy.float32)



1. Numpy

The code used looks like this:

tNumpy = timeit.Timer("numpy.dot(m1, m2)", "import numpy; from __main__ import m1, m2")
rNumpy.append((i, tNumpy.repeat(20, 1)))
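
The two lines above leave out the surrounding loop. A minimal self-contained sketch of how the numpy timing could be assembled is shown here; the helper name time_numpy_dot and the choice to keep only the best of the repeats are assumptions for illustration, not part of the original benchmark.

```python
import timeit

import numpy


def time_numpy_dot(sizes, repeats=20):
    """Return (size, best_time_in_seconds) pairs for numpy.dot on float32 matrices.

    Hypothetical helper: the original benchmark stores all repeat timings,
    this sketch keeps only the minimum of them.
    """
    results = []
    for i in sizes:
        m1 = numpy.random.rand(i, i).astype(numpy.float32)
        m2 = numpy.random.rand(i, i).astype(numpy.float32)
        # One multiplication per measurement, repeated `repeats` times.
        timer = timeit.Timer(lambda: numpy.dot(m1, m2))
        times = timer.repeat(repeats, 1)
        # The minimum is usually the measurement least disturbed by other processes.
        results.append((i, min(times)))
    return results
```

Calling time_numpy_dot(range(5, 505, 5)) would cover the dimensions described in the scenario, from 5 to 500 in steps of 5.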



2. Python, calling BLAS through a shared object

With the function

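import ctypes
from ctypes import byref, c_char, c_int, c_float

# Load the system BLAS; the library file name may differ between systems.
_blaslib = ctypes.cdll.LoadLibrary("libblas.so")

def Mul(m1, m2, i, r):
    # The Fortran BLAS interface takes every argument by reference.
    no_trans = c_char(b"n")  # bytes literal, accepted by Python 2 and Python 3
    n = c_int(i)
    one = c_float(1.0)
    zero = c_float(0.0)

    # r = 1.0 * m1 * m2 + 0.0 * r for square i x i float32 matrices
    _blaslib.sgemm_(byref(no_trans), byref(no_trans), byref(n), byref(n), byref(n),
            byref(one), m1.ctypes.data_as(ctypes.c_void_p), byref(n),
            m2.ctypes.data_as(ctypes.c_void_p), byref(n), byref(zero),
            r.ctypes.data_as(ctypes.c_void_p), byref(n))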

the test code looks like this:

r = numpy.zeros((i,i), numpy.float32)
tBlas = timeit.Timer("Mul(m1, m2, i, r)", "import numpy; from __main__ import i, m1, m2, r, Mul")
rBlas.append((i, tBlas.repeat(20, 1)))
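
One detail worth keeping in mind when checking the output of the sgemm_ call rather than only its speed: the Fortran BLAS interface assumes column-major storage, while numpy arrays created as in the scenario are row-major (C order). The sketch below emulates this in pure numpy, without calling BLAS at all, and assumes C-contiguous inputs; the function name is made up for illustration. It suggests that r would end up holding m2 * m1 rather than m1 * m2, which does not change the timing for square matrices.

```python
import numpy


def emulate_column_major_gemm(m1, m2):
    """Emulate what a column-major GEMM writes into a row-major result buffer.

    Hypothetical helper for illustration only; no BLAS routine is called.
    """
    # A column-major routine reading a C-ordered buffer sees the transpose.
    a_f = m1.T
    b_f = m2.T
    c_f = numpy.dot(a_f, b_f)
    # The column-major result, read back in row-major order, is transposed again.
    return c_f.T


m1 = numpy.array([[0, 1], [2, 3]], dtype=numpy.float32)
m2 = numpy.array([[1, 2], [3, 5]], dtype=numpy.float32)
r = emulate_column_major_gemm(m1, m2)

# The buffer holds m2 * m1, not m1 * m2.
print(numpy.allclose(r, numpy.dot(m2, m1)))
print(numpy.allclose(r, numpy.dot(m1, m2)))
```

If the product m1 * m2 is needed in row-major form, swapping the two matrix arguments in the call is a common way to obtain it.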



3. c++, calling BLAS through a shared object

Now the c++ code naturally is a little longer, so I reduce the information to a minimum.
I load the function with

void* handle = dlopen("libblas.so", RTLD_LAZY);
void* Func = dlsym(handle, "sgemm_");

I measure the time with gettimeofday like this:

gettimeofday(&start, NULL);
f(&no_trans, &no_trans, &dim, &dim, &dim, &one, A, &dim, B, &dim, &zero, Return, &dim);
gettimeofday(&end, NULL);
dTimes[j] = CalcTime(start, end);

where j is a loop running 20 times. I calculate the time passed with

double CalcTime(timeval start, timeval end)
{
    // Convert both timestamps to microseconds, subtract, and return seconds.
    double factor = 1000000;
    return (((double)end.tv_sec) * factor + ((double)end.tv_usec) - (((double)start.tv_sec) * factor + ((double)start.tv_usec))) / factor;
}



Results

The result is shown in the plot below:


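  1. Do you think my approach is fair, or are there some unnecessary overheads I can avoid?
  2. Would you expect that the result would show such a huge discrepancy between the c++ and python approach?
  3. Since I would rather use python for my program, what could I do to increase the performance when calling BLAS or LAPACK routines?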



Download

The complete benchmark can be downloaded here. (J.F. Sebastian made that link possible^^)

Recommended answer

I've run your benchmark. There is no difference between C++ and numpy on my machine:


Do you think my approach is fair, or are there some unnecessary overheads I can avoid?

It seems fair, since there is no difference in the results.


Would you expect that the result would show such a huge discrepancy between the c++ and python approach? Both are using shared objects for their calculations.

No.


Since I would rather use python for my program, what could I do to increase the performance when calling BLAS or LAPACK routines?

Make sure that numpy uses an optimized version of the BLAS/LAPACK libraries on your system.
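
One way to check this, as a minimal sketch: numpy.show_config() prints the build configuration, including the BLAS/LAPACK libraries numpy was built against. The exact output format differs between numpy versions, so reading it is a manual step; names such as OpenBLAS, MKL, or ATLAS usually point to an optimized library, while a plain reference BLAS is typically much slower.

```python
import numpy

# Print the numpy version and the libraries it was built against.
# The layout of this report varies between numpy releases.
print(numpy.__version__)
numpy.show_config()
```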
