Benchmarking (python vs. c++ using BLAS) and (numpy)

Published: 2016/10/13 10:51:00 · Tags: c++, python, numpy, benchmarking, blas


Question

I would like to write a program that makes extensive use of BLAS and LAPACK linear algebra functionality. Since performance is an issue, I did some benchmarking and would like to know whether the approach I took is legitimate.

I have, so to speak, three contestants and want to test their performance with a simple matrix-matrix multiplication. The contestants are:

1. Numpy, making use only of the functionality of dot.
2. Python, calling the BLAS functionalities through a shared object.
3. C++, calling the BLAS functionalities through a shared object.

Scenario

I implemented a matrix-matrix multiplication for different dimensions i. i runs from 5 to 500 with an increment of 5, and the matrices m1 and m2 are set up like this:
m1 = numpy.random.rand(i,i).astype(numpy.float32)
m2 = numpy.random.rand(i,i).astype(numpy.float32)

 
 
 
1. Numpy

The code used looks like this:
tNumpy = timeit.Timer("numpy.dot(m1, m2)", "import numpy; from __main__ import m1, m2")
rNumpy.append((i, tNumpy.repeat(20, 1)))
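
For reference, the two fragments above can be combined into one self-contained loop. This is only a sketch of how the pieces might fit together: the function name bench_numpy_dot and the use of the globals argument of timeit.Timer (available since Python 3.5) are assumptions made for illustration, not part of the original benchmark.

```python
import timeit

import numpy


def bench_numpy_dot(sizes, repeats=20):
    """Return a list of (i, [timings]) tuples, one per matrix dimension i."""
    results = []
    for i in sizes:
        m1 = numpy.random.rand(i, i).astype(numpy.float32)
        m2 = numpy.random.rand(i, i).astype(numpy.float32)
        # Pass the operands through `globals` instead of importing from __main__.
        timer = timeit.Timer(
            "numpy.dot(m1, m2)",
            globals={"numpy": numpy, "m1": m1, "m2": m2},
        )
        # repeat(repeats, 1) takes `repeats` measurements of a single call each.
        results.append((i, timer.repeat(repeats, 1)))
    return results


if __name__ == "__main__":
    # i runs from 5 to 500 with an increment of 5, as in the scenario.
    r_numpy = bench_numpy_dot(range(5, 505, 5))
    for i, timings in r_numpy[:3]:
        print(i, min(timings))
```

Taking the minimum of the repeated timings is one common way to summarize them, since larger values usually reflect interference from other processes rather than the code under test.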

 
 
 
2. Python, calling BLAS through a shared object

With the function
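import ctypes
from ctypes import byref, c_char, c_int, c_float

_blaslib = ctypes.cdll.LoadLibrary("libblas.so")

def Mul(m1, m2, i, r):
    # The Fortran BLAS interface takes every argument by reference.
    no_trans = c_char(b"n")  # bytes literal works on Python 2.6+ and Python 3
    n = c_int(i)
    one = c_float(1.0)
    zero = c_float(0.0)

    # sgemm_ assumes column-major storage; with C-ordered numpy arrays
    # r ends up holding dot(m2, m1). The timing is unaffected for square inputs.
    _blaslib.sgemm_(byref(no_trans), byref(no_trans), byref(n), byref(n), byref(n),
            byref(one), m1.ctypes.data_as(ctypes.c_void_p), byref(n),
            m2.ctypes.data_as(ctypes.c_void_p), byref(n), byref(zero),
            r.ctypes.data_as(ctypes.c_void_p), byref(n))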

the test code looks like this:
r = numpy.zeros((i,i), numpy.float32)
tBlas = timeit.Timer("Mul(m1, m2, i, r)", "import numpy; from __main__ import i, m1, m2, r, Mul")
rBlas.append((i, tBlas.repeat(20, 1)))
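
One detail worth checking when comparing the sgemm_ call with numpy.dot is memory layout: the Fortran BLAS interface is column-major, while the numpy arrays created here are row-major (C order). The sketch below does not call libblas at all; it only emulates, with plain numpy, how a column-major routine would interpret the same buffers. The helper name emulate_sgemm_on_c_arrays is made up for this illustration.

```python
import numpy


def emulate_sgemm_on_c_arrays(m1, m2):
    """Emulate C = A * B as a column-major routine sees C-ordered inputs."""
    # A column-major reader sees a C-ordered buffer as its transpose.
    a_col = m1.T
    b_col = m2.T
    # The product as computed from the column-major point of view.
    c_col = numpy.dot(a_col, b_col)
    # Reading the result buffer back in row-major order transposes it again.
    return c_col.T


if __name__ == "__main__":
    i = 4
    m1 = numpy.random.rand(i, i).astype(numpy.float32)
    m2 = numpy.random.rand(i, i).astype(numpy.float32)
    r = emulate_sgemm_on_c_arrays(m1, m2)
    print(numpy.allclose(r, numpy.dot(m2, m1)))
```

Under this reading, r holds the product with the operands swapped. For square matrices the amount of work is the same, so the benchmark timings remain comparable; if the numerical result matters, passing m2 before m1 to sgemm_ would yield numpy.dot(m1, m2).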

 
 
 
3. c++, calling BLAS through a shared object

Now the c++ code naturally is a little longer so I reduce the information to a minimum.

I load the function with
void* handle = dlopen("libblas.so", RTLD_LAZY);
void* Func = dlsym(handle, "sgemm_");

I measure the time with gettimeofday like this:
gettimeofday(&start, NULL);
f(&no_trans, &no_trans, &dim, &dim, &dim, &one, A, &dim, B, &dim, &zero, Return, &dim);
gettimeofday(&end, NULL);
dTimes[j] = CalcTime(start, end);

where j is the index of a loop that runs 20 times. I calculate the elapsed time with
double CalcTime(timeval start, timeval end)
{
double factor = 1000000;
return (((double)end.tv_sec) * factor + ((double)end.tv_usec) - (((double)start.tv_sec) * factor + ((double)start.tv_usec))) / factor;
}
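
To make the arithmetic in CalcTime easier to compare with the Python timings, here is the same calculation sketched in Python. Representing a timeval as a (tv_sec, tv_usec) tuple and the name calc_time are assumptions made for this illustration only.

```python
def calc_time(start, end):
    """Elapsed seconds between two (tv_sec, tv_usec) pairs."""
    start_sec, start_usec = start
    end_sec, end_usec = end
    # Work in whole microseconds first so the subtraction is exact.
    delta_usec = (end_sec - start_sec) * 1_000_000 + (end_usec - start_usec)
    return delta_usec / 1_000_000


if __name__ == "__main__":
    # The microsecond field may be smaller at the end than at the start.
    print(calc_time((10, 900_000), (11, 100_000)))
```

The result is in seconds, which is the same unit that timeit's repeat returns, so the C++ and Python measurements can be compared directly.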

 
 
 
Results

The result is shown in the plot below:  
  
 
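Do you think my approach is fair, or are there some unnecessary overheads I can avoid?

Would you expect the results to show such a huge discrepancy between the c++ and python approaches?

Since I would rather use python for my program, what could I do to increase the performance when calling BLAS or LAPACK routines?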
 
 
 
 
 
Download

The complete benchmark can be downloaded here. (J.F. Sebastian made that link possible^^)

Answer

I've run your benchmark. There is no difference between C++ and numpy on my machine:
  
 
  Do you think my approach is fair, or are there some unnecessary overheads I can avoid?
It seems fair, since there is no difference in the results.
 
  Would you expect that the result would show such a huge discrepancy between the c++ and python approach? Both are using shared objects for their calculations.
No.

  Since I would rather use python for my program, what could I do to increase the performance when calling BLAS or LAPACK routines?
Make sure that numpy uses an optimized version of the BLAS/LAPACK libraries on your system.
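
One way to check this is numpy.show_config(), which prints the build configuration that numpy was compiled with. The sketch below captures that output as a string; the wrapper name numpy_build_config is made up for this example, and the exact format of the output differs between numpy versions.

```python
import contextlib
import io

import numpy


def numpy_build_config():
    """Return the text that numpy.show_config() prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        numpy.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    # Look for the BLAS/LAPACK entries in the printed configuration.
    print(numpy_build_config())
```

If the BLAS/LAPACK sections name an optimized implementation such as OpenBLAS, MKL, or ATLAS, numpy.dot is most likely already dispatching to it; if they are empty or refer only to a reference implementation, that would be a plausible explanation for slow results.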
