Why is my Python NumPy code faster than C++?
Question
Why is this Python NumPy code,
import numpy as np
import time
k_max = 40000
N = 10000
data = np.zeros((2,N))
coefs = np.zeros((k_max,2),dtype=float)
t1 = time.time()
for k in xrange(1,k_max+1):
    cos_k = np.cos(k*data[0,:])
    sin_k = np.sin(k*data[0,:])
    coefs[k-1,0] = (data[1,-1]-data[1,0]) + np.sum(data[1,:-1]*(cos_k[:-1] - cos_k[1:]))
    coefs[k-1,1] = np.sum(data[1,:-1]*(sin_k[:-1] - sin_k[1:]))
t2 = time.time()
print('Time:')
print(t2-t1)
faster than the following C++ code?
#include <cstdio>
#include <iostream>
#include <cmath>
#include <time.h>
using namespace std;
// consts
const unsigned int k_max = 40000;
const unsigned int N = 10000;
int main()
{
    time_t start, stop;
    double diff;
    // table with data
    double data1[ N ];
    double data2[ N ];
    // table of results
    double coefs1[ k_max ];
    double coefs2[ k_max ];
    // main loop
    time( & start );
    for( unsigned int j = 1; j<N; j++ )
    {
        for( unsigned int i = 0; i<k_max; i++ )
        {
            coefs1[ i ] += data2[ j-1 ]*(cos((i+1)*data1[ j-1 ]) - cos((i+1)*data1[ j ]));
            coefs2[ i ] += data2[ j-1 ]*(sin((i+1)*data1[ j-1 ]) - sin((i+1)*data1[ j ]));
        }
    }
    // end of main loop
    time( & stop );
    // speed result
    diff = difftime( stop, start );
    cout << "Time: " << diff << " seconds";
    return 0;
}
The first one shows: "Time: 8 seconds" while the second: "Time: 11 seconds"
I know that NumPy is written in C, but I would still expect the C++ example to be faster. Am I missing something? Is there a way to improve the C++ code (or the Python one)?
I have changed the C++ code (dynamical tables to static tables) as suggested in one of the comments. The C++ code is faster now, but still much slower than the Python version.
I have changed from debug to release mode and increased 'k' from 4000 to 40000. Now NumPy is just slightly faster (8 seconds to 11 seconds).
Recommended answer
I found this question interesting, because every time I encountered a similar topic about the speed of NumPy (compared to C/C++) there were always answers like "it's a thin wrapper, its core is written in C, so it's fast", but this doesn't explain why C should be slower than C with an additional layer (even a thin one).
The answer is: your C++ code is not slower than your Python code when it is compiled properly.
I've done some benchmarks, and at first it seemed that NumPy is surprisingly faster. But I forgot about optimizing the compilation with GCC.
I've computed everything again and also compared the results with a pure C version of your code. I am using GCC version 4.9.2 and Python 2.7.9 (compiled from source with the same GCC). To compile your C++ code I used g++ -O3 main.cpp -o main, and to compile my C code I used gcc -O3 main.c -lm -o main. In all examples I filled the data variables with some numbers (0.1, 0.4), as it changes the results. I also changed the np.arrays to use doubles (dtype=np.float64), because there are doubles in the C++ example. My pure C version of your code (it's similar):
#include <math.h>
#include <stdio.h>
#include <time.h>
const int k_max = 100000;
const int N = 10000;
int main(void)
{
    clock_t t_start, t_end;
    double data1[N], data2[N], coefs1[k_max], coefs2[k_max], seconds;
    int i, j, z;
    for( z = 0; z < N; z++ )
    {
        data1[z] = 0.1;
        data2[z] = 0.4;
    }
    /* zero the accumulators: the += below would otherwise read uninitialized memory */
    for( z = 0; z < k_max; z++ )
    {
        coefs1[z] = 0.0;
        coefs2[z] = 0.0;
    }
    t_start = clock();
    for( i = 0; i < k_max; i++ )
    {
        for( j = 0; j < N-1; j++ )
        {
            coefs1[i] += data2[j] * (cos((i+1) * data1[j]) - cos((i+1) * data1[j+1]));
            coefs2[i] += data2[j] * (sin((i+1) * data1[j]) - sin((i+1) * data1[j+1]));
        }
    }
    t_end = clock();
    seconds = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("Time: %f s\n", seconds);
    /* use a result so the compiler cannot discard the whole loop */
    return coefs1[0];
}
For k_max = 100000, N = 10000 the results were:
- Python 70.284362 s
- C++ 69.133199 s
- C 61.638186 s
Python and C++ have basically the same time, but note that there is a Python loop of length k_max, which should be much slower compared to C/C++ one. And it is.
For k_max = 1000000, N = 1000 we have:
- Python 115.42766 s
- C++ 70.781380 s
For k_max = 1000000, N = 100:
- Python 52.86826 s
- C++ 7.050597 s
So the difference increases with the ratio k_max/N, but Python is not faster even for N much bigger than k_max, e.g. k_max = 100, N = 100000:
- Python 0.651587 s
- C++ 0.568518 s
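One rough way to read the two k_max = 1000000 runs above is to assume that the gap between Python and C++ is a fixed interpreter and call overhead paid once per outer-loop iteration. That model is an assumption made here for illustration, not something measured separately, and the helper name overhead_us is hypothetical. The sketch below (Python 3) only does the arithmetic on the timings already reported:

```python
# Assumed model (not measured separately): t_python - t_cpp = k_max * overhead.
def overhead_us(t_python, t_cpp, k_max):
    """Per-iteration gap between the two timings, in microseconds."""
    return (t_python - t_cpp) / k_max * 1e6

# Timings copied from the k_max = 1000000 runs above, keyed by N.
runs = {
    1000: (115.42766, 70.781380),
    100: (52.86826, 7.050597),
}

for n, (t_py, t_cpp) in runs.items():
    gap = overhead_us(t_py, t_cpp, 1000000)
    print("N = {}: about {:.1f} us per iteration".format(n, gap))
```

Under this model both runs give a gap of roughly 45 microseconds per iteration, which would be consistent with a cost that does not depend on N and therefore matters more as N shrinks.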
Obviously, the main speed difference between C/C++ and Python is in the for loop. But I wanted to find out the difference between simple operations on arrays in NumPy and in C. The advantages of using NumPy in your code consist of: 1. multiplying the whole array by a number, 2. calculating sin/cos of the whole array, 3. summing all elements of the array, instead of doing those operations on every single item separately. So I prepared two scripts to compare only these operations.
Python script:
import numpy as np
from time import time
N = 10000
x_len = 100000
def main():
    x = np.ones(x_len, dtype=np.float64) * 1.2345
    start = time()
    for i in xrange(N):
        y1 = np.cos(x, dtype=np.float64)
    end = time()
    print('cos: {} s'.format(end-start))
    start = time()
    for i in xrange(N):
        y2 = x * 7.9463
    end = time()
    print('multi: {} s'.format(end-start))
    start = time()
    for i in xrange(N):
        res = np.sum(x, dtype=np.float64)
    end = time()
    print('sum: {} s'.format(end-start))
    return y1, y2, res
if __name__ == '__main__':
    main()
# results
# cos: 22.7199969292 s
# multi: 0.841291189194 s
# sum: 1.15971088409 s
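The Python script above times each loop with time(), which reads the wall clock. On Python 3 (an assumption, since the answer used Python 2.7.9), time.perf_counter() is a monotonic, high-resolution alternative intended for measuring short durations. A minimal sketch follows; the timed helper and the stand-in workload are hypothetical and only illustrate the pattern:

```python
import time

def timed(fn, repeat):
    """Call fn() `repeat` times; return (last result, elapsed seconds)."""
    start = time.perf_counter()
    result = None
    for _ in range(repeat):
        result = fn()
    return result, time.perf_counter() - start

# Stand-in workload so the sketch runs without NumPy installed.
values = [1.2345] * 1000
total, elapsed = timed(lambda: sum(values), 100)
print("sum: {:.6f} s".format(elapsed))
```

With NumPy available, the same helper could wrap each measured operation, for example timed(lambda: np.cos(x), N).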
C script:
#include <math.h>
#include <stdio.h>
#include <time.h>
const int N = 10000;
const int x_len = 100000;
int main()
{
    clock_t t_start, t_end;
    double x[x_len], y1[x_len], y2[x_len], res, time;
    int i, j;
    for( i = 0; i < x_len; i++ )
    {
        x[i] = 1.2345;
    }
    t_start = clock();
    for( j = 0; j < N; j++ )
    {
        for( i = 0; i < x_len; i++ )
        {
            y1[i] = cos(x[i]);
        }
    }
    t_end = clock();
    time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("cos: %f s\n", time);
    t_start = clock();
    for( j = 0; j < N; j++ )
    {
        for( i = 0; i < x_len; i++ )
        {
            y2[i] = x[i] * 7.9463;
        }
    }
    t_end = clock();
    time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("multi: %f s\n", time);
    t_start = clock();
    for( j = 0; j < N; j++ )
    {
        res = 0.0;
        for( i = 0; i < x_len; i++ )
        {
            res += x[i];
        }
    }
    t_end = clock();
    time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("sum: %f s\n", time);
    return y1[0], y2[0], res;
}
// results
// cos: 20.910590 s
// multi: 0.633281 s
// sum: 1.153001 s
Python results:
- cos: 22.7199969292 s
- multi: 0.841291189194 s
- sum: 1.15971088409 s
C results:
- cos: 20.910590 s
- multi: 0.633281 s
- sum: 1.153001 s
As you can see, NumPy is incredibly fast, but always a bit slower than pure C.
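On the question of improving the C/C++ code: the C and C++ loops above evaluate cos and sin for both neighbouring elements in every inner iteration, so each value is computed twice unless the compiler removes the redundancy, whereas the NumPy version computes cos_k and sin_k once per element. One possible restructuring is to carry the previous value forward. The sketch below is written in Python 3 only to show the loop structure; the function names are hypothetical, both functions include the boundary term from the NumPy version, and whether the same change speeds up the C code was not measured here.

```python
import math

def coefs_naive(k, x, y):
    """Inner-loop pattern of the C listings: trig on x[j] and x[j+1] each pass."""
    c = y[-1] - y[0]
    s = 0.0
    for j in range(len(x) - 1):
        c += y[j] * (math.cos(k * x[j]) - math.cos(k * x[j + 1]))
        s += y[j] * (math.sin(k * x[j]) - math.sin(k * x[j + 1]))
    return c, s

def coefs_reuse(k, x, y):
    """Same sums, but each cos/sin value is computed once and carried forward."""
    prev_cos = math.cos(k * x[0])
    prev_sin = math.sin(k * x[0])
    c = y[-1] - y[0]
    s = 0.0
    for j in range(1, len(x)):
        cur_cos = math.cos(k * x[j])
        cur_sin = math.sin(k * x[j])
        c += y[j - 1] * (prev_cos - cur_cos)
        s += y[j - 1] * (prev_sin - cur_sin)
        prev_cos, prev_sin = cur_cos, cur_sin
    return c, s

# Small hand-checkable input: both variants should agree.
x = [0.0, math.pi / 2, math.pi]
y = [1.0, 2.0, 3.0]
print(coefs_naive(1, x, y))
print(coefs_reuse(1, x, y))
```

For this input both functions return values close to (5.0, 1.0), up to floating-point rounding.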