Why is MATLAB so fast in matrix multiplication?
Question
I am making some benchmarks with CUDA, C++, C#, and Java, and using MATLAB for verification and matrix generation. But when I multiply with MATLAB, 2048x2048 and even bigger matrices are almost instantly multiplied.
              1024x1024   2048x2048    4096x4096
              ---------   ---------    ---------
CUDA C (ms)       43.11      391.05      3407.99
C++ (ms)        6137.10    64369.29    551390.93
C# (ms)        10509.00   300684.00   2527250.00
Java (ms)       9149.90    92562.28    838357.94
MATLAB (ms)       75.01      423.10      3133.90
Only CUDA is competitive, but I thought that at least C++ would be somewhat close and not 60x slower.
So my question is: how is MATLAB doing it that fast?
C++ code:
float temp = 0;
timer.start();
for (int j = 0; j < rozmer; j++)
{
    for (int k = 0; k < rozmer; k++)
    {
        temp = 0;
        for (int m = 0; m < rozmer; m++)
        {
            temp = temp + matice1[j][m] * matice2[m][k];
        }
        matice3[j][k] = temp;
    }
}
timer.stop();
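One reason this loop is so much slower than MATLAB is the memory access pattern: the inner `m` loop walks `matice2` column-wise, so almost every read misses the cache. Below is a minimal sketch (function name `multiply_jmk` and the `Matrix` alias are mine; the variable names follow the question's code) of the classic loop-interchange fix, which keeps the same arithmetic but makes both inner-loop accesses contiguous:

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Same triple loop, but ordered j-m-k instead of j-k-m. In the inner k loop
// both matice2[m][k] and matice3[j][k] are read/written sequentially, so
// cache lines and the hardware prefetcher are used well; matice1[j][m] is
// hoisted out and reused across the whole row.
Matrix multiply_jmk(const Matrix& matice1, const Matrix& matice2)
{
    const std::size_t rozmer = matice1.size();
    Matrix matice3(rozmer, std::vector<float>(rozmer, 0.0f));
    for (std::size_t j = 0; j < rozmer; j++)
        for (std::size_t m = 0; m < rozmer; m++)
        {
            const float a = matice1[j][m]; // constant for the inner loop
            for (std::size_t k = 0; k < rozmer; k++)
                matice3[j][k] += a * matice2[m][k];
        }
    return matice3;
}
```

This alone typically gives a large speedup over the j-k-m order at sizes like 2048x2048, though it still does not reach an optimized BLAS, which additionally uses SIMD, blocking, and multiple threads.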
I also don't know what to think about the C# results. The algorithm is just the same as in C++ and Java, but there's a giant jump from 1024 to 2048?
Edit2:
Updated MATLAB and 4096x4096 results.
Answer
Here's my results using MATLAB R2011a + Parallel Computing Toolbox on a machine with a Tesla C2070:
>> A = rand(1024); gA = gpuArray(A);
% warm up by executing the operations a couple of times, and then:
>> tic, C = A * A; toc
Elapsed time is 0.075396 seconds.
>> tic, gC = gA * gA; toc
Elapsed time is 0.008621 seconds.
MATLAB uses highly optimized libraries for matrix multiplication, which is why the plain MATLAB matrix multiplication is so fast. The gpuArray version uses MAGMA.
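One of the key techniques those optimized libraries use (alongside SIMD and multithreading) is cache blocking: processing the matrices in tiles small enough to stay resident in cache while they are reused. A minimal single-threaded sketch of blocking alone (the function name `multiply_blocked`, the `Matrix` alias, and the block size of 64 are my assumptions, not what any particular library uses):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Tile the j and m loops so that a BLOCK x BLOCK piece of the left matrix
// and the matching rows of the right matrix are reused while hot in cache.
// Each (j, m) pair is still visited exactly once, so the result is the
// same as the naive product.
Matrix multiply_blocked(const Matrix& a, const Matrix& b,
                        std::size_t block = 64)
{
    const std::size_t n = a.size();
    Matrix c(n, std::vector<float>(n, 0.0f));
    for (std::size_t jj = 0; jj < n; jj += block)
        for (std::size_t mm = 0; mm < n; mm += block)
            for (std::size_t j = jj; j < std::min(jj + block, n); j++)
                for (std::size_t m = mm; m < std::min(mm + block, n); m++)
                {
                    const float v = a[j][m];
                    for (std::size_t k = 0; k < n; k++)
                        c[j][k] += v * b[m][k];
                }
    return c;
}
```

Real BLAS implementations go much further (register tiling, hand-tuned SIMD kernels, threading across cores), which is how they close most of the gap to the GPU numbers above.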
Update using R2014a on a machine with a Tesla K20c, and the new timeit and gputimeit functions:
>> A = rand(1024); gA = gpuArray(A);
>> timeit(@()A*A)
ans =
0.0324
>> gputimeit(@()gA*gA)
ans =
0.0022
Update using R2018b on a WIN64 machine with 16 physical cores and a Tesla V100:
>> timeit(@()A*A)
ans =
0.0229
>> gputimeit(@()gA*gA)
ans =
4.8019e-04