Why is MATLAB so fast in matrix multiplication?


Question

I am running some benchmarks with CUDA, C++, C#, and Java, and using MATLAB for verification and matrix generation. But when I multiply in MATLAB, 2048x2048 and even bigger matrices are multiplied almost instantly.

             1024x1024   2048x2048   4096x4096
             ---------   ---------   ---------
CUDA C (ms)      43.11      391.05     3407.99
C++ (ms)       6137.10    64369.29   551390.93
C# (ms)       10509.00   300684.00  2527250.00
Java (ms)      9149.90    92562.28   838357.94
MATLAB (ms)      75.01      423.10     3133.90

Only CUDA is competitive, but I thought that at least C++ would be somewhat close and not 60x slower.

So my question is: how is MATLAB doing it that fast?

C++ code:

float temp = 0;
timer.start();
for(int j = 0; j < rozmer; j++)
{
    for (int k = 0; k < rozmer; k++)
    {
        temp = 0;
        for (int m = 0; m < rozmer; m++)
        {
            temp = temp + matice1[j][m] * matice2[m][k];
        }
        matice3[j][k] = temp;
    }
}
timer.stop();

I also don't know what to think about the C# results. The algorithm is the same as in C++ and Java, but there is a giant jump from 1024 to 2048.

Edit2: Updated the MATLAB and 4096x4096 results.

Recommended answer

Here are my results using MATLAB R2011a + Parallel Computing Toolbox on a machine with a Tesla C2070:

>> A = rand(1024); gA = gpuArray(A);
% warm up by executing the operations a couple of times, and then:
>> tic, C = A * A; toc
Elapsed time is 0.075396 seconds.
>> tic, gC = gA * gA; toc
Elapsed time is 0.008621 seconds.

MATLAB uses highly optimized libraries for matrix multiplication, which is why plain MATLAB matrix multiplication is so fast. The gpuArray version uses MAGMA.
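To give a feel for where part of the gap comes from, here is a minimal C++ sketch of one simple optimization: reordering the loops so that the innermost loop walks memory contiguously. This is only an illustration and not what MATLAB or its libraries actually do; tuned libraries typically go much further (cache blocking, SIMD instructions, multithreading). The names `Matrix`, `multiply_naive`, `multiply_reordered`, and `max_abs_diff` are made up for this example, and the matrices are assumed to be square and stored row-major in a single buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a row-major n x n matrix in one contiguous buffer.
using Matrix = std::vector<float>;

// Same loop order as in the question (j, k, m). The inner loop reads b down a
// column, so consecutive reads are n floats apart in memory.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < n; ++k) {
            float temp = 0.0f;
            for (std::size_t m = 0; m < n; ++m) {
                temp += a[j * n + m] * b[m * n + k];
            }
            c[j * n + k] = temp;
        }
    }
    return c;
}

// Reordered loops (j, m, k). The inner loop now reads a row of b and updates
// a row of c, both contiguous in memory. The arithmetic is unchanged.
Matrix multiply_reordered(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t m = 0; m < n; ++m) {
            const float a_jm = a[j * n + m];
            for (std::size_t k = 0; k < n; ++k) {
                c[j * n + k] += a_jm * b[m * n + k];
            }
        }
    }
    return c;
}

// Largest element-wise absolute difference, used to compare the two results.
float max_abs_diff(const Matrix& x, const Matrix& y) {
    assert(x.size() == y.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::fmax(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

Both functions compute the same product; only the memory access pattern differs. Timing them on large matrices with optimizations enabled usually shows the reordered version ahead on typical hardware, though still well behind a tuned library.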

Update using R2014a on a machine with a Tesla K20c, and the new timeit and gputimeit functions:

>> A = rand(1024); gA = gpuArray(A);
>> timeit(@()A*A)
ans =
    0.0324
>> gputimeit(@()gA*gA)
ans =
    0.0022

Update using R2018b on a WIN64 machine with 16 physical cores and a Tesla V100:

>> timeit(@()A*A)
ans =
    0.0229
>> gputimeit(@()gA*gA)
ans =
   4.8019e-04
