How to avoid the 'code optimization' in operations like matrix multiplication?

Problem description

I need to compare the running times of two algorithms, Alg_x and Alg_y. Alg_x contains many matrix multiplications, while Alg_y contains many element-wise operations (e.g., summation and multiplication of pairs of numbers/vectors). Theoretically, Alg_x and Alg_y have the same running time: in the example below, both compute X*X' and perform the same O(d^2*n) number of floating-point operations for a d-by-n matrix X. In practice, however, Alg_x runs much faster than Alg_y, because matrix multiplication has been specially designed and optimized in MATLAB.

My question, then, is: how can I turn off such 'code optimization' so that the running times can be compared fairly and reflect the theoretical time complexity?

X = randn(1000, 2000);   % test matrix shared by Alg_x and Alg_y

Alg_x

tic;
temp = X*X';   % a single call to MATLAB's built-in matrix multiplication
toc

Alg_y

[d, n] = size(X);
temp = zeros(d, d);
tic;
for i = 1:n
    x = X(:,i);
    temp = temp + x*x';   % accumulate column outer products; same result as X*X'
end
toc
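
As a side note on measurement: a single tic/toc run can be noisy. One option is to wrap both variants in function handles and time them with timeit, which runs each several times. This is only a measurement sketch, assuming a MATLAB release that allows a local function at the end of a script; the helper name rank1_accumulate is mine, not part of the original code:

X = randn(1000, 2000);

t_x = timeit(@() X*X');                 % Alg_x: one built-in matrix product
t_y = timeit(@() rank1_accumulate(X));  % Alg_y: explicit loop over columns
fprintf('Alg_x: %.4f s, Alg_y: %.4f s\n', t_x, t_y);

function temp = rank1_accumulate(X)
    % Same result as X*X', built up from n rank-1 updates.
    [d, n] = size(X);
    temp = zeros(d, d);
    for i = 1:n
        x = X(:,i);
        temp = temp + x*x';
    end
end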

The two pieces of code above produce the same output, while Alg_x runs much faster. Moreover, Alg_y also runs much faster after I remove x = X(:,i); temp = temp + x*x';, so I guess it is the for loop that makes Alg_y slow.

I do want to turn off and avoid such optimizations. Below is something I extracted from Why is MATLAB so fast in matrix multiplication?

I am making some benchmarks with CUDA, C++, C#, and Java, and using MATLAB for verification and matrix generation. But when I multiply with MATLAB, 2048x2048 and even bigger matrices are multiplied almost instantly.

             1024x1024   2048x2048   4096x4096
             ---------   ---------   ---------
CUDA C (ms)      43.11      391.05     3407.99
C++ (ms)       6137.10    64369.29   551390.93
C# (ms)       10509.00   300684.00  2527250.00
Java (ms)      9149.90    92562.28   838357.94
MATLAB (ms)      75.01      423.10     3133.90

Only CUDA is competitive, but I thought that at least C++ would be somewhat close and not 60x slower.

So my question is - How is MATLAB doing it that fast?

C++ code:

// Naive triple-nested-loop multiplication: matice3 = matice1 * matice2,
// where all matrices are rozmer x rozmer.
float temp = 0;
timer.start();
for (int j = 0; j < rozmer; j++)
{
    for (int k = 0; k < rozmer; k++)
    {
        temp = 0;
        for (int m = 0; m < rozmer; m++)
        {
            temp = temp + matice1[j][m] * matice2[m][k];
        }
        matice3[j][k] = temp;
    }
}
timer.stop();

I also don't know what to think about the C# results. The algorithm is the same as in C++ and Java, but there's a giant jump from 1024 to 2048?

Edit2: Updated MATLAB and 4096x4096 results

Recommended answer

I'm answering your question "How is MATLAB doing it that fast?".

MATLAB uses Intel MKL for matrix multiplication.
This is highly optimized code that takes advantage of all the cores and their vector processing units (SSE / AVX).
Moreover, it is hand-tuned for the cache layout of the CPU.
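
If you want to check which BLAS/LAPACK build a given MATLAB installation actually links against, the version command can report it. This is a quick check added here for illustration; on typical desktop installations it reports an Intel MKL build:

% Show the BLAS and LAPACK libraries this MATLAB session uses
version('-blas')
version('-lapack')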

Your code doesn't do that and hence leaves a lot of gains on the table.

There might be a way to disable MKL in MATLAB, though so far I've only seen methods to replace it.
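
One documented knob that levels at least part of the playing field is the thread count: maxNumCompThreads(1) confines MATLAB's computational routines to a single thread, removing the multi-core advantage of the built-in multiplication, although the SIMD and cache-blocking optimizations inside MKL remain. A minimal sketch (a partial measure added here, not a way to disable MKL itself):

% Limit MATLAB to one computational thread for the measurement,
% then restore the previous setting.
nOld = maxNumCompThreads(1);   % returns the previous thread count

X = randn(1000, 2000);
tic;  temp = X*X';  toc        % built-in product, now running on one thread

maxNumCompThreads(nOld);       % restore the original setting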
