缓存友好的方法来将两个矩阵相乘 [英] Cache friendly method to multiply two matrices

查看:127
本文介绍了缓存友好的方法来将两个矩阵相乘的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我打算使用缓存友好的方法将2个矩阵相乘(这样会减少丢失的次数)

I intend to multiply 2 matrices using the cache-friendly method ( that would lead to less number of misses)

我发现这可以通过缓存友好的转置函数来完成.

I found out that this can be done with a cache friendly transpose function.

但是我找不到该算法.我可以知道如何实现吗?

But I am not able to find this algorithm. Can I know how to achieve this?

推荐答案

您要查找的单词是 th撞.在Google 产生更多结果.

The word you are looking for is thrashing. Searching for thrashing matrix multiplication in Google yields more results.

用于c = a * b的标准乘法算法看起来像

A standard multiplication algorithm for c = a*b would look like

void multiply(double[,] a, double[,] b, double[,] c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i, j] += a[i, k] * b[k, j]; 
}

基本上,快速大步浏览内存会对性能产生不利影响. B [ k ,j]中 k 的访问模式正是如此.因此,我们可以重新安排这些操作,以使最内部的循环仅对矩阵的第二个访问索引进行操作,而不是在内存中跳来跳去:

Basically, navigating the memory fastly in large steps is detrimental to performance. The access pattern for k in B[k, j] is doing exactly that. So instead of jumping around in the memory, we may rearrange the operations such that the most inner loops operate only on the second access index of the matrices:

void multiply(double[,] a, double[,] B, double[,] c)
{  
   for (i = 0; i < n; i++)
   {  
      double t = a[i, 0];
      for (int j = 0; j < n; j++)
         c[i, j] = t * b[0, j];

      for (int k = 1; k < n; k++)
      {
         double s = 0;
         for (int j = 0; j < n; j++ )
            s += a[i, k] * b[k, j];
         c[i, j] = s;
      }
   }
}

这是该页面上给出的示例.但是,另一种选择是将B [k,*]的内容事先复制到一个数组中,并在内部循环计算中使用该数组.即使涉及到复制数据,这种方法通常比其他方法快得多.即使这似乎违反直觉,也请随时尝试一下.

This was the example given on that page. However, another option is to copy the contents of B[k, *] into an array beforehand and use this array in the inner loop calculations. This approach is usually much faster than the alternatives, even if it involves copying data around. Even if this might seem counter-intuitive, please feel free to try for yourself.

void multiply(double[,] a, double[,] b, double[,] c)
{
    double[] Bcolj = new double[n];
    for (int j = 0; j < n; j++)
    {
        for (int k = 0; k < n; k++)
            Bcolj[k] = b[k, j];

        for (int i = 0; i < n; i++)
        {
            double s = 0;
            for (int k = 0; k < n; k++)
                s += a[i,k] * Bcolj[k];
            c[j, i] = s;
        }
   }
}

这篇关于缓存友好的方法来将两个矩阵相乘的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆