按行与按列访问矩阵的元素 [英] Accessing elements of a matrix row-wise versus column-wise

查看:25
本文介绍了按行与按列访问矩阵的元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个矩阵A[i][j].如果我们要添加矩阵的元素,哪种方法更好,为什么?

A matrix A[i][j] is given. If we want to add the elements of the matrix, which method is better and why?

  1. 列明智
  2. 行明智

从我的角度来看,行明智更好,因为在数组表示中,元素存储在连续的内存位置,因此访问它们所需的时间更少.但由于在 RAM 中获取每个位置需要相同的时间,这有关系吗?

From my point of view, row wise is better because in array representation elements are stored in contiguous memory locations so accessing them takes less time.But since in RAM fetching every location takes equal time, does it matter?

推荐答案

利用 空间局部性

在 C 中,矩阵以 row-major order 存储.因此,如果您访问元素 a[i][j],对元素 a[i][j+1] 的访问很可能会命中缓存.不会执行对主存储器的访问.缓存比主内存快,所以访问模式很重要.

In C, matrixes are stored in row-major order. So if you access element a[i][j], an access to element a[i][j+1] will be likely to hit the cache. No access to main memory will be performed. Caches are faster than main memory, so the access pattern does matter.

当然还要考虑更多的因素,比如写访问/读访问、写策略(直写、回写/写分配、无写分配)、多级缓存等等.但这对于这个问题来说似乎有点矫枉过正.

Of course, more factors must be taken into accout, such as write access / read access, write policy (write through, write back / write allocate , no write allocate), multilevel caches, and so on. But that seems an overkill for this question.

使用分析工具获得一些乐趣,例如 cachegrind,并查看它自己.

Have some fun with a profiling tool, such as cachegrind, and see it for yourself.

例如,考虑一个访问 4MB 矩阵的虚拟程序.查看每种访问模式的未命中率之间的差异.

For example, consider a dummy program accesing 4MB matrices. Check out the differences between the miss rates on each access pattern.

列访问

$ cat col_major.c 
#include <stdio.h>

int main(){

    size_t i,j;

    const size_t dim = 1024 ;

    int matrix [dim][dim];

    for (i=0;i< dim; i++){
        for (j=0;j <dim;j++){
            matrix[j][i]= i*j;
        }
    }
    return 0;
}

$ valgrind --tool=cachegrind ./col_major 

==3228== D   refs:      10,548,665  (9,482,149 rd   + 1,066,516 wr)
==3228== D1  misses:     1,049,704  (      968 rd   + 1,048,736 wr)
==3228== L2d misses:     1,049,623  (      893 rd   + 1,048,730 wr)
==3228== D1  miss rate:        9.9% (      0.0%     +      98.3%  )
==3228== L2d miss rate:        9.9% (      0.0%     +      98.3%  )

行访问

$ cat row_major.c 
#include <stdio.h>

int main(){
    size_t i,j;
    const size_t dim = 1024 ;
    int matrix [dim][dim];

    for (i=0;i< dim; i++)
        for (j=0;j <dim;j++)
            matrix[i][j]= i*j;

    return 0;
}

$ valgrind --tool=cachegrind ./row_major 

==3524== D   refs:      10,548,665  (9,482,149 rd   + 1,066,516 wr)
==3524== D1  misses:        66,664  (      968 rd   +    65,696 wr)
==3524== L2d misses:        66,583  (      893 rd   +    65,690 wr)
==3524== D1  miss rate:        0.6% (      0.0%     +       6.1%  )
==3524== L2d miss rate:        0.6% (      0.0%     +       6.1%  )

这篇关于按行与按列访问矩阵的元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆