Efficient access to matrix columns

Views: 132 Published: 2016/8/24 15:39:18 Tags: c++ c performance matrix

Question

I need to access a large matrix (larger than 2000x2000) column-wise. My algorithm requires one row pass and one column pass. The row pass is fine for memory efficiency (few cache misses), but how can I reduce cache misses in the column pass? I need efficiency.

The only idea I had in mind is something like this: declare n local variables (based on the memory fetch size),

 int a1, a2, a3, a4;
 for (int j = 0; j < DIM_Y; j += 4)
   for (int i = 0; i < DIM_X; i++) {
     a1 = matrix[i][j]; ...; a4 = matrix[i][j + 3];
     // do the column processing on the 4 variables
   }

This is in C or C++, and the matrix is an array of int or char.

Any suggestions and comments are welcome.

Thanks.

Recommended answer

Two basic techniques apply:

1) loop blocking

Instead of

 for (j=0;j<2000;j++)
   for (i=0;i<2000;i++) 
     process_element(i,j);

use

for (j=0;j<2000;j+=8) 
  for (i=0;i<2000;i+=8) 
    process_block_of_8x8(i,j);
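As a rough illustration of what the blocked loop might look like when written out, here is a minimal sketch that sums each column of a row-major int matrix tile by tile. The function name `column_sums_blocked`, the `BLOCK` constant, and the choice of column sums as the per-element work are assumptions made for the example and are not part of the original answer. The best tile size depends on the cache and element size.

```c
#include <stddef.h>

enum { BLOCK = 8 };  /* assumed tile edge, matching the 8x8 block above */

/* Sum every column of a row-major rows x cols matrix, one tile at a time.
   col_sum must have room for cols entries; it is zeroed here. */
void column_sums_blocked(const int *m, size_t rows, size_t cols, long *col_sum)
{
    for (size_t j = 0; j < cols; j++)
        col_sum[j] = 0;

    for (size_t jb = 0; jb < cols; jb += BLOCK) {
        size_t j_end = (jb + BLOCK < cols) ? jb + BLOCK : cols;
        for (size_t ib = 0; ib < rows; ib += BLOCK) {
            size_t i_end = (ib + BLOCK < rows) ? ib + BLOCK : rows;
            /* Inside a tile, walk along each row so that every cache line
               that is fetched contributes up to BLOCK adjacent elements. */
            for (size_t i = ib; i < i_end; i++)
                for (size_t j = jb; j < j_end; j++)
                    col_sum[j] += m[i * cols + j];
        }
    }
}
```

The two clamped bounds (`i_end`, `j_end`) handle matrix sizes that are not a multiple of the tile edge.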

2) non-power-of-2 row stride (e.g. 8192 bytes + 64); pad if necessary

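In this case row[i] .. row[i+7] will not compete for the same cache lines. With a power-of-2 stride, the same column position in consecutive rows can map to the same cache set, so the rows keep evicting each other.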

The data should be in a contiguous memory region with manually calculated padding.
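One way the padded layout could be set up is sketched below. The `padded_matrix` type, the helper names, and the 64-byte pad (an assumed cache-line size) are hypothetical and only illustrate the idea. The sketch pads only when the row size in bytes is an exact power of two, which is the case the answer names.

```c
#include <stddef.h>
#include <stdlib.h>

enum { PAD_BYTES = 64 };  /* assumed cache-line size */

/* Hypothetical container: one contiguous buffer with a padded row stride. */
typedef struct {
    int   *data;
    size_t rows, cols;  /* logical size in elements */
    size_t stride;      /* elements from one row to the next, padding included */
} padded_matrix;

static int is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Allocate a zeroed matrix; returns 0 on success, -1 on allocation failure. */
int padded_matrix_init(padded_matrix *pm, size_t rows, size_t cols)
{
    size_t row_bytes = cols * sizeof(int);
    if (is_power_of_two(row_bytes))
        row_bytes += PAD_BYTES;  /* e.g. 8192 bytes becomes 8192 + 64 */

    pm->rows = rows;
    pm->cols = cols;
    pm->stride = row_bytes / sizeof(int);
    pm->data = calloc(rows * pm->stride, sizeof(int));
    return pm->data ? 0 : -1;
}

/* Address of element (i, j); index with the stride, not with cols. */
static inline int *padded_at(padded_matrix *pm, size_t i, size_t j)
{
    return &pm->data[i * pm->stride + j];
}
```

All indexing has to go through the stride, so a plain `matrix[i][j]` on a fixed-size 2D array would no longer apply. Release the buffer with `free(pm.data)` when done.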
