为什么在这个C ++ for循环的执行时间有显着的差异? [英] Why is there a significant difference in this C++ for loop's execution time?
问题描述
我经历了循环,发现访问循环有重大差异。
我不明白在这两种情况下会导致这种差异的原因是什么?
第一个例子:
执行时间; 8秒
for(int kk = 0; kk< 1000; kk ++)
{
sum = 0;
for(int i = 0; i <1024; i ++)
for(int j = 0; j <1024; j ++)
{
sum + i] [j]。
}
}
第二个例子:
执行时间:23秒
for(int kk = 0; kk< 1000; kk ++)
{
sum = 0;
for(int i = 0; i <1024; i ++)
for(int j = 0; j <1024; j ++)
{
sum + j] [i]。
}
}
p>
matrix [i] [j]
到
matrix [j] [i]
?
解决方案缓存
matrix [i] [j]
j] [i] ,因为matrix [i] [j]
具有更多的连续存储器访问机会。
例如,当我们访问
matrix [i] [0]
时,缓存可以加载包含[i] [0]
,因此,访问matrix [i] [1]
,]
,...,将受益于缓存速度,因为matrix [i] [1]
,matrix [i ] [2]
,...靠近matrix [i] [0]
。
$ b $然而,当我们访问matrix [j] [0]
时,它远离matrix [j-1] [0] / code>,并且可能未被缓存,并且不能受益于缓存速度。特别是,矩阵通常作为连续的大段存储器存储,并且cacher可以断言存储器访问的行为并且总是缓存存储器。
这是为什么
matrix [i] [j]
更快。这是典型的基于CPU缓存的性能优化。I was going through loops and found a significant difference in accessing loops. I can't understand what is the thing that causes such difference in both cases?
First Example:
Execution Time; 8 seconds
for (int kk = 0; kk < 1000; kk++) { sum = 0; for (int i = 0; i < 1024; i++) for (int j = 0; j < 1024; j++) { sum += matrix[i][j]; } }
Second Example:
Execution Time: 23 seconds
for (int kk = 0; kk < 1000; kk++) { sum = 0; for (int i = 0; i < 1024; i++) for (int j = 0; j < 1024; j++) { sum += matrix[j][i]; } }
What causes so much execution time difference just exchanging
matrix[i][j]
to
matrix[j][i]
?
解决方案It's an issue of memory cache.
matrix[i][j]
has better cache hits thanmatrix[j][i]
, sincematrix[i][j]
has more continuous memory accessing chances.For example, when we access
matrix[i][0]
, the cache may load a continuous segment of memory containingmatrix[i][0]
, thus, accessingmatrix[i][1]
,matrix[i][2]
, ..., will benefit from caching speed, sincematrix[i][1]
,matrix[i][2]
, ... are near tomatrix[i][0]
.However, when we access
matrix[j][0]
, it is far frommatrix[j - 1][0]
and may not been cached, and can not benefit from caching speed. Especially, a matrix is normally stored as a continuous big segment of memory, and the cacher may predicate the behavior of memory accessing and always cache the memory.That's why
matrix[i][j]
is faster. This is typical in CPU cache based performance optimizing.这篇关于为什么在这个C ++ for循环的执行时间有显着的差异?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!