Why does the order of the loops affect performance when iterating over a 2D array?


Question


  

Possible Duplicate:
Which of these two for loops is more efficient in terms of time and cache performance

Below are two programs that are almost identical, except that I switched the i and j loops around. They take noticeably different amounts of time to run. Could someone explain why this happens?

Version 1

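#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j;
  static int x[4000][4000];
  /* Inner loop advances the first index: consecutive stores are a full row apart. */
  for (i = 0; i < 4000; i++) {
    for (j = 0; j < 4000; j++) {
      x[j][i] = i + j;
    }
  }
  return 0;
}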

Version 2

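#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j;
  static int x[4000][4000];
  /* Inner loop advances the second index: consecutive stores are adjacent in memory. */
  for (j = 0; j < 4000; j++) {
    for (i = 0; i < 4000; i++) {
      x[j][i] = i + j;
    }
  }
  return 0;
}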

Solution

As others have said, the issue is the store to the memory location in the array: x[j][i]. Here's a bit of insight why:

You have a 2-dimensional array, but memory in the computer is inherently 1-dimensional. So while you imagine your array like this:

0,0 | 0,1 | 0,2 | 0,3
----+-----+-----+----
1,0 | 1,1 | 1,2 | 1,3
----+-----+-----+----
2,0 | 2,1 | 2,2 | 2,3

Your computer stores it in memory as a single line:

0,0 | 0,1 | 0,2 | 0,3 | 1,0 | 1,1 | 1,2 | 1,3 | 2,0 | 2,1 | 2,2 | 2,3

In the 2nd example, you access the array by looping over the 2nd index in the inner loop, i.e.:

x[0][0] 
        x[0][1]
                x[0][2]
                        x[0][3]
                                x[1][0] etc...

Meaning that you're hitting them all in order. Now look at the 1st version. You're doing:

x[0][0]
                                x[1][0]
                                                                x[2][0]
        x[0][1]
                                        x[1][1] etc...

Because of the way C lays out the 2-d array in memory, you're asking it to jump all over the place. But now for the kicker: Why does this matter? All memory accesses are the same, right?

No: because of caches. Data from your memory gets brought over to the CPU in little chunks (called 'cache lines'), typically 64 bytes. If you have 4-byte integers, that means you're getting 16 consecutive integers in a neat little bundle. It's actually fairly slow to fetch these chunks of memory; your CPU can do a lot of work in the time it takes for a single cache line to load.

Now look back at the order of accesses: The second example is (1) grabbing a chunk of 16 ints, (2) modifying all of them, (3) repeating 4000*4000/16 times. That's nice and fast, and the CPU always has something to work on.

The first example is (1) grab a chunk of 16 ints, (2) modify only one of them, (3) repeat 4000*4000 times. That's going to require 16 times the number of "fetches" from memory. Your CPU will actually have to spend time sitting around waiting for that memory to show up, and while it's sitting around you're wasting valuable time.

Important Note:

Now that you have the answer, here's an interesting note: there's no inherent reason that your second example has to be the fast one. For instance, in Fortran, the first example would be fast and the second one slow. That's because instead of expanding things out into conceptual "rows" like C does, Fortran expands into "columns", i.e.:

0,0 | 1,0 | 2,0 | 0,1 | 1,1 | 2,1 | 0,2 | 1,2 | 2,2 | 0,3 | 1,3 | 2,3

The layout of C is called 'row-major' and Fortran's is called 'column-major'. As you can see, it's very important to know whether your programming language is row-major or column-major! Here's a link for more info: http://en.wikipedia.org/wiki/Row-major_order
