Why does the order of the loops affect performance when iterating over a 2D array?

Question

Below are two programs that are almost identical, except that I swapped the i and j loops. They take noticeably different amounts of time to run. Could someone explain why this happens?

Version 1

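#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j;
  static int x[4000][4000];
  /* Inner loop varies the FIRST index: consecutive stores are a full row apart. */
  for (i = 0; i < 4000; i++) {
    for (j = 0; j < 4000; j++) {
      x[j][i] = i + j;
    }
  }
  return 0;
}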

Version 2

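#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j;
  static int x[4000][4000];
  /* Inner loop varies the SECOND index: consecutive stores are adjacent in memory. */
  for (j = 0; j < 4000; j++) {
    for (i = 0; i < 4000; i++) {
      x[j][i] = i + j;
    }
  }
  return 0;
}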

Recommended answer

As others have said, the issue is the store to the memory location in the array: x[j][i]. Here's a bit of insight into why:

You have a 2-dimensional array, but memory in the computer is inherently 1-dimensional. So while you imagine your array like this:

0,0 | 0,1 | 0,2 | 0,3
----+-----+-----+----
1,0 | 1,1 | 1,2 | 1,3
----+-----+-----+----
2,0 | 2,1 | 2,2 | 2,3

Your computer stores it in memory as a single line:

0,0 | 0,1 | 0,2 | 0,3 | 1,0 | 1,1 | 1,2 | 1,3 | 2,0 | 2,1 | 2,2 | 2,3

In the 2nd example, the inner loop runs over the 2nd index, so you access the array like this:

x[0][0] 
        x[0][1]
                x[0][2]
                        x[0][3]
                                x[1][0] etc...

Meaning that you're hitting them all in order. Now look at the 1st version. You're doing:

x[0][0]
                                x[1][0]
                                                                x[2][0]
        x[0][1]
                                        x[1][1] etc...

Because of the way C lays out the 2-d array in memory, you're asking it to jump all over the place. But now for the kicker: Why does this matter? All memory accesses are the same, right?

No: because of caches. Data from your memory gets brought over to the CPU in little chunks (called 'cache lines'), typically 64 bytes. If you have 4-byte integers, that means you're getting 16 consecutive integers in a neat little bundle. It's actually fairly slow to fetch these chunks of memory; your CPU can do a lot of work in the time it takes for a single cache line to load.

Now look back at the order of accesses: The second example (1) grabs a chunk of 16 ints, (2) modifies all of them, and (3) repeats this 4000*4000/16 times. That's nice and fast, and the CPU always has something to work on.

The first example (1) grabs a chunk of 16 ints, (2) modifies only one of them, and (3) repeats this 4000*4000 times. That's going to require 16 times the number of "fetches" from memory. Your CPU will actually have to spend time sitting around waiting for that memory to show up, and while it's sitting around you're wasting valuable time.

Important note:

Now that you have the answer, here's an interesting note: there's no inherent reason that your second example has to be the fast one. For instance, in Fortran, the first example would be fast and the second one slow. That's because instead of expanding things out into conceptual "rows" like C does, Fortran expands into "columns", i.e.:

0,0 | 1,0 | 2,0 | 0,1 | 1,1 | 2,1 | 0,2 | 1,2 | 2,2 | 0,3 | 1,3 | 2,3

The layout of C is called 'row-major' and Fortran's is called 'column-major'. As you can see, it's very important to know whether your programming language is row-major or column-major! Here's a link for more info: http://en.wikipedia.org/wiki/Row-major_order
