静态数组与动态数组的C/C ++性能 [英] C/C++ performance of static arrays vs dynamic arrays

查看:80
本文介绍了静态数组与动态数组的C/C ++性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当性能对于应用程序至关重要时,是否应考虑是否在堆栈vs堆上声明一个数组?请允许我概述为什么想到这个问题.

When performance is essential to an application, should consideration be given whether to declare an array on the stack vs the heap? Allow me to outline why this question has come to mind.

由于C/C ++中的数组不是对象,并且不是指针,因此编译器使用提供的索引执行指针算术来访问元素.我的理解是,经过第一个维度后,此过程从静态声明的数组到动态声明的数组有所不同.

Since arrays in C/C++ are not objects and decay to pointers, the compiler uses the provided index to perform pointer arithmetic to access elements. My understanding is that this procedure differs from a statically declared array to a dynamically declared array when going past the first dimension.

如果我要按照以下说明在堆栈上声明一个数组;

If I were to declare an array on the stack as follows;

  int array[2][3] = { 0, 1, 2, 3, 4, 5 }
  //In memory        { row1 } { row2 }

此数组将以 Row Major 格式存储在内存中,因为它存储在连续的内存块中.这意味着当我尝试访问数组中的元素时,编译器必须执行一些加法和乘法运算才能确定正确的位置.

This array would be stored in Row Major format in memory since it is stored in a contiguous block of memory. This means when I try to access an element in the array, the compiler must perform some addition and multiplication in order to ascertain the correct location.

因此,如果我要执行以下操作

So if I were to do the following

  int x = array[1][2]; // x = 5

然后,编译器将在以下位置使用此公式:

The compiler would then use this formula where:

i =行索引j =列索引n =单行大小(此处n = 2)
数组=指向第一个元素的指针

i = row index j = column index n = size of a single row (here n = 2)
array = pointer to first element

  *(array + (i*n) + j)
  *(array + (1*2) + 2)  

这意味着如果我要遍历此数组以访问其每个元素,则对每个按索引进行的访问都要执行一个附加的乘法步骤.

This means if I were to loop over this array to access each of its elements, an additional multiplication step is performed for each access by index.

现在,在堆上声明的数组中,范例有所不同,需要多阶段解决方案.注意:我也可以在这里使用C ++ new运算符,但是我相信数据的表示方式没有什么区别.

Now, in an array declared on the heap, the paradigm is different and requires a multi stage solution. Note: I could also use the C++ new operator here, but I believe there is no difference in how the data is represented.

  int ** array;
  int rowSize = 2;
  // Create a 2 by 3 2d array on the heap
  array = malloc(2 * sizeof(int*));
  for (int i = 0; i < 2; i++) {
      array[i] = malloc(3 * sizeof(int));
  }

  // Populating the array
  int number = 0;
  for (int i = 0; i < 2; i++) {
      for (int j = 0l j < 3; j++) {
          array[i][j] = number++;
      }
  }

由于数组现在是动态的,因此其表示形式是一维数组或一维数组.我将尝试绘制一张ascii图片...

Since the array is now dynamic, its representation is a one dimensional array of one dimensional arrays. I will try to draw an ascii picture...

              int *        int int int
int ** array-> [0]          0   1   2
               [1]          3   4   5

这意味着不再涉及乘法,对吗?如果我要执行以下操作

This would imply that multiplication is no longer involved right? If I were to do the following

int x = array[1][1];

然后这将对array [1]执行间接/指针算术,以访问指向第二行的指针,然后再次执行此操作以访问第二个元素.我说的对吗?

This would then perform indirection/pointer arithmetic on array[1] to access a pointer to the second row and then perform this once again to access the second element. Am I correct in saying this?

现在有一些上下文,回到问题所在.如果我正在为需要出色性能的应用程序编写代码,例如游戏大约需要0.016秒才能渲染一帧,我是否应该三思而后行在堆栈上使用数组还是在堆上使用数组?现在我意识到使用malloc或new运算符会花费一次时间,但是在某个时候(就像Big O分析一样),当数据集变大时,最好通过动态数组进行迭代以避免行大索引吗?

Now that there is some context, back to the question. If I am writing code for an application that requires crisp performance, like a game which has around 0.016 seconds to render a frame, should I think twice about using an array on the stack vs the heap? Now I realize there is a one time cost for using malloc or the new operator, but at a certain point (just like Big O analysis) when the data set becomes large, would one be better off iterating through a dynamic array to avoid row major indexing?

推荐答案

这些将适用于普通" C(不是C ++).

These will apply to "plain" C (not C++).

首先让我们清除一些术语

静态"是C语言中的关键字,如果将其应用于函数中声明的变量,它将大大改变变量的分配/访问方式.

"static" is a keyword in C which will drastically change the way your variable is allocated / accessed if it is applied on variables declared within functions.

可以在3个地方(相对于C)放置变量(包括数组):

There are 3 places (regarding C) where a variable (including arrays) may sit:

  • 堆栈:这些是不带static的函数局部变量.
  • 数据部分:程序启动时会为这些空间分配空间.这些是任何全局变量(是否为static,是否与可见性相关的关键字),以及任何声明为static的函数局部变量.
  • 堆:由指针引用的动态分配的内存(malloc()free()).您只能通过指针访问此数据.
  • Stack: these are function local variables without static.
  • Data section: space is allocated for these when the program starts. These are any global variables (be it static or not, there the keyword relates to visibility), and any function local variables declared static.
  • Heap: dynamically allocated memory (malloc() & free()) referred by a pointer. You access this data only through pointers.

现在让我们看看如何访问一维数组

如果访问具有恒定索引的数组(在纯C中可能为#define d,但不是const),则编译器可以计算该索引.如果数据部分中有一个真数组,则将直接访问该数组.如果您有一个指针( Heap )或 Stack 上的一个数组,则总是有必要进行间接寻址.因此,使用这种访​​问方式的 Data部分中的数组可能会快一点.但这不是改变世界的有用的东西.

If you access an array with a constant index (may be #defined, but not const in plain C), this index can be calculated by the compiler. If you have a true array in the Data section, it will be accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, an indirection is always necessary. So arrays in the Data section with this type of access may be a very little bit faster. But this is not a very useful thing which would turn the world.

如果使用索引变量访问数组,则由于索引可能会更改(例如,for循环中的增量),因此它实际上总是衰减为指针.对于这里的所有类型,生成的代码可能非常相似甚至完全相同.

If you access an array with an index variable, it essentially always decays to a pointer since the index may change (for example increment in a for loop). The generated code will likely be very similar or even identical for all types here.

引入更多维度

如果您声明一个二维或二维数组,并通过常量部分或全部访问它,那么智能编译器可能会像上面那样优化这些常量.

If you declare a two or more dimensional array, and access it partially or fully by constants, an intelligent compiler may well optimize these constants out as above.

如果按索引访问,请注意存储器是线性的.如果真数组的后续维数不是2的倍数,则编译器将需要生成乘法.例如,在数组int arr[4][12];中,第二维为12.如果现在以arr[i][j]的形式访问它,其中ij是索引变量,则线性存储器必须索引为12 * i + j.因此,编译器必须在此处生成要与常数相乘的代码.复杂度取决于常数距2的幂有多远.在这里,生成的代码可能看起来有点像计算(i<<3) + (i<<2) + j来访问数组中的元素.

If you access by indices, note that the memory is linear. If the later dimensions of a true array are not a multiple of 2, the compiler will need to generate multiplications. For example in the array int arr[4][12]; the second dimension is 12. If you now access it as arr[i][j] where i and j are index variables, the linear memory has to be indexed as 12 * i + j. So the compiler has to generate code to multiply with a constant here. The complexity depends on how "far" the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i<<3) + (i<<2) + j to access the element in the array.

如果从指针构建二维数组",则尺寸的大小无关紧要,因为结构中存在引用指针.在这里,如果您可以编写arr[i][j],则意味着您将其声明为例如int* arr[4],然后malloc()将其中12个int的四个内存块分别放入其中.请注意,您的四个指针(编译器现在可以将其用作基础)也会消耗内存,如果它是一个真正的数组,则不会占用该内存.还要注意,这里生成的代码将包含双重间接寻址:首先,该代码通过arri加载指针,然后通过j的该指针加载int.

If you build up the two dimensional "array" from pointers, the size of the dimensions do not matter since there are reference pointers in your structure. Here if you can write arr[i][j], that implies you declared it as for example int* arr[4], and then malloc()ed four chunks of memory of 12 ints each into it. Note that your four pointers (which the compiler now can use as base) also consume memory which wasn't taken if it was a true array. Also note that here the generated code will contain a double indirection: First the code loads a pointer by i from arr, then it will load an int from that pointer by j.

如果长度是2的幂的远"(因此必须生成复杂的乘以常数"代码才能访问元素),则使用指针可能会生成更快的访问代码.

If the lengths are "far" from powers of 2 (so complex "multiply with constant" codes would have to be generated to access the elements) then using pointers may generate faster access codes.

正如 James Kanze 在他的回答中提到的那样,在某些情况下,编译器可能能够优化访问以实现真正的访问.多维数组.对于由指针组成的数组来说,这种优化是不可能的,因为数组"实际上并不是那种情况下的线性内存块.

As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access for true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers as the "array" is actually not a linear chunk of memory that case.

位置很重要

如果您要为常规的台式机/移动体系结构(Intel/ARM 32/64位处理器)开发,本地性也很重要.那就是可能位于缓存中的内容.如果您的变量由于某种原因已经在缓存中,则可以更快地访问它们.

If you are developing for usual desktop / mobile architectures (Intel / ARM 32 / 64 bit processors) locality also matters. That is what is likely sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.

在本地性方面, Stack 始终是赢家,因为 Stack 的使用频率很高,因此很可能始终位于缓存中.因此,最好将小型阵列放入其中.

In the term of locality Stack is always the winner since the Stack is so frequently used that it is very likely to always sit in the cache. So small arrays are best put in there.

使用真正的多维数组而不是从指针组成一个多维数组也可能对此有所帮助,因为真正的数组始终是线性的内存块,因此通常可能需要较少的缓存块才能加载.分散的指针组成相反(如果使用单独的malloc()块),则可能需要更多的缓存块,并且可能会增加缓存行冲突,具体取决于这些块在物理上如何堆积在堆上.

Using true multi-dimensional arrays instead of composing one from pointers may also help on this ground since a true array is always a linear chunk of memory, so it usually might need fewer blocks of cache to load in. A scattered pointer composition (that is if using separately malloc()ed chunks) to the contrary might need more cache blocks, and may rise cache line conflicts depending on how the chunks physically ended up on the heap.

这篇关于静态数组与动态数组的C/C ++性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆