正确分配多维数组 [英] Correctly allocating multi-dimensional arrays

查看:26
本文介绍了正确分配多维数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题的目的是提供一个关于如何在 C 中正确动态分配多维数组的参考.这是一个即使在一些 C 编程书籍中也经常被误解和解释不清的主题.因此,即使是经验丰富的 C 程序员也很难做到正确.

我的编程老师/书籍/教程告诉我,动态分配多维数组的正确方法是使用指针到指针.

I have been taught from my programming teacher/book/tutorial that the correct way to dynamically allocate a multi-dimensional array is by using pointer-to-pointers.

但是,SO 上的几个高代表用户现在告诉我这是错误的和不好的做法.他们说指向指针的指针不是数组,我实际上并没有分配数组,而且我的代码不必要地缓慢.

However, several high rep users on SO now tell me that this is wrong and bad practice. They say that pointer-to-pointers are not arrays, that I am not actually allocating arrays and that my code is needlessly slow.

这是教我如何分配多维数组:

This is how I was taught to allocate multi-dimensional arrays:

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int** arr_alloc (size_t x, size_t y)
{
  int** pp = malloc(sizeof(*pp) * x);
  assert(pp != NULL);
  for(size_t i=0; i<x; i++)
  {
    pp[i] = malloc(sizeof(**pp) * y);
    assert(pp[i] != NULL);
  }

  return pp;
}

int** arr_fill (int** pp, size_t x, size_t y)
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      pp[i][j] = (int)j + 1;
    }
  }

  return pp;
}

void arr_print (int** pp, size_t x, size_t y)
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      printf("%d ", pp[i][j]);
    }
    printf("\n");
  }
}

void arr_free (int** pp, size_t x, size_t y)
{
  (void) y;

  for(size_t i=0; i<x; i++)
  {
    free(pp[i]);
    pp[i] = NULL;
  }
  free(pp);
  pp = NULL;
}


int main (void)
{
  size_t x = 2;
  size_t y = 3;
  int** pp;

  pp = arr_alloc(x, y);
  pp = arr_fill(pp, x, y);
  arr_print(pp, x, y);
  arr_free(pp, x, y);

  return 0;
}

输出

1 2 3
1 2 3

这段代码工作得很好!怎么会出错?

This code works just fine! How could it be wrong?

推荐答案

要回答这个问题,首先要理清一些概念.什么是数组以及如何使用它?如果不是数组,问题中的代码是什么?

In order to answer the question, we should first clear up some concepts. What is an array and how can it be used? And what is the code in the question, if not an array?

什么是数组?

数组的正式定义可在 C 标准 ISO 9899:2011 6.2.5/20 类型中找到.

The formal definition of an array is found in the C standard, ISO 9899:2011 6.2.5/20 Types.

一个数组类型描述了一个连续分配的非空集合具有特定成员对象类型的对象,称为元素类型.

An array type describes a contiguously allocated non-empty set of objects with a particular member object type, called the element type.

简单来说,数组是相邻内存单元中连续分配的相同类型项目的集合.

In plain English, an array is a collection of items of the same type allocated contiguously, in adjacent memory cells.

例如一个由 3 个整数组成的数组 int arr[3] = {1,2,3}; 会像这样在内存中分配:

For example an array of 3 integers int arr[3] = {1,2,3}; would be allocated in memory like this:

+-------+-------+-------+
|       |       |       |
|   1   |   2   |   3   |
|       |       |       |
+-------+-------+-------+

那么多维数组的正式定义呢?实际上,它与上面引用的定义完全相同.它递归地应用.

So what about the formal definition of a multi-dimensional array? Actually, it is the very same definition as cited above. It applies recursively.

如果我们要分配一个二维数组,int arr[2][3] = { {1,2,3}, {1,2,3} };这样的记忆:

If we would allocate a 2D array, int arr[2][3] = { {1,2,3}, {1,2,3} }; it would get allocated in memory like this:

+-------+-------+-------+-------+-------+-------+
|       |       |       |       |       |       |
|   1   |   2   |   3   |   1   |   2   |   3   |
|       |       |       |       |       |       |
+-------+-------+-------+-------+-------+-------+

我们在这个例子中实际上是一个数组数组.一个包含 2 个项目的数组,每个项目都是一个包含 3 个整数的数组.

What we have in this example is actually an array of arrays. An array which has 2 items, each of them an array of 3 integers.

数组和其他类型一样

C 中的数组通常遵循与常规变量相同的类型系统.如上所示,您可以拥有一个数组数组,就像您可以拥有任何其他类型的数组一样.

Arrays in C often follow the same type system as regular variables. As shown above, you can have an array of arrays, like you can have an array of any other type.

您还可以对 n 维数组应用与普通一维数组相同的指针算法.对于常规的一维数组,应用指针算术应该是微不足道的:

You can also apply the same kind of pointer arithmetic on n-dimensional arrays as on plain one-dimensional arrays. With a regular one-dimensional arrays, applying pointer arithmetic should be trivial:

int arr[3] = {1,2,3};
int* ptr = arr; // integer pointer to the first element.

for(size_t i=0; i<3; i++)
{
  printf("%d ", *ptr); // print contents.
  ptr++; // set pointer to point at the next element.
}

这是通过数组衰减"实现的.当在表达式中使用 arr 时,它衰减"为指向第一个元素的指针.

This was made possible through "array decay". When arr was used inside an expression, it "decayed" into a pointer to the first element.

类似地,我们可以使用完全相同的指针算法,通过使用数组指针来遍历数组数组:

Similarly, we can use the very same kind of pointer arithmetic to iterate through an array of arrays, by using an array pointer:

int arr[2][3] = { {1,2,3}, {1,2,3} };
int (*ptr)[3] = arr; // int array pointer to the first element, which is an int[3] array.

for(size_t i=0; i<2; i++)
{
  printf("%d %d %d\n", (*ptr)[0], (*ptr)[1], (*ptr)[2]); // print contents
  ptr++; // set pointer to point at the next element
}

再次出现数组衰减.类型为 int [2][3] 的变量 arr 衰减为指向第一个元素的指针.第一个元素是一个 int [3],指向这样一个元素的指针被声明为 int(*)[3] - 一个数组指针.

Again there was array decay. The variable arr which was of type int [2][3] decayed into a pointer to the first element. The first element was an int [3] and a pointer to such an element is declared as int(*)[3] - an array pointer.

了解数组指针和数组衰减对于处理多维数组是必要的.

Understanding array pointers and array decay is necessary in order to work with multi-dimensional arrays.

在更多情况下,数组的行为就像常规变量一样.sizeof 运算符对于(非 VLA)数组和常规变量的工作方式相同.32 位系统示例:

There are more cases where arrays behave just like regular variables. The sizeof operator works just the same for (non-VLA) arrays as for regular variables. Examples for a 32 bit system:

int x;printf("%zu", sizeof(x)); 打印 4.
int arr[3] = {1,2,3};printf("%zu", sizeof(arr)); 打印 12 (3*4=12)
int arr[2][3] = { {1,2,3}, {1,2,3} };printf("%zu", sizeof(arr)); 打印 24 (2*3*4=24)

int x; printf("%zu", sizeof(x)); prints 4.
int arr[3] = {1,2,3}; printf("%zu", sizeof(arr)); prints 12 (3*4=12)
int arr[2][3] = { {1,2,3}, {1,2,3} }; printf("%zu", sizeof(arr)); prints 24 (2*3*4=24)

与任何其他类型一样,数组可以与库函数和通用 API 一起使用.由于数组满足连续分配的要求,例如我们可以使用 memcpy 安全地复制它们:

Like any other type, arrays can be used with library functions and generic APIs. Since arrays fulfil the requirement of being allocated contiguously, we can for example safely copy them with memcpy:

int arr_a[3] = {1,2,3};
int arr_b[3];
memcpy(arr_b, arr_a, sizeof(arr_a));

连续分配也是其他类似的标准库函数如memsetstrcpybsearchqsort的原因> 工作.它们旨在处理连续分配的数组.所以如果你有一个多维数组,你可以用bsearchqsort高效地搜索和排序,省去你自己实现二分搜索和快速排序的麻烦和从而为每个项目重新发明轮子.

Contiguous allocation is also the reason why other similar standard library functions like memset, strcpy, bsearch and qsort work. They are designed to work on arrays allocated contiguously. So if you have a multi-dimensional array, you can efficiently search it and sort it with bsearch and qsort, saving you the fuss of implementing binary search and quick sort yourself and thereby re-inventing the wheel for every project.

数组和其他类型之间的所有上述一致性是我们想要利用的非常好的事情,特别是在进行泛型编程时.

All of the above consistencies between arrays and other types is a very good thing that we want to take advantage of, particularly when doing generic programming.

指针到指针的东西是什么,如果不是数组?

现在回到问题中的代码,它使用不同的语法和指针到指针.它没有什么神秘之处.它是一个指向类型的指针,不多不少.它不是一个数组.它不是二维数组.严格来说,不能用于指向数组,也不能用于指向二维数组.

Now to get back to the code in the question, which used a different syntax with a pointer-to-pointer. There is nothing mysterious about it. It is a pointer to pointer to type, no more no less. It is not an array. It is not a 2D array. Strictly speaking, it cannot be used to point at an array, nor can it be used to point at a 2D array.

然而,指针到指针可用于指向指针数组的第一个元素,而不是指向整个数组.这就是它在问题中的使用方式 - 作为模拟"数组指针的一种方式.在问题中,它用于指向一个由 2 个指针组成的数组.然后这两个指针中的每一个都用于指向一个由 3 个整数组成的数组.

A pointer-to-pointer can however be used to point at the first element of an array of pointers, instead of pointing at the array as whole. And that is how it is used in the question - as a way to "emulate" an array pointer. In the question, it is used to point at an array of 2 pointers. And then each of the 2 pointers is used to point at an array of 3 integers.

这被称为查找表,它是一种抽象数据类型(ADT),与普通数组的低级概念有所不同.主要区别在于查找表的分配方式:

This is known as a look-up table, which is a kind of abstract data type (ADT), which is something different from the lower level concept of plain arrays. The main difference is how the look-up table is allocated:

+------------+
|            |
| 0x12340000 |
|            |
+------------+
      |
      |
      v
+------------+     +-------+-------+-------+
|            |     |       |       |       |
| 0x22223333 |---->|   1   |   2   |   3   |
|            |     |       |       |       |
+------------+     +-------+-------+-------+
|            | 
| 0xAAAABBBB |--+
|            |  | 
+------------+  |  
                |
                |  +-------+-------+-------+
                |  |       |       |       |
                +->|   1   |   2   |   3   |
                   |       |       |       |
                   +-------+-------+-------+

本例中的 32 位地址是编造的.0x12340000 框代表指针到指针.它包含指向指针数组中第一项的地址 0x12340000.该数组中的每个指针依次包含指向整数数组中第一项的地址.

The 32 bit addresses in this example are made-up. The 0x12340000 box represents the pointer-to-pointer. It contains an address 0x12340000 to the first item in an array of pointers. Each pointer in that array in turn, contains an address pointing at the first item in an array of integers.

这就是问题开始的地方.

And here is where the problems start.

查找表版本的问题

查找表分散在整个堆内存中.它不是在相邻单元格中连续分配的内存,因为每次调用 malloc() 都会给出一个新的内存区域,不一定与其他内存区域相邻.这反过来又给我们带来了很多问题:

The look-up table is scattered all over the heap memory. It is not contiguously allocated memory in adjacent cells, because each call to malloc() gives a new memory area, not necessarily located adjacently to the others. This in turn gives us lots of problems:

  • 我们不能像预期的那样使用指针算法.虽然我们可以使用某种形式的指针运算来索引和访问查找表中的项目,但我们不能使用数组指针来这样做.

  • We can't use pointer arithmetic as expected. While we can use a form of pointer arithmetic to index and access the items in the look-up table, we can't do so using array pointers.

我们不能使用 sizeof 运算符.用于指向指针的指针,它将为我们提供指向指针的指针的大小.习惯于指向的第一项,它会给我们一个指针的大小.它们都不是数组的大小.

We can't use the sizeof operator. Used on the pointer-to-pointer, it would give us the size of a pointer-to-pointer. Used to the first item pointed at, it would give us the size of a pointer. Neither of them is the size of an array.

我们不能使用除数组类型(memcpymemsetstrcpybsearchqsort 等).所有这些函数都假设将数组作为输入,数据是连续分配的.使用我们的查找表作为参数调用它们会导致未定义的行为错误,例如程序崩溃.

We can't use standard library functions that excepts an array type (memcpy, memset, strcpy, bsearch, qsort and so on). All such functions assume to get arrays as input, with data allocated contiguously. Calling them with our look-up table as parameter would result in undefined behavior bugs, such as program crashes.

重复调用 malloc 以分配多个段导致堆碎片,进而导致 RAM 内存使用不当.

Repeated calls of malloc to allocate several segments leads to heap fragmentation, which in turn results in poor use of RAM memory.

由于内存比较分散,CPU在遍历查表时无法使用缓存.数据缓存的有效使用需要从上到下迭代的连续内存块.这意味着,按照设计,查找表的访问时间比真正的多维数组慢得多.

Since the memory is scattered, the CPU cannot utilize cache memory when iterating through the look-up table. Efficient use of the data cache requires a contiguous chunk of memory which is iterated through from top to bottom. This means that the look-up table, by design, has significantly slower access time than a real multi-dimensional array.

对于每次对 malloc() 的调用,管理堆的库代码必须计算哪里有空闲空间.类似地,对于每次对 free() 的调用,都有必须执行的开销代码.因此,为了性能,尽可能少地调用这些函数通常是可取的.

For each call to malloc(), the library code managing the heap has to calculate where there is free space. Similarly for each call to free(), there is overhead code which has to be executed. Therefore, as few calls to these functions as possible is often preferable, for the sake of performance.

查找表都是不好的吗?

正如我们所见,基于指针的查找表存在很多问题.但它们也不全是坏的,它是一个和其他工具一样的工具.它只需要用于正确的目的.如果您正在寻找应该用作数组的多维数组,那么查找表显然是错误的工具.但它们可以用于其他目的.

As we can see, there are a lot of problems with pointer-based look-up tables. But they aren't all bad, it is a tool like any other. It just has to be used for the right purpose. If you are looking for a multi-dimensional array, which should be used as an array, look-up tables are clearly the wrong tool. But they can be used for other purposes.

当您需要所有维度都具有完全可变的大小时,查找表是正确的选择.例如,在创建 C 字符串列表时,这样的容器会很方便.那么为了节省内存,采取上述的执行速度性能损失通常是合理的.

A look-up table is the right choice when you need all dimensions to have completely variable sizes, individually. Such a container can be handy when for example creating a list of C strings. It is then often justified to take the above mentioned execution speed performance loss in order to save memory.

此外,查找表的优点是您可以在运行时重新分配表的一部分,而无需重新分配整个多维数组.如果这是需要经常做的事情,查找表在执行速度方面甚至可能优于多维数组.例如,在实现链式哈希表时可以使用类似的查找表.

Also, the look-up table has the advantage that you can re-alloce parts of the table in run-time without the need to re-allocate a whole multi-dimensional array. If this is something that needs to be done frequently, the look-up table might even outperform the multi-dimensional array in terms of execution speed. For example, similar look-up tables can be used when implementing a chained hash table.

那么如何正确地动态分配多维数组?

现代 C 中最简单的形式是简单地使用可变长度数组 (VLA).int array[x][y]; 其中 xy 是在运行时给定值的变量,事先数组声明.但是,VLA 具有局部范围并且不会在整个程序持续时间内持续存在——它们具有自动存储持续时间.因此,虽然 VLA 可能方便快捷地用于临时数组,但它并不是问题中查找表的通用替代品.

The easiest form in modern C is to simply use a variable-length array (VLA). int array[x][y]; where x and y are variables given values in run-time, prior array declaration. However, VLAs have local scope and do not persist throughout the duration of the program - they have automatic storage duration. So while VLAs may be convenient and fast to use for temporary arrays, it is not an universal replacement to the look-up table in the question.

要真正动态分配多维数组,使其获得分配的存储时间,我们必须使用malloc()/calloc()/realloc().下面我举一个例子.

To truly allocate a multi-dimensional array dynamically, so that it gets allocated storage duration, we have to use malloc()/calloc()/realloc(). I'll give one example below.

在现代 C 中,您将使用指向 VLA 的数组指针.即使程序中不存在实际的 VLA,您也可以使用此类指针.使用它们而不是普通的 type*void* 的好处是增加了类型安全性.使用指向 VLA 的指针还允许您将数组维度作为参数传递给使用该数组的函数,使其变量和类型同时安全.

In modern C, you would use array pointers to a VLA. You can use such pointers even when no actual VLA is present in the program. The benefit of using them over a plain type* or a void* is increased type-safety. Using a pointer to a VLA also allows you to pass the array dimensions as parameters to the function using the array, making it both variable and type safe at once.

不幸的是,为了利用指向 VLA 的指针的好处,我们不能将该指针作为函数结果返回.所以如果我们需要返回一个指向数组的指针给调用者,它必须作为参数传递(原因在 动态内存访问仅在函数内部有效).这在 C 中是很好的实践,但会使代码有点难以阅读.它看起来像这样:

Unfortunately, in order to use the benefits of having a pointer to VLA, we can't return that pointer as a function result. So if we need to return a pointer to the array to the caller, it must be passed as a parameter (for the reasons described in Dynamic memory access only works inside function). This is fine practice in C, but makes the code a bit hard to read. It would look something like this:

void arr_alloc (size_t x, size_t y, int(**aptr)[x][y])
{
  *aptr = malloc( sizeof(int[x][y]) ); // allocate a true 2D array
  assert(*aptr != NULL);
}

虽然这种带有指向数组指针的指针的语法可能看起来有点奇怪和令人生畏,但即使我们添加更多维度,它也不会比这更复杂:

While this syntax with a pointer to an array pointer might look a bit strange and intimidating, it doesn't get more complex than this even if we add more dimensions:

void arr_alloc (size_t x, size_t y, size_t z, int(**aptr)[x][y][z])
{
  *aptr = malloc( sizeof(int[x][y][z]) ); // allocate a true 3D array
  assert(*aptr != NULL);
}

现在将该代码与向查找表版本添加多维的代码进行比较:

Now compare that code with the code for adding one more dimension to the look-up table version:

/* Bad. Don't write code like this! */
int*** arr_alloc (size_t x, size_t y, size_t z)
{
  int*** ppp = malloc(sizeof(*ppp) * x);
  assert(ppp != NULL);
  for(size_t i=0; i<x; i++)
  {
    ppp[i] = malloc(sizeof(**ppp) * y);
    assert(ppp[i] != NULL);
    for(size_t j=0; j<y; j++)
    {
      ppp[i][j] = malloc(sizeof(***ppp) * z);
      assert(ppp[i][j] != NULL);
    }
  }

  return ppp;
}

现在是一团难以理解的三星级编程".甚至不考虑 4 个维度...

Now that is one unreadble mess of "three-star programming". And lets not even consider 4 dimensions...

使用真正二维数组的版本的完整代码

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

void arr_alloc (size_t x, size_t y, int(**aptr)[x][y])
{
  *aptr = malloc( sizeof(int[x][y]) ); // allocate a true 2D array
  assert(*aptr != NULL);
}

void arr_fill (size_t x, size_t y, int array[x][y])
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      array[i][j] = (int)j + 1;
    }
  }
}

void arr_print (size_t x, size_t y, int array[x][y])
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      printf("%d ", array[i][j]);
    }
    printf("\n");
  }
}

int main (void)
{
  size_t x = 2;
  size_t y = 3;
  int (*aptr)[x][y];

  arr_alloc(x, y, &aptr);
  arr_fill(x, y, *aptr);
  arr_print(x, y, *aptr);
  free(aptr); // free the whole 2D array

  return 0;
}

这篇关于正确分配多维数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆