正确分配多维数组 [英] Correctly allocating multi-dimensional arrays

查看:87
本文介绍了正确分配多维数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

该问题的目的是提供有关如何在C中动态正确分配多维数组的参考.即使在某些C编程书籍中,这也是一个经常被误解且解释不清的主题.因此,即使是经验丰富的C程序员也很难做到这一点.

从我的编程老师/书/教程中得知,动态分配多维数组的正确方法是使用指针到指针.

I have been taught from my programming teacher/book/tutorial that the correct way to dynamically allocate a multi-dimensional array is by using pointer-to-pointers.

但是,现在有几个SO的高级用户告诉我这是错误的做法.他们说点对点指针不是数组,我实际上不是在分配数组,我的代码不必要地很慢.

However, several high rep users on SO now tell me that this is wrong and bad practice. They say that pointer-to-pointers are not arrays, that I am not actually allocating arrays and that my code is needlessly slow.

这是教我分配多维数组的方法:

This is how I was taught to allocate multi-dimensional arrays:

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int** arr_alloc (size_t x, size_t y)
{
  int** pp = malloc(sizeof(*pp) * x);
  assert(pp != NULL);
  for(size_t i=0; i<x; i++)
  {
    pp[i] = malloc(sizeof(**pp) * y);
    assert(pp[i] != NULL);
  }

  return pp;
}

int** arr_fill (int** pp, size_t x, size_t y)
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      pp[i][j] = (int)j + 1;
    }
  }

  return pp;
}

void arr_print (int** pp, size_t x, size_t y)
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      printf("%d ", pp[i][j]);
    }
    printf("\n");
  }
}

void arr_free (int** pp, size_t x, size_t y)
{
  (void) y;

  for(size_t i=0; i<x; i++)
  {
    free(pp[i]);
    pp[i] = NULL;
  }
  free(pp);
  pp = NULL;
}


int main (void)
{
  size_t x = 2;
  size_t y = 3;
  int** pp;

  pp = arr_alloc(x, y);
  pp = arr_fill(pp, x, y);
  arr_print(pp, x, y);
  arr_free(pp, x, y);

  return 0;
}

输出

1 2 3
1 2 3

此代码可以正常工作!怎么可能错了?

This code works just fine! How could it be wrong?

推荐答案

为了回答这个问题,我们首先应该弄清一些概念.什么是数组,如何使用?如果不是数组,问题中的代码是什么?

In order to answer the question, we should first clear up some concepts. What is an array and how can it be used? And what is the code in the question, if not an array?

什么是数组?

在C标准 ISO 9899:2011 6.2.5/20类型中找到数组的正式定义.

The formal definition of an array is found in the C standard, ISO 9899:2011 6.2.5/20 Types.

一个数组类型描述了一个连续分配的非空集 具有特定成员对象类型(称为元素类型)的对象.

An array type describes a contiguously allocated non-empty set of objects with a particular member object type, called the element type.

用简单的英语来说,数组是相邻存储单元中连续分配的相同类型项的集合.

In plain English, an array is a collection of items of the same type allocated contiguously, in adjacent memory cells.

例如,将在内存中分配3个整数int arr[3] = {1,2,3};的数组,如下所示:

For example an array of 3 integers int arr[3] = {1,2,3}; would be allocated in memory like this:

+-------+-------+-------+
|       |       |       |
|   1   |   2   |   3   |
|       |       |       |
+-------+-------+-------+

那么多维数组的形式定义呢?实际上,它与上面引用的定义非常相同.它递归地应用.

So what about the formal definition of a multi-dimensional array? Actually, it is the very same definition as cited above. It applies recursively.

如果我们要分配2D数组,则int arr[2][3] = { {1,2,3}, {1,2,3} };它将像这样在内存中分配:

If we would allocate a 2D array, int arr[2][3] = { {1,2,3}, {1,2,3} }; it would get allocated in memory like this:

+-------+-------+-------+-------+-------+-------+
|       |       |       |       |       |       |
|   1   |   2   |   3   |   1   |   2   |   3   |
|       |       |       |       |       |       |
+-------+-------+-------+-------+-------+-------+

在此示例中,我们实际上是一个数组数组.一个有2个项目的数组,每个项目都是3个整数的数组.

What we have in this example is actually an array of arrays. An array which has 2 items, each of them an array of 3 integers.

数组是与其他数组一样的类型

C中的数组通常遵循与常规变量相同的类型系统.如上所示,您可以拥有一个数组数组,就像您可以拥有任何其他类型的数组一样.

Arrays in C often follow the same type system as regular variables. As shown above, you can have an array of arrays, like you can have an array of any other type.

您还可以在 n 维数组上应用与普通一维数组相同的指针算法.对于常规的一维数组,应用指针算法应该很简单:

You can also apply the same kind of pointer arithmetic on n-dimensional arrays as on plain one-dimensional arrays. With a regular one-dimensional arrays, applying pointer arithmetic should be trivial:

int arr[3] = {1,2,3};
int* ptr = arr; // integer pointer to the first element.

for(size_t i=0; i<3; i++)
{
  printf("%d ", *ptr); // print contents.
  ptr++; // set pointer to point at the next element.
}

这可以通过数组衰减"来实现.在表达式中使用arr时,它会衰减"为指向第一个元素的指针.

This was made possible through "array decay". When arr was used inside an expression, it "decayed" into a pointer to the first element.

类似地,我们可以通过使用 array指针

Similarly, we can use the very same kind of pointer arithmetic to iterate through an array of arrays, by using an array pointer:

int arr[2][3] = { {1,2,3}, {1,2,3} };
int (*ptr)[3] = arr; // int array pointer to the first element, which is an int[3] array.

for(size_t i=0; i<2; i++)
{
  printf("%d %d %d\n", (*ptr)[0], (*ptr)[1], (*ptr)[2]); // print contents
  ptr++; // set pointer to point at the next element
}

再次出现阵列衰减.类型为int [2][3]的变量arr衰减为指向第一个元素的指针.第一个元素是int [3],指向该元素的指针声明为int(*)[3]-数组指针.

Again there was array decay. The variable arr which was of type int [2][3] decayed into a pointer to the first element. The first element was an int [3] and a pointer to such an element is declared as int(*)[3] - an array pointer.

要使用多维数组,必须了解数组指针和数组衰减.

Understanding array pointers and array decay is necessary in order to work with multi-dimensional arrays.

在更多情况下,数组的行为就像常规变量一样. sizeof运算符对(非VLA)数组的作用与对常规变量的作用相同. 32位系统的示例:

There are more cases where arrays behave just like regular variables. The sizeof operator works just the same for (non-VLA) arrays as for regular variables. Examples for a 32 bit system:

int x; printf("%zu", sizeof(x));打印4.
int arr[3] = {1,2,3}; printf("%zu", sizeof(arr));打印12(3 * 4 = 12)
int arr[2][3] = { {1,2,3}, {1,2,3} }; printf("%zu", sizeof(arr));打印24(2 * 3 * 4 = 24)

int x; printf("%zu", sizeof(x)); prints 4.
int arr[3] = {1,2,3}; printf("%zu", sizeof(arr)); prints 12 (3*4=12)
int arr[2][3] = { {1,2,3}, {1,2,3} }; printf("%zu", sizeof(arr)); prints 24 (2*3*4=24)

与任何其他类型一样,数组可以与库函数和通用API一起使用.由于数组满足连续分配的要求,因此我们可以例如使用memcpy安全复制它们:

Like any other type, arrays can be used with library functions and generic APIs. Since arrays fulfil the requirement of being allocated contiguously, we can for example safely copy them with memcpy:

int arr_a[3] = {1,2,3};
int arr_b[3];
memcpy(arr_b, arr_a, sizeof(arr_a));

连续分配也是memsetstrcpybsearchqsort之类的其他类似标准库功能起作用的原因.它们旨在处理连续分配的阵列.因此,如果您具有多维数组,则可以有效地对其进行搜索并使用bsearchqsort对其进行排序,从而省去了执行二进制搜索和自己快速进行排序的麻烦,从而为每个项目重新发明了轮子.

Contiguous allocation is also the reason why other similar standard library functions like memset, strcpy, bsearch and qsort work. They are designed to work on arrays allocated contiguously. So if you have a multi-dimensional array, you can efficiently search it and sort it with bsearch and qsort, saving you the fuss of implementing binary search and quick sort yourself and thereby re-inventing the wheel for every project.

数组和其他类型之间的上述所有一致性是我们要利用的一件好事,尤其是在进行泛型编程时.

All of the above consistencies between arrays and other types is a very good thing that we want to take advantage of, particularly when doing generic programming.

什么是指针对指针(如果不是数组的话)?

现在返回到问题中的代码,该代码使用带有指针到指针的不同语法.没有什么神秘的.它是一个指向类型的指针,不多也不少.它不是数组.它不是2D数组.严格来说,它不能用于指向数组,也不能用于指向2D数组.

Now to get back to the code in the question, which used a different syntax with a pointer-to-pointer. There is nothing mysterious about it. It is a pointer to pointer to type, no more no less. It is not an array. It is not a 2D array. Strictly speaking, it cannot be used to point at an array, nor can it be used to point at a 2D array.

但是,可以使用指针指向指针指向指针数组的第一个元素,而不是指向整个数组.这就是问题中使用它的方式-一种模拟"数组指针的方式.在问题中,它用于指向2个指针的数组.然后使用2个指针中的每一个指向3个整数的数组.

A pointer-to-pointer can however be used to point at the first element of an array of pointers, instead of pointing at the array as whole. And that is how it is used in the question - as a way to "emulate" an array pointer. In the question, it is used to point at an array of 2 pointers. And then each of the 2 pointers is used to point at an array of 3 integers.

这称为查找表,它是一种抽象数据类型(ADT),与普通数组的下层概念有所不同.主要区别在于查找表的分配方式:

This is known as a look-up table, which is a kind of abstract data type (ADT), which is something different from the lower level concept of plain arrays. The main difference is how the look-up table is allocated:

+------------+
|            |
| 0x12340000 |
|            |
+------------+
      |
      |
      v
+------------+     +-------+-------+-------+
|            |     |       |       |       |
| 0x22223333 |---->|   1   |   2   |   3   |
|            |     |       |       |       |
+------------+     +-------+-------+-------+
|            | 
| 0xAAAABBBB |--+
|            |  | 
+------------+  |  
                |
                |  +-------+-------+-------+
                |  |       |       |       |
                +->|   1   |   2   |   3   |
                   |       |       |       |
                   +-------+-------+-------+

此示例中的32位地址是虚构的. 0x12340000框表示指针到指针.它包含指向指针数组中第一项的地址0x12340000.该数组中的每个指针依次包含一个指向整数数组中第一项的地址.

The 32 bit addresses in this example are made-up. The 0x12340000 box represents the pointer-to-pointer. It contains an address 0x12340000 to the first item in an array of pointers. Each pointer in that array in turn, contains an address pointing at the first item in an array of integers.

这是问题开始的地方.

查找表版本问题

查找表分散在整个堆内存中.它不是在相邻单元中连续分配的内存,因为对malloc()的每个调用都给出了一个新的存储区域,不一定与其他区域相邻.反过来,这给我们带来了许多问题:

The look-up table is scattered all over the heap memory. It is not contiguously allocated memory in adjacent cells, because each call to malloc() gives a new memory area, not necessarily located adjacently to the others. This in turn gives us lots of problems:

  • 我们不能按预期使用指针算法.虽然我们可以使用一种指针算术形式来索引和访问查找表中的项目,但不能使用数组指针来做到这一点.

  • We can't use pointer arithmetic as expected. While we can use a form of pointer arithmetic to index and access the items in the look-up table, we can't do so using array pointers.

我们不能使用sizeof运算符.在指针指向上使用时,它将为我们提供指针指向的大小.习惯于指出的第一项,它将为我们提供一个指针的大小.它们都不是数组的大小.

We can't use the sizeof operator. Used on the pointer-to-pointer, it would give us the size of a pointer-to-pointer. Used to the first item pointed at, it would give us the size of a pointer. Neither of them is the size of an array.

除了数组类型(memcpymemsetstrcpybsearchqsort等)之外,我们不能使用标准库函数.所有这些功能都假定获取数组作为输入,并连续分配数据.以我们的查询表作为参数调用它们会导致未定义的行为错误,例如程序崩溃.

We can't use standard library functions that excepts an array type (memcpy, memset, strcpy, bsearch, qsort and so on). All such functions assume to get arrays as input, with data allocated contiguously. Calling them with our look-up table as parameter would result in undefined behavior bugs, such as program crashes.

重复调用malloc以分配多个段会导致堆片段化 ,进而导致对RAM内存的使用不佳.

Repeated calls of malloc to allocate several segments leads to heap fragmentation, which in turn results in poor use of RAM memory.

由于内存分散,因此在遍历查找表时,CPU无法利用高速缓存.有效使用数据缓存需要从上到下迭代的连续内存块.这意味着,根据设计,查找表的访问时间比真正的多维数组要慢得多.

Since the memory is scattered, the CPU cannot utilize cache memory when iterating through the look-up table. Efficient use of the data cache requires a contiguous chunk of memory which is iterated through from top to bottom. This means that the look-up table, by design, has significantly slower access time than a real multi-dimensional array.

对于每个对malloc()的调用,管理堆的库代码都必须计算出可用空间.同样,对于每个free()调用,都必须执行开销代码.因此,出于性能考虑,通常最好是尽可能少地调用这些函数.

For each call to malloc(), the library code managing the heap has to calculate where there is free space. Similarly for each call to free(), there is overhead code which has to be executed. Therefore, as few calls to these functions as possible is often preferable, for the sake of performance.

查找表是否不好?

我们可以看到,基于指针的查找表存在很多问题.但是,它们并不全都不好,它是与其他工具一样的工具.它只是必须用于正确的目的.如果您要寻找一个多维数组(应将其用作数组),则查找表显然是错误的工具.但是它们可以用于其他目的.

As we can see, there are a lot of problems with pointer-based look-up tables. But they aren't all bad, it is a tool like any other. It just has to be used for the right purpose. If you are looking for a multi-dimensional array, which should be used as an array, look-up tables are clearly the wrong tool. But they can be used for other purposes.

当您需要所有尺寸分别具有完全可变的尺寸时,查找表是正确的选择.例如,在创建C字符串列表时,这样的容器可能很方便.然后通常有理由考虑采取上述执行速度性能损失以节省内存.

A look-up table is the right choice when you need all dimensions to have completely variable sizes, individually. Such a container can be handy when for example creating a list of C strings. It is then often justified to take the above mentioned execution speed performance loss in order to save memory.

此外,查找表的优点在于,您可以在运行时重新分配表的各个部分,而无需重新分配整个多维数组.如果需要经常执行此操作,则在执行速度方面,查询表甚至可能优于多维数组.例如,在实现链式哈希表时可以使用类似的查找表.

Also, the look-up table has the advantage that you can re-alloce parts of the table in run-time without the need to re-allocate a whole multi-dimensional array. If this is something that needs to be done frequently, the look-up table might even outperform the multi-dimensional array in terms of execution speed. For example, similar look-up tables can be used when implementing a chained hash table.

然后如何正确地动态分配多维数组?

现代C语言中最简单的形式是简单地使用可变长度数组(VLA). int array[x][y];其中,xy是在运行时在先数组声明中给定值的变量.但是,VLA具有本地作用域,并且不会在程序的整个过程中持续存在-它们具有自动存储的持续时间.因此,尽管VLA可以方便快捷地用于临时阵列,但它并不是问题中查询表的通用替代品.

The easiest form in modern C is to simply use a variable-length array (VLA). int array[x][y]; where x and y are variables given values in run-time, prior array declaration. However, VLAs have local scope and do not persist throughout the duration of the program - they have automatic storage duration. So while VLAs may be convenient and fast to use for temporary arrays, it is not an universal replacement to the look-up table in the question.

要真正真正地动态分配多维数组,以使其获得已分配的存储期限,我们必须使用malloc()/calloc()/realloc().我在下面举一个例子.

To truly allocate a multi-dimensional array dynamically, so that it gets allocated storage duration, we have to use malloc()/calloc()/realloc(). I'll give one example below.

在现代C语言中,您将使用指向VLA的数组指针.即使程序中没有实际的VLA,也可以使用此类指针.与普通的type*void*相比,使用它们的好处是提高了类型安全性.使用指向VLA的指针,还可以使用数组将数组尺寸作为参数传递给函数,从而使其既可变又可以安全键入.

In modern C, you would use array pointers to a VLA. You can use such pointers even when no actual VLA is present in the program. The benefit of using them over a plain type* or a void* is increased type-safety. Using a pointer to a VLA also allows you to pass the array dimensions as parameters to the function using the array, making it both variable and type safe at once.

不幸的是,为了利用拥有指向VLA的指针的好处,我们不能将该指针作为函数结果返回.因此,如果我们需要将指向数组的指针返回给调用方,则必须将其作为参数传递(出于

Unfortunately, in order to use the benefits of having a pointer to VLA, we can't return that pointer as a function result. So if we need to return a pointer to the array to the caller, it must be passed as a parameter (for the reasons described in Dynamic memory access only works inside function). This is fine practice in C, but makes the code a bit hard to read. It would look something like this:

void arr_alloc (size_t x, size_t y, int(**aptr)[x][y])
{
  *aptr = malloc( sizeof(int[x][y]) ); // allocate a true 2D array
  assert(*aptr != NULL);
}

虽然这种带有指向数组指针的指针的语法可能看起来有些奇怪和令人生畏,但即使添加更多维度,它也不会比这更复杂:

While this syntax with a pointer to an array pointer might look a bit strange and intimidating, it doesn't get more complex than this even if we add more dimensions:

void arr_alloc (size_t x, size_t y, size_t z, int(**aptr)[x][y][z])
{
  *aptr = malloc( sizeof(int[x][y][z]) ); // allocate a true 3D array
  assert(*aptr != NULL);
}

现在将代码与为查询表版本添加一个维度的代码进行比较:

Now compare that code with the code for adding one more dimension to the look-up table version:

/* Bad. Don't write code like this! */
int*** arr_alloc (size_t x, size_t y, size_t z)
{
  int*** ppp = malloc(sizeof(*ppp) * x);
  assert(ppp != NULL);
  for(size_t i=0; i<x; i++)
  {
    ppp[i] = malloc(sizeof(**ppp) * y);
    assert(ppp[i] != NULL);
    for(size_t j=0; j<y; j++)
    {
      ppp[i][j] = malloc(sizeof(***ppp) * z);
      assert(ppp[i][j] != NULL);
    }
  }

  return ppp;
}

现在是三星级编程"的一团糟.而且甚至不考虑4个维度...

Now that is one unreadble mess of "three-star programming". And lets not even consider 4 dimensions...

使用真实2D数组的版本的完整代码

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

void arr_alloc (size_t x, size_t y, int(**aptr)[x][y])
{
  *aptr = malloc( sizeof(int[x][y]) ); // allocate a true 2D array
  assert(*aptr != NULL);
}

void arr_fill (size_t x, size_t y, int array[x][y])
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      array[i][j] = (int)j + 1;
    }
  }
}

void arr_print (size_t x, size_t y, int array[x][y])
{
  for(size_t i=0; i<x; i++)
  {
    for(size_t j=0; j<y; j++)
    {
      printf("%d ", array[i][j]);
    }
    printf("\n");
  }
}

int main (void)
{
  size_t x = 2;
  size_t y = 3;
  int (*aptr)[x][y];

  arr_alloc(x, y, &aptr);
  arr_fill(x, y, *aptr);
  arr_print(x, y, *aptr);
  free(aptr); // free the whole 2D array

  return 0;
}

这篇关于正确分配多维数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆