What is the difference between contiguous and non-contiguous arrays?


Question

The NumPy manual's entry for the reshape() function says:

>>> a = np.zeros((10, 2))
# A transpose makes the array non-contiguous
>>> b = a.T
# Taking a view makes it possible to modify the shape without modifying the
# initial object.
>>> c = b.view()
>>> c.shape = (20)
AttributeError: incompatible shape for a non-contiguous array

My questions are:

  1. What are contiguous and non-contiguous arrays? Is this similar to a contiguous memory block in C (as in "What is a contiguous memory block?")?
  2. Is there any performance difference between these two? When should we use one or the other?
  3. Why does transpose make the array non-contiguous?
  4. Why does c.shape = (20) throw the error "incompatible shape for a non-contiguous array"?

Thanks for your answer!

Answer

A contiguous array is just an array stored in an unbroken block of memory: to access the next value in the array, we just move to the next memory address.

Consider the 2D array arr = np.arange(12).reshape(3,4). It looks like this:

    [[ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]]

In the computer's memory, the values of arr are stored like this:

    0  1  2  3  4  5  6  7  8  9  10  11

This means arr is a C contiguous array because the rows are stored as contiguous blocks of memory. The next memory address holds the next value on that row. If we want to move down a column, we just need to jump over three blocks (e.g. to jump from 0 to 4 means we skip over 1, 2 and 3).
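The sketch below makes the "jump over three blocks" idea concrete. It asks NumPy for arr's strides, which give the number of bytes to step along each axis. It assumes an explicit int64 dtype so the byte counts do not depend on the platform's default integer type. byte_offset is a hypothetical helper written only for this illustration.

```python
import numpy as np

# Assumption: explicit int64 so each element is 8 bytes on every platform.
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.flags['C_CONTIGUOUS'])  # True
print(arr.strides)                # (32, 8): bytes per step along axis 0, then axis 1

# Hypothetical helper: byte offset of arr[i, j] from the start of the buffer.
def byte_offset(a, i, j):
    return i * a.strides[0] + j * a.strides[1]

# Moving along a row advances one element (8 bytes).
print(byte_offset(arr, 0, 1) - byte_offset(arr, 0, 0))  # 8
# Moving down a column advances a whole row of 4 elements (32 bytes),
# skipping the 3 remaining values of the current row.
print(byte_offset(arr, 1, 0) - byte_offset(arr, 0, 0))  # 32
```

Stepping down a column costs four elements' worth of bytes, which is exactly the jump from 0 to 4.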

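Transposing the array with arr.T means that C contiguity is lost because adjacent row entries are no longer in adjacent memory addresses. However, arr.T is Fortran contiguous since the columns are in contiguous blocks of memory:

    [[ 0  4  8]
     [ 1  5  9]
     [ 2  6 10]
     [ 3  7 11]]

Each column of arr.T (0 1 2 3, then 4 5 6 7, then 8 9 10 11) occupies a contiguous run of arr's memory, which the transpose leaves untouched.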
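Here is a short sketch for checking this, again assuming an int64 dtype. Transposing moves no data. It only swaps the strides, so the same buffer now reads as Fortran contiguous.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
t = arr.T  # a view: only the shape and strides are swapped, no data is copied

print(t.flags['C_CONTIGUOUS'])   # False
print(t.flags['F_CONTIGUOUS'])   # True
print(arr.strides, t.strides)    # (32, 8) (8, 32)
print(np.shares_memory(arr, t))  # True: both arrays use the same buffer

# Walking down a column of arr.T visits adjacent memory.
print(t[:, 0])                   # [0 1 2 3]
```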


Performance-wise, accessing memory addresses that are next to each other is usually faster than accessing addresses that are more "spread out". Fetching a value from RAM typically loads a whole block of neighbouring addresses into the CPU cache, so the next few values are already at hand. This means that operations over contiguous arrays will often be quicker.

As a consequence of C contiguous memory layout, row-wise operations are usually faster than column-wise operations. For example, you'll typically find that

np.sum(arr, axis=1) # sum the rows

is slightly faster than:

np.sum(arr, axis=0) # sum the columns

Similarly, operations on columns will be slightly faster for Fortran contiguous arrays.
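To see whether the layout matters on a given machine, a rough timing sketch like the one below can help. It compares the two reductions on a C-ordered array and on a Fortran-ordered copy made with np.asfortranarray. The array size and repeat count are arbitrary assumptions. Results vary with hardware and NumPy version, since NumPy may reorder its inner loops, so treat the numbers as indicative only.

```python
import timeit
import numpy as np

# Assumption: a 2000x2000 float64 array, large enough for cache effects to show.
big = np.random.default_rng(0).random((2000, 2000))  # C contiguous by default
big_f = np.asfortranarray(big)                        # same values, Fortran layout

for name, a in [("C order", big), ("F order", big_f)]:
    # Each lambda runs immediately inside this iteration, so it sees the current `a`.
    t_rows = timeit.timeit(lambda: np.sum(a, axis=1), number=20)
    t_cols = timeit.timeit(lambda: np.sum(a, axis=0), number=20)
    print(f"{name}: axis=1 {t_rows:.4f}s, axis=0 {t_cols:.4f}s")
```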


Finally, why can't we flatten the Fortran contiguous array by assigning a new shape?

>>> arr2 = arr.T
>>> arr2.shape = 12
AttributeError: incompatible shape for a non-contiguous array
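The interactive session above stops at the error. A self-contained version that catches it might look like this. The wording of the message has changed between NumPy releases, so the sketch prints whatever message is raised rather than matching it.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr2 = arr.T

try:
    # Assigning .shape reshapes in place, which is only allowed without copying.
    arr2.shape = 12
except AttributeError as exc:
    print("AttributeError:", exc)

print(arr2.shape)  # (4, 3): the failed assignment left arr2 unchanged
```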

In order for this to be possible, NumPy would have to put the rows of arr.T together like this:

    0  4  8  1  5  9  2  6  10  3  7  11

(Setting the shape attribute directly assumes C order - i.e. NumPy tries to perform the operation row-wise.)

This is impossible to do without copying. For any axis, NumPy needs a constant stride length (the number of bytes to move) to get to the next element of the array. Flattening arr.T in this way would require skipping forwards and backwards in memory to retrieve consecutive values of the array (forward from 0 to 4 and 8, then back to 1, and so on).
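The stride argument can be checked numerically. This sketch assumes int64, so each element is 8 bytes. It lists the byte offset of every element of arr.T when read in C (row-major) order, then the step between consecutive offsets. A 1-D view would need all of those steps to be equal.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
arr2 = arr.T  # shape (4, 3), strides (8, 32)

# Byte offset of each element of arr2, visited in C (row-major) order.
offsets = [i * arr2.strides[0] + j * arr2.strides[1]
           for i in range(arr2.shape[0])
           for j in range(arr2.shape[1])]
print(offsets)  # [0, 32, 64, 8, 40, 72, 16, 48, 80, 24, 56, 88]

# A single stride would mean every step between consecutive offsets is equal.
steps = np.diff(offsets).tolist()
print(steps)                # [32, 32, -56, 32, 32, -56, 32, 32, -56, 32, 32]
print(len(set(steps)) == 1)  # False: no constant stride exists
```

Dividing the offsets by 8 gives 0, 4, 8, 1, 5, 9, and so on, which is the C-order reading of arr.T. The -56 steps are the backward jumps.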

If we wrote arr2.reshape(12) instead, NumPy would copy the values of arr2 into a new block of memory (since it can't return a view onto the original data for this shape).
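np.shares_memory can confirm that reshape copies here. The sketch also shows an extra observation. Asking for Fortran order with reshape(12, order='F') matches the actual memory layout of arr.T, so in that case NumPy can return a view instead of a copy.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr2 = arr.T

flat_c = arr2.reshape(12)              # C order needs the 0, 4, 8, 1, ... sequence
print(flat_c)                          # [ 0  4  8  1  5  9  2  6 10  3  7 11]
print(np.shares_memory(flat_c, arr2))  # False: a fresh copy

flat_f = arr2.reshape(12, order='F')   # Fortran order matches the existing memory
print(flat_f)                          # [ 0  1  2  3  4  5  6  7  8  9 10 11]
print(np.shares_memory(flat_f, arr2))  # True: a view, no copy
```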
