Numpy multi-dimensional array indexing swaps axis order

Question

I am working with multi-dimensional Numpy arrays. I have noticed some inconsistent behavior when accessing these arrays with other index arrays. For example:

import numpy as np
start = np.zeros((7,5,3))
a     = start[:,:,np.arange(2)]
b     = start[0,:,np.arange(2)]
c     = start[0,:,:2]
print 'a:', a.shape
print 'b:', b.shape
print 'c:', c.shape

In this example, I get the result:

a: (7, 5, 2)
b: (2, 5)
c: (5, 2)

This confuses me. Why do "b" and "c" not have the same dimensions? Why does "b" swap the axis order, but not "a"?

I have been able to design my code around these inconsistencies thanks to lots of unit tests, but understanding what is going on would be appreciated.

For reference, I am using Python 2.7.3, and Numpy 1.6.2 via MacPorts.

Recommended answer

Syntactically, this looks like an inconsistency, but semantically, you're doing two very different things here. In your definitions of "a" and "b", you're doing advanced indexing, sometimes called fancy indexing, which returns a copy of the data. In your definition of "c", you're doing basic slicing, which returns a view of the data.
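
The copy-versus-view distinction can be checked directly. The following is a minimal sketch, not part of the original answer; it assumes Python 3 and uses np.may_share_memory, a conservative bounds-based check, to see whether each result aliases the original array.

```python
import numpy as np

start = np.zeros((7, 5, 3))

# Advanced indexing (an index array is present): the result is a copy.
b = start[0, :, np.arange(2)]
# Basic slicing (only integers and slices): the result is a view.
c = start[0, :, :2]

b_shares = np.may_share_memory(start, b)   # copy: separate memory
c_shares = np.may_share_memory(start, c)   # view: points into start

# Writing through the view changes start; writing to the copy does not.
c[0, 0] = 1.0
b[0, 0] = 2.0

print(b_shares, c_shares, start[0, 0, 0])  # False True 1.0
```

Only the write through "c" reaches start, which is why start[0, 0, 0] ends up as 1.0 rather than 2.0.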

To tell the difference, it helps to understand how indices are passed to Python objects. Here are some examples:

>>> class ShowIndex(object):
...     def __getitem__(self, index):
...         print index
... 
>>> ShowIndex()[:,:]
(slice(None, None, None), slice(None, None, None))
>>> ShowIndex()[...,:]
(Ellipsis, slice(None, None, None))
>>> ShowIndex()[0:5:2,::-1]
(slice(0, 5, 2), slice(None, None, -1))
>>> ShowIndex()[0:5:2,np.arange(3)]
(slice(0, 5, 2), array([0, 1, 2]))
>>> ShowIndex()[0:5:2]
slice(0, 5, 2)
>>> ShowIndex()[5, 5]
(5, 5)
>>> ShowIndex()[5]
5
>>> ShowIndex()[np.arange(3)]
[0 1 2]

As you can see, there are many different possible configurations. First, individual items may be passed, or tuples of items may be passed. Second, the tuples may contain slice objects, Ellipsis objects, plain integers, or numpy arrays.

Basic slicing is activated when you pass only objects like int, slice, or Ellipsis objects, or None (which is the same as numpy.newaxis). These can be passed singly or in a tuple. Here's what the docs have to say about how basic slicing is activated:

Basic slicing occurs when obj is a slice object (constructed by start:stop:step notation inside of brackets), an integer, or a tuple of slice objects and integers. Ellipsis and newaxis objects can be interspersed with these as well. In order to remain backward compatible with a common usage in Numeric, basic slicing is also initiated if the selection object is any sequence (such as a list) containing slice objects, the Ellipsis object, or the newaxis object, but no integer arrays or other embedded sequences.

Advanced indexing is activated when you pass a numpy array, a non-tuple sequence containing only integers or containing subsequences of any kind, or a tuple containing an array or subsequence.
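
As a rough illustration of these activation rules, the sketch below (an addition, assuming Python 3 and a current NumPy) indexes one array in several ways and records whether each result may alias the original. The is_view helper is hypothetical and exists only for this example.

```python
import numpy as np

x = np.arange(24).reshape(4, 3, 2)

# Hypothetical helper for this sketch: does the indexed result alias x's memory?
def is_view(result):
    return np.may_share_memory(x, result)

# Only ints, slices, Ellipsis, and None: basic slicing.
basic = {
    "slice": x[1:3],
    "int and slices": x[0, :, ::-1],
    "Ellipsis": x[..., 0],
    "None / newaxis": x[None, 1:],
}

# An array or a list of integers anywhere in the index: advanced indexing.
advanced = {
    "integer array": x[np.array([0, 2])],
    "list of ints": x[[0, 2]],
    "tuple with an array": x[:, np.array([0, 1])],
}

basic_views = [is_view(r) for r in basic.values()]
advanced_views = [is_view(r) for r in advanced.values()]

print(basic_views)      # [True, True, True, True]
print(advanced_views)   # [False, False, False]
```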

For details on how advanced indexing and basic slicing differ, see the NumPy indexing documentation. But in this particular case, it's clear to me what's happening. It has to do with the following behavior when using partial indexing:

The rule for partial indexing is that the shape of the result (or the interpreted shape of the object to be used in setting) is the shape of x with the indexed subspace replaced with the broadcasted indexing subspace. If the index subspaces are right next to each other, then the broadcasted indexing space directly replaces all of the indexed subspaces in x. If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.
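
The three cases in that rule can be seen by comparing result shapes. This is a small sketch added for illustration (Python 3 assumed); the array x and the index array idx are example names, not taken from the question.

```python
import numpy as np

x = np.zeros((7, 5, 3))
idx = np.arange(2)

# One advanced index: its shape (2,) replaces axis 2 in place.
one_index = x[:, :, idx].shape     # (7, 5, 2)

# Two advanced indices next to each other (axes 1 and 2):
# their broadcast shape (2,) replaces both axes in place.
adjacent = x[:, idx, idx].shape    # (7, 2)

# Two advanced indices separated by a slice (axes 0 and 2):
# the broadcast shape (2,) goes first, then the sliced axis.
separated = x[idx, :, idx].shape   # (2, 5)

print(one_index, adjacent, separated)
```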

In your definition of "a", which uses advanced indexing, you effectively pass the sequence [0, 1] in as the third item of the tuple, and since no broadcasting happens (because there is no other sequence), the index array's shape simply replaces the third axis, as expected.

In your definition of "b", which also uses advanced indexing, you effectively pass two sequences: [0], the first item (which is converted into an intp array), and [0, 1], the third item. These two items are broadcast together, and the result has the same shape as the third item. However, since broadcasting has happened, we're faced with a problem: where in the new shape tuple do we insert the broadcast shape? As the docs say,

there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning.

So the 2 that results from broadcasting is moved to the beginning of the shape tuple, producing an apparent transposition.
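
To make this concrete, the sketch below (an addition to the answer, assuming Python 3) reproduces "b" and "c" from the question, checks that they hold the same values with the axes in the other order, and shows one possible way to keep the (5, 2) layout by indexing in two steps. The two-step form is a suggestion of this sketch, not something the original answer proposes.

```python
import numpy as np

start = np.zeros((7, 5, 3))
start[0] = np.arange(15).reshape(5, 3)   # distinct values so the comparison is meaningful
idx = np.arange(2)

b = start[0, :, idx]   # advanced indices on axes 0 and 2, separated by a slice
c = start[0, :, :2]    # basic slicing

# The integer 0 and idx broadcast to shape (2,); that is the 2 placed first in b.shape.
broadcast_shape = np.broadcast(0, idx).shape

# b holds the same values as c, with the axes in the other order.
same_values = np.array_equal(b.T, c)

# Indexing in two steps leaves a single advanced index in the second step,
# so its axis stays in place.
two_step = start[0][:, idx]

print(b.shape, c.shape, broadcast_shape, two_step.shape, same_values)
# (2, 5) (5, 2) (2,) (5, 2) True
```

If a copy is acceptable, b.T gives the same layout as "c"; if a view is needed, the plain slice start[0, :, :2] remains the simplest option.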
