Subsetting a 2D numpy array


Question

I have looked into the documentation and also other questions here, but it seems I have not got the hang of subsetting numpy arrays yet.

I have a numpy array, and for the sake of argument, let it be defined as follows:

import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
#        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
#        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
#        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
#        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
#        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
#        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Now I want to choose the rows and columns of a specified by vectors n1 and n2. As an example:

n1 = range(5)
n2 = range(5)

But when I use:

b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])

Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:

b = a[n1,:]
b = b[:,n2]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [20, 21, 22, 23, 24],
#        [30, 31, 32, 33, 34],
#        [40, 41, 42, 43, 44]])

But I am sure there should be a way to do this simple task in just one command.

Recommended answer

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.

There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the result can be "regularly strided", and therefore whether or not a copy needs to be made. A slice always describes a regularly strided region, so numpy can return a "view" without copying. Arbitrary sequences cannot guarantee that, so they have to be treated differently and always produce a copy.
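The view/copy distinction can be checked directly. The following is a minimal sketch, using np.shares_memory to test whether a result still points at the original buffer; the variable names view and copy are only illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
n1, n2 = np.arange(5), np.arange(5)

view = a[:5, :5]          # basic slicing: regularly strided, so a view
copy = a[n1, :][:, n2]    # fancy indexing: the data is copied

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

# Writing through the view changes a; writing to the copy does not.
view[0, 0] = -1
copy[0, 1] = -2
print(a[0, 0], a[0, 1])  # -1 1
```

This is why the slice form is preferable whenever the rows and columns you want happen to form a regular range.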

In your case:

import numpy as np

a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)

# Not what you want
b = a[n1, n2]  # array([ 0, 11, 22, 33, 44])

# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]

# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]


Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.

print("Fancy indexing:")
print(a[n1, n2])

# Equivalent element-by-element lookup (Python 3 print syntax)
print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])


However, if the sequences you're indexing with are themselves 2D (a row vector and a column vector, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other, so every index in one is paired with every index in the other.

In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.

In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])


To be a bit more precise,

a[[[1, 2, 3]], [[1],[2],[3]]]

is treated exactly like:

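# The (1, 3) row index repeats down the rows ...
i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
# ... and the (3, 1) column index repeats across the columns.
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]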

In other words, whether the input is a row or a column vector is shorthand for how the indices should repeat in the indexing.
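The repetition can be made explicit with np.broadcast_arrays, which expands a row vector and a column vector to their common shape. This is only a sketch to illustrate the rule; the names rows, cols, expanded and compact are illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

rows = np.array([[1, 2, 3]])       # shape (1, 3)
cols = np.array([[1], [2], [3]])   # shape (3, 1)

# Expand both index arrays to the common (3, 3) shape.
i, j = np.broadcast_arrays(rows, cols)
print(i.shape, j.shape)  # (3, 3) (3, 3)

# Indexing with the compact or the expanded form gives the same result.
compact = a[rows, cols]
expanded = a[i, j]
print(np.array_equal(compact, expanded))  # True
```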

np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:

In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
       [2],
       [3]]), array([[1, 2, 3]]))
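Applied to the original question, np.ix_ gives the one-command form that was asked for. A minimal sketch, assuming n1 and n2 are the row and column selections from the question (written here as arrays):

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
n1, n2 = np.arange(5), np.arange(5)

# One step: every row in n1 crossed with every column in n2.
block = a[np.ix_(n1, n2)]

# The two-step workaround from the question, for comparison.
two_step = a[n1, :][:, n2]

print(block.shape)                       # (5, 5)
print(np.array_equal(block, two_step))   # True
```

Unlike a[:5, :5], this also works when n1 and n2 are arbitrary, irregular selections, at the cost of making a copy.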

Similarly (passing sparse=True would make the result identical to that of ix_ above):

In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]]),
 array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])]
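To check the remark about the sparse argument, the sketch below compares np.meshgrid(..., indexing='ij', sparse=True) with np.ix_ for the same inputs; the variable names are illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# sparse=True keeps the index arrays as a column and a row vector
# instead of expanding them to full (3, 3) grids.
ii, jj = np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij', sparse=True)
ix_rows, ix_cols = np.ix_([1, 2, 3], [1, 2, 3])

print(ii.shape, jj.shape)  # (3, 1) (1, 3)
print(np.array_equal(ii, ix_rows), np.array_equal(jj, ix_cols))  # True True

# Both select the same 3x3 block.
print(np.array_equal(a[ii, jj], a[ix_rows, ix_cols]))  # True
```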
