Subsetting a 2D numpy array

Question

I have looked into documentations and also other questions here, but it seems I have not got the hang of subsetting in numpy arrays yet.

I have a numpy array, and for the sake of argument, let it be defined as follows:

import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
#        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
#        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
#        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
#        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
#        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
#        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Now I want to select the rows and columns of a specified by the vectors n1 and n2. For example:

n1 = range(5)
n2 = range(5)

But when I use:

b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])

then only the first five diagonal elements are selected, not the whole 5x5 block. The solution I have found is to do it like this:

b = a[n1,:]
b = b[:,n2]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [20, 21, 22, 23, 24],
#        [30, 31, 32, 33, 34],
#        [40, 41, 42, 43, 44]])

But I am sure there should be a way to do this simple task in just one command.

Answer

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.

There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason is whether the selected elements can be described by regular strides, and therefore whether a copy needs to be made. A slice always can be, so numpy can return a "view" of the original data without copying; an arbitrary sequence generally can't, so it has to be treated differently and fancy indexing always returns a copy.
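One way to see the view-versus-copy distinction directly is np.shares_memory, which reports whether two arrays overlap in memory. The sketch below assumes the same 10x10 array a as in the question; the row list [0, 3, 7] is an arbitrary illustrative choice.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Slicing ("normal" indexing): the selection is regularly strided,
# so numpy returns a view that shares memory with `a`.
view = a[:5, :5]

# Fancy indexing with a list: the rows need not be evenly spaced,
# so numpy returns a new array (a copy).
fancy = a[[0, 3, 7], :]

print(np.shares_memory(view, a))   # True
print(np.shares_memory(fancy, a))  # False

# Writing through the view changes `a`; writing to the copy does not.
view[0, 0] = -1
fancy[0, 1] = -2
print(a[0, 0], a[0, 1])  # -1 1
```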

In your case:

import numpy as np

a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)

# Not what you want
b = a[n1, n2]  # array([ 0, 11, 22, 33, 44])

# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]

# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]
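For arbitrary, non-contiguous selections (where a slice can't help), np.ix_ gives the one-command version the question asks for. A minimal sketch; the rows and cols lists are hypothetical values chosen only for illustration:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Arbitrary (non-contiguous) rows and columns: a slice can't express these.
rows = [0, 3, 7]
cols = [1, 5]

# The two-step approach from the question.
two_step = a[rows, :][:, cols]

# One command: np.ix_ turns the 1D lists into a column vector and a row vector.
one_step = a[np.ix_(rows, cols)]

# The same thing spelled out with explicit broadcasting.
explicit = a[np.array(rows)[:, None], np.array(cols)[None, :]]

print(one_step)
# [[ 1  5]
#  [31 35]
#  [71 75]]
print(np.array_equal(one_step, two_step), np.array_equal(one_step, explicit))  # True True
```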

---

Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.

# Python 3; a, n1 and n2 as defined in the example above
print("Fancy indexing:")
print(a[n1, n2])

print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])
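Zipping is the simplest case of a more general rule: numpy broadcasts all the index arrays to a common shape, pairs up their elements, and returns an array of that shape. Two equal-length 1D arrays broadcast to 1D, hence the zipped result. The sketch below (index values chosen only for illustration) also shows a case where the shapes differ:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

rows = np.array([[2], [4]])  # shape (2, 1): a column vector of row indices
cols = np.array([0, 1, 9])   # shape (3,):   a 1D array of column indices

# (2, 1) and (3,) broadcast to (2, 3), so the result is a 2x3 block.
block = a[rows, cols]
print(block.shape)  # (2, 3)
print(block)
# [[20 21 29]
#  [40 41 49]]

# Two 1D arrays of the same length broadcast to 1D: element-wise ("zipped") pairs.
pairs = a[np.array([2, 4]), np.array([0, 9])]
print(pairs)  # [20 49]
```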

---

However, if the sequences you're indexing with match the dimensionality of the array you're indexing (2D, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other, so together they pick out a whole grid of (row, column) pairs, much like a mask.

In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.

In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])
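Note that Out[4] is the transpose of the block a[1:4, 1:4]: the first index supplies the row numbers, and in that example it is the row vector. Putting the column vector first gives the block in its natural orientation, as this short sketch checks:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Column vector first (row indices), row vector second (column indices).
block = a[[[1], [2], [3]], [[1, 2, 3]]]
print(block)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]

# Same as the slice a[1:4, 1:4], and the transpose of Out[4] above.
print(np.array_equal(block, a[1:4, 1:4]))                        # True
print(np.array_equal(block.T, a[[[1, 2, 3]], [[1], [2], [3]]]))  # True
```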

---

To be a bit more precise,

a[[[1, 2, 3]], [[1],[2],[3]]]

is treated exactly like:

i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]

In other words, whether each index is a row or a column vector is shorthand for how it gets repeated (broadcast) to fill out the full 2D index arrays.
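np.broadcast_arrays performs that repetition explicitly, which makes the equivalence easy to confirm. A short sketch using the same indices as the a[[[1, 2, 3]], [[1],[2],[3]]] example:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

row_idx = [[1, 2, 3]]      # shape (1, 3): a row vector, used as the first (row) index
col_idx = [[1], [2], [3]]  # shape (3, 1): a column vector, used as the second (column) index

# Repeat both index arrays to their common (3, 3) shape.
i, j = np.broadcast_arrays(np.array(row_idx), np.array(col_idx))
print(i)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
print(j)
# [[1 1 1]
#  [2 2 2]
#  [3 3 3]]

# Indexing with the full arrays gives the same result as the shorthand.
print(np.array_equal(a[i, j], a[row_idx, col_idx]))  # True
```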

np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:

In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
       [2],
       [3]]), array([[1, 2, 3]]))

Similarly with np.meshgrid (passing sparse=True would give the same column and row vectors as ix_ above):

In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]]),
 array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])]
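To check the claim about sparse=True, the sketch below compares its output with np.ix_ and confirms that the dense and sparse forms select the same block. The rows and cols values simply reuse the [1, 2, 3] example above.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows, cols = [1, 2, 3], [1, 2, 3]

# sparse=True returns a (3, 1) column and a (1, 3) row, the same shapes as np.ix_.
r_sparse, c_sparse = np.meshgrid(rows, cols, indexing='ij', sparse=True)
r_ix, c_ix = np.ix_(rows, cols)
print(r_sparse.shape, c_sparse.shape)  # (3, 1) (1, 3)
print(np.array_equal(r_sparse, r_ix), np.array_equal(c_sparse, c_ix))  # True True

# The sparse form, the dense form, and np.ix_ all select the same block.
r_dense, c_dense = np.meshgrid(rows, cols, indexing='ij')
print(np.array_equal(a[r_sparse, c_sparse], a[np.ix_(rows, cols)]))  # True
print(np.array_equal(a[r_dense, c_dense], a[1:4, 1:4]))  # True
```

With the default indexing='xy', the roles of rows and columns are swapped, so indexing='ij' is the setting that lines up with np.ix_.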
