NumPy indexing: broadcasting with Boolean arrays

Question

Related to this question, I came across an indexing behaviour via Boolean arrays and broadcasting I do not understand. We know it's possible to index a NumPy array in 2 dimensions using integer indices and broadcasting. This is specified in the docs:

import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

c1 = np.where(b1)[0]  # i.e. [1, 2]
c2 = np.where(b2)[0]  # i.e. [0, 2]

a[c1[:, np.newaxis], c2]  # or a[c1[:, None], c2]

array([[ 4,  6],
       [ 8, 10]])
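The integer version produces a 2x2 result because the index arrays themselves are broadcast: `c1[:, np.newaxis]` has shape (2, 1) and `c2` has shape (2,), so together they broadcast to (2, 2), and each output element pairs one row index with one column index. A minimal sketch of that reasoning, assuming NumPy 1.20 or later for `np.broadcast_shapes` (the names `rows`, `cols` and `manual` are illustrative only):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # same values as `a` in the question
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Integer positions of the True entries
c1 = np.nonzero(b1)[0]  # [1, 2]
c2 = np.nonzero(b2)[0]  # [0, 2]

rows = c1[:, np.newaxis]  # shape (2, 1)
cols = c2                 # shape (2,)

# The broadcast shape of the index arrays is the shape of the result
print(np.broadcast_shapes(rows.shape, cols.shape))  # (2, 2)

result = a[rows, cols]

# Element [i, j] of the result is a[c1[i], c2[j]]
manual = np.array([[a[r, c] for c in c2] for r in c1])
print(np.array_equal(result, manual))  # True
print(result)
```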

However, the same does not work for Boolean arrays.

a[b1[:, None], b2]

IndexError: too many indices for array

The alternative numpy.ix_ works for both integer and Boolean arrays. This seems to be because ix_ performs specific manipulation for Boolean arrays to ensure consistent treatment.

assert np.array_equal(a[np.ix_(b1, b2)], a[np.ix_(c1, c2)])
a[np.ix_(b1, b2)]

array([[ 4,  6],
       [ 8, 10]])
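One way to see the Boolean-specific handling in `np.ix_` is to inspect what it returns: each Boolean argument is turned into the positions of its True entries, and each resulting 1-D index is reshaped so the indices broadcast into an open mesh (one axis per argument). A small check, using only the arrays from the question:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

rows, cols = np.ix_(b1, b2)
# rows: positions of True in b1, laid out as a column
# cols: positions of True in b2, laid out as a row
print(rows.ravel(), rows.shape)  # [1 2] (2, 1)
print(cols.ravel(), cols.shape)  # [0 2] (1, 2)

# Identical to the mesh built from the integer positions
c_rows, c_cols = np.ix_(np.nonzero(b1)[0], np.nonzero(b2)[0])
print(np.array_equal(rows, c_rows) and np.array_equal(cols, c_cols))  # True
print(a[rows, cols])
```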

So my question is: why does broadcasting work with integers, but not with Boolean arrays? Is this behaviour documented? Or am I misunderstanding a more fundamental issue?

Answer

As @Divakar noted in comments, Boolean advanced indices behave as if they were first fed through np.nonzero and then broadcast together, see the relevant documentation for extensive explanations. To quote the docs,


In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
[...]
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
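The quoted equivalence can be checked directly. A minimal sketch: the names `ind_1`, `boolean_array` and `ind_2` mirror the docs, while the 3-D array and the index values are made up for illustration:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
ind_1 = np.array([0, 1])                       # integer index for axis 0
boolean_array = np.array([True, False, True])  # Boolean index for axis 1
ind_2 = np.array([3, 0])                       # integer index for axis 2

# Boolean index used directly ...
lhs = x[ind_1, boolean_array, ind_2]
# ... versus its nonzero() positions spliced into the same slot
rhs = x[(ind_1,) + boolean_array.nonzero() + (ind_2,)]

print(lhs)                       # [ 3 20]
print(np.array_equal(lhs, rhs))  # True
```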

In your case broadcasting would not necessarily be a problem, since both arrays have only two nonzero elements. The problem is the number of dimensions in the result:

>>> len(b1[:,None].nonzero())
2
>>> len(b2.nonzero())
1

Consequently the indexing expression a[b1[:,None], b2] would be equivalent to a[b1[:,None].nonzero() + b2.nonzero()], which would put a length-3 tuple inside a, corresponding to a 3d array index. Hence the error you see about "too many indices".
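The counting argument can be reproduced in a few lines: expanding both Boolean indices with `nonzero()` yields three index arrays for a 2-D array. The exact `IndexError` message differs between NumPy versions, so it is printed rather than asserted here:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# b1[:, None] is 2-D, so nonzero() gives two index arrays; b2 gives one
idx = b1[:, None].nonzero() + b2.nonzero()
print(len(idx))  # 3, but a has only 2 dimensions

raised = False
try:
    a[b1[:, None], b2]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```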

The surprises mentioned in the docs are very close to your example: what if you hadn't injected that singleton dimension? Starting from a length-3 and a length-4 Boolean array, you would've ended up with a length-2 advanced index, i.e. a 1d array of shape (2,). This is never what you'd want, which leads us to another piece of trivia on the subject.
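A short demonstration of that surprise: without the extra axis, the two masks become the integer arrays `[1, 2]` and `[0, 2]`, which are paired element-wise instead of crossed. The mask `b3` below is a hypothetical addition with three True entries, showing that a different True count makes the broadcast fail outright:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# b1 -> [1, 2], b2 -> [0, 2]; picks a[1, 0] and a[2, 2] only
flat = a[b1, b2]
print(flat, flat.shape)  # [ 4 10] (2,)

# Hypothetical mask with three True entries: shapes (2,) and (3,) clash
b3 = np.array([True, True, True, False])
mismatch_raised = False
try:
    a[b1, b3]
except IndexError as exc:
    mismatch_raised = True
    print("IndexError:", exc)
```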

There has been a lot of discussion about plans to revamp advanced indexing; see the work-in-progress draft NEP 21. The gist of the issue is that fancy indexing in NumPy, while clearly documented, has some very quirky features that aren't practically useful for anything, but that can bite you when a mistake produces surprising results rather than an error.

The relevant quote from the NEP:


Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice. When a boolean array index is mixed with another boolean or integer array, boolean array is converted to integer array indices (equivalent to np.nonzero()) and then broadcast. For example, indexing a 2D array of size (2, 2) like x[[True, False], [True, False]] produces a 1D vector with shape (1,), not a 2D sub-matrix with shape (1, 1).
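The NEP's (2, 2) example can be reproduced as follows; `np.ix_` is included for contrast, since it gives the (1, 1) sub-matrix one would usually expect. The array values are arbitrary:

```python
import numpy as np

x = np.arange(4).reshape(2, 2)

# Each Boolean list is converted via nonzero() to [0], then broadcast
mixed = x[[True, False], [True, False]]
# Explicit outer indexing keeps one axis per mask
outer = x[np.ix_([True, False], [True, False])]

print(mixed.shape)  # (1,)
print(outer.shape)  # (1, 1)
```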

Now, I emphasize that the NEP is very much work-in-progress, but one of the suggestions in the current state of the NEP is to forbid Boolean arrays in advanced indexing cases such as the above, and only allow them in "outer indexing" scenarios, i.e. exactly what np.ix_ would help you do with your Boolean array:


Boolean indexing is conceptionally outer indexing. Broadcasting together with other advanced indices in the manner of legacy indexing [i.e. the current behaviour] is generally not helpful or well defined. A user who wishes the "nonzero" plus broadcast behaviour can thus be expected to do this manually.
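Both intents can already be spelled out explicitly today, which is roughly what the NEP would require. A sketch using only the arrays from the question; the chained form `a[b1][:, b2]` is one alternative for the outer case and creates an intermediate copy:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Outer ("sub-matrix") selection
outer_ix = a[np.ix_(b1, b2)]
outer_chained = a[b1][:, b2]  # rows first, then columns

# "nonzero plus broadcast" done by hand
rows = np.nonzero(b1)[0]
cols = np.nonzero(b2)[0]
paired = a[rows, cols]

print(outer_ix)
print(np.array_equal(outer_ix, outer_chained))  # True
print(paired)  # [ 4 10]
```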

My point is that the behaviour of Boolean advanced indices and their deprecation status (or lack thereof) may change in the not-so-distant future.
