Vectorization: too many indices for array

Views: 32  Published: 2021/9/17 19:18:55  Tags: python numpy vectorization


Question

import numpy as np

a = b = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

and

b[np.where(mask[0])]  # -> array([0, 1, 2])
b[np.where(mask[1])]  # -> array([0, 1, 2, 3])
b[np.where(mask[2])]  # -> array([0, 1, 2, 3, 4])

Now I want to vectorize this and print all of them together, so I tried

b[np.where(mask[i])]
b[np.where(mask[i[:,None,None]])]

Both of them raise IndexError: too many indices for array
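A minimal reproduction of the two failing attempts, assuming the a, b, i and mask definitions from the question. It catches the IndexError and reports how many index arrays np.where returned in each case, which can be compared with b.ndim (2).

```python
import numpy as np

# Setup from the question
a = b = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3  # shape (3, 3, 3)

attempts = {
    "mask[i]": mask[i],                               # shape (3, 3, 3)
    "mask[i[:,None,None]]": mask[i[:, None, None]],   # shape (3, 1, 1, 3, 3)
}

errors = {}
for label, m in attempts.items():
    idx = np.where(m)  # one index array per dimension of m
    try:
        b[idx]
    except IndexError as exc:
        errors[label] = (len(idx), str(exc))
        print(f"{label}: where returned {len(idx)} arrays, b.ndim={b.ndim} -> {exc}")
```

Both masks have more than two dimensions, so np.where hands back more index arrays than the 2d b can accept.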
Solution
In [165]: a
Out[165]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [166]: mask
Out[166]: 
array([[[ True,  True,  True],
        [False, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True,  True, False],
        [False, False, False]]], dtype=bool)
So a (and b) is (3,3), while mask is (3,3,3).

A boolean mask applied to an array produces a 1d array (the same is true when the mask is applied via where):
In [170]: a[mask[1,:,:]]
Out[170]: array([0, 1, 2, 3])
np.where on a 2d mask plane produces a 2-element tuple, which can index the 2d array:
In [173]: np.where(mask[1,:,:])
Out[173]: (array([0, 0, 0, 1], dtype=int32), array([0, 1, 2, 0], dtype=int32))
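A quick check, using the same setup as the question, that indexing with a 2d boolean plane and indexing with the 2-tuple returned by np.where give the same 1d result for every plane:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

results = []
for k in range(mask.shape[0]):
    plane = mask[k]              # 2d boolean mask, shape (3, 3)
    rows_cols = np.where(plane)  # 2-tuple: (row indices, column indices)
    assert len(rows_cols) == a.ndim
    by_bool = a[plane]           # boolean indexing flattens to 1d
    by_where = a[rows_cols]      # integer-array indexing with the tuple
    assert np.array_equal(by_bool, by_where)
    results.append(by_bool)
    print(k, by_bool)
```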
np.where on the 3d mask produces a 3-element tuple, and a 2d array cannot be indexed with three index arrays, hence the too many indices error:
In [174]: np.where(mask)
Out[174]: 
(array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32),
 array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1], dtype=int32),
 array([0, 1, 2, 0, 1, 2, 0, 0, 1, 2, 0, 1], dtype=int32))
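The three arrays in that tuple are still useful: the first holds the mask plane of each True element, and the last two are row/column indices into a 2d array. As an alternative to expanding a to 3d (this is an editorial sketch, not part of the original answer), the tuple can be unpacked and only the last two arrays used to index b:

```python
import numpy as np

a = b = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

plane, rows, cols = np.where(mask)  # one array per axis of the 3d mask
values = b[rows, cols]              # two index arrays: valid for the 2d b
print(plane)   # which mask plane each value came from
print(values)  # all selected values, concatenated plane by plane
```

The plane array records where each group starts and ends, which is the same information the split-based approach recovers from the per-plane counts.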
Let's try expanding a to 3d and applying the mask:
In [176]: np.tile(a[None,:],(3,1,1)).shape
Out[176]: (3, 3, 3)
In [177]: np.tile(a[None,:],(3,1,1))[mask]
Out[177]: array([0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4])
The values are all there, but they are concatenated into one 1d array.
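np.tile makes a full copy of a for every plane. As a swapped-in alternative (not used in the original answer), np.broadcast_to builds a read-only (3, 3, 3) view of a without copying, and boolean indexing that view produces the same concatenated result:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

tiled = np.tile(a[None, :], (3, 1, 1))   # explicit copy, shape (3, 3, 3)
view = np.broadcast_to(a, mask.shape)     # read-only view, shape (3, 3, 3)

joined_tile = tiled[mask]
joined_view = view[mask]                  # boolean indexing returns a new array
print(joined_view)
```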

We can count the number of True values in each plane of mask, and use those counts to split the masked tile:
In [185]: mask.sum(axis=(1,2))
Out[185]: array([3, 4, 5])
In [186]: cnt=np.cumsum(mask.sum(axis=(1,2)))
In [187]: cnt
Out[187]: array([ 3,  7, 12], dtype=int32)

In [189]: np.split(np.tile(a[None,:],(3,1,1))[mask], cnt[:-1])
Out[189]: [array([0, 1, 2]), array([0, 1, 2, 3]), array([0, 1, 2, 3, 4])]
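The count-and-split recipe can be wrapped in a small helper. split_by_planes is a hypothetical name; the sketch assumes mask has one extra leading axis and that its remaining axes match a's shape, and it swaps np.tile for np.broadcast_to to avoid copying a.

```python
import numpy as np

def split_by_planes(a, mask):
    """Return the same groups as [a[m] for m in mask], via one boolean index and np.split.

    Hypothetical helper: assumes mask.shape == (n,) + a.shape.
    """
    if mask.shape[1:] != a.shape:
        raise ValueError("mask.shape[1:] must equal a.shape")
    joined = np.broadcast_to(a, mask.shape)[mask]         # selected values, plane by plane
    counts = mask.reshape(mask.shape[0], -1).sum(axis=1)  # number of True values per plane
    bounds = np.cumsum(counts)[:-1]                       # split points between planes
    return np.split(joined, bounds)

a = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3
parts = split_by_planes(a, mask)
print(parts)
```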
Internally, np.split uses Python-level iteration, so iterating over the mask planes directly might be just as good (it was 6x faster on this small example):
In [190]: [a[m] for m in mask]
Out[190]: [array([0, 1, 2]), array([0, 1, 2, 3]), array([0, 1, 2, 3, 4])]
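To check the speed comparison on your own machine, a timeit harness along these lines compares the two approaches. Absolute times and the ratio will vary with NumPy version, hardware and array size, so no figures are claimed here; the run count n is an arbitrary choice.

```python
import timeit
import numpy as np

a = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

def via_split():
    joined = np.tile(a[None, :], (3, 1, 1))[mask]
    cnt = np.cumsum(mask.sum(axis=(1, 2)))
    return np.split(joined, cnt[:-1])

def via_loop():
    return [a[m] for m in mask]   # one boolean index per mask plane

# Both approaches must agree before timing them means anything
assert all(np.array_equal(x, y) for x, y in zip(via_split(), via_loop()))

n = 2_000
t_split = timeit.timeit(via_split, number=n)
t_loop = timeit.timeit(via_loop, number=n)
print(f"split: {t_split:.4f}s  loop: {t_loop:.4f}s  for {n} runs")
```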




That points to a fundamental problem with the desired 'vectorization': the individual arrays have shapes (3,), (4,) and (5,). Arrays of differing sizes are a strong indicator that true 'vectorization' is difficult, if not impossible.
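If a fixed-shape result is acceptable, one way around the ragged shapes is to pad every row to the longest length with a fill value. This is a swapped-in technique, not from the original answer, and the fill value -1 is an arbitrary assumption that must not collide with real data. The sketch builds the padded (3, 5) array without a Python loop:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

fill = -1                                             # assumed sentinel, not a value that occurs in a
joined = np.broadcast_to(a, mask.shape)[mask]         # selected values, plane by plane
counts = mask.reshape(mask.shape[0], -1).sum(axis=1)  # row lengths: 3, 4, 5
width = counts.max()

padded = np.full((mask.shape[0], width), fill, dtype=a.dtype)
slots = np.arange(width) < counts[:, None]            # True where a row holds a value
padded[slots] = joined                                # row-major fill matches plane order
print(padded)
```

The trade-off is that every consumer of padded has to ignore the fill slots (or carry slots along as a validity mask).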
