Find the row indexes of several values in a numpy array


Question

I have an array X:

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

And I wish to find the row index of each of several values in this array:

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

For this example I would like a result like:

[0,3,4]

I have code that does this, but I think it is overly complicated:

import numpy as np

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

result = []

for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)

print(result)

I found this answer for a similar question, but it works only for 1D arrays.

Is there a simpler way to do what I want?

Recommended answer

Approach #1

One approach would be to use NumPy broadcasting, like so -

np.where((X==searched_values[:,None]).all(-1))[1]
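To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps, using the sample arrays from the question. The variable names elementwise, row_match, search_idx and x_idx are introduced here for illustration only.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# searched_values[:, None] has shape (3, 1, 2); comparing it against X,
# which has shape (5, 2), broadcasts to a boolean array of shape (3, 5, 2).
elementwise = X == searched_values[:, None]

# Reducing the last axis gives row_match[i, j] == True exactly when
# searched_values[i] equals X[j] in every column.
row_match = elementwise.all(-1)

# np.where returns one index array per axis: positions in searched_values
# first, positions in X second. The second one is the answer.
search_idx, x_idx = np.where(row_match)
print(x_idx)
```

The intermediate boolean array holds len(searched_values) * len(X) * X.shape[1] elements, which is why this approach can become memory hungry for large inputs.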

Approach #2

A memory-efficient approach would be to convert each row to its linear index equivalent and then use np.in1d (np.isin in newer NumPy versions, where np.in1d is deprecated), like so -

dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),
                    np.ravel_multi_index(searched_values.T,dims)))[0]
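The same idea can be sketched with np.isin as a self-contained script. One change here is an assumption of this sketch and not part of the answer: dims is computed from both arrays, so that a searched value larger than anything in X does not fall outside the grid.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# Assumption of this sketch: size the grid from both arrays so that
# every row of either array is a valid index into it.
dims = np.maximum(X.max(0), searched_values.max(0)) + 1

# One scalar per row; equal rows map to equal scalars.
X1D = np.ravel_multi_index(X.T, dims)
searched1D = np.ravel_multi_index(searched_values.T, dims)

# Positions in X whose scalar appears among the searched scalars.
out = np.flatnonzero(np.isin(X1D, searched1D))
print(out)
```

Note that the result is ordered by position in X, not by the order of the rows in searched_values.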

Approach #3

Another memory-efficient approach, using np.searchsorted and the same philosophy of converting to linear index equivalents, would be like so -

dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]

Please note that this np.searchsorted method assumes there is a match in X for each row from searched_values.
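If some searched rows might be missing from X, the searchsorted result can be checked before it is used. The following is a rough sketch of one way to do that, not part of the approach above: find_rows is a hypothetical helper name, it returns -1 for rows that are not found, and it assumes non-negative integers and a non-empty X.

```python
import numpy as np

def find_rows(X, searched_values):
    """Hypothetical helper: index in X of each searched row, or -1 if absent."""
    # Size the grid from both arrays so every row is a valid index.
    dims = np.maximum(X.max(0), searched_values.max(0)) + 1
    X1D = np.ravel_multi_index(X.T, dims)
    searched1D = np.ravel_multi_index(searched_values.T, dims)

    sidx = X1D.argsort()
    pos = np.searchsorted(X1D, searched1D, sorter=sidx)
    # searchsorted returns len(X1D) for values beyond the largest one,
    # so clip before using the positions as indices.
    pos = np.clip(pos, 0, len(X1D) - 1)

    candidates = sidx[pos]
    # A candidate is a real match only if its linear index is identical.
    found = X1D[candidates] == searched1D
    return np.where(found, candidates, -1)

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
queries = np.array([[4, 2], [7, 7], [5, 6], [10, 1]])
result = find_rows(X, queries)
print(result)
```

Here the rows [7, 7] and [10, 1] do not occur in X, so their entries come back as -1 while the other two report their row positions.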

How does np.ravel_multi_index work?

np.ravel_multi_index gives us the linear index equivalent numbers. It accepts a 2D array of n-dimensional indices, arranged as columns, along with the shape of the n-dimensional grid onto which those indices are to be mapped and for which the equivalent linear indices are to be computed.

Let's use the inputs we have for the problem at hand. Take the case of input X and note its first row. Since we are trying to convert each row of X into its linear index equivalent, and since np.ravel_multi_index treats each column as one indexing tuple, we need to transpose X before feeding it into the function. Since the number of elements per row in X is 2 in this case, the n-dimensional grid to be mapped onto would be 2D. With 3 elements per row in X, it would have been a 3D grid for mapping, and so on.

To see how this function would compute linear indices, consider the first row of X -

In [77]: X
Out[77]: 
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])

The shape of the n-dimensional grid would be dims -

In [78]: dims
Out[78]: array([10,  7])

Let's create the 2-dimensional grid to see how that mapping works and how linear indices get computed with np.ravel_multi_index -

In [79]: out = np.zeros(dims,dtype=int)

In [80]: out
Out[80]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

Let's set the first indexing tuple from X, i.e. the first row of X, into the grid -

In [81]: out[4,2] = 1

In [82]: out
Out[82]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

Now, to see the linear index equivalent of the element just set, let's flatten the grid and use np.where to detect that 1.

In [83]: np.where(out.ravel())[0]
Out[83]: array([30])

This could also be computed by hand if row-major ordering is taken into account: with 7 columns per row, the element at (4, 2) lands at 4*7 + 2 = 30.

Let's use np.ravel_multi_index and verify those linear indices -

In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])

Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row of X.
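As a cross-check, the same linear indices can be reproduced by hand from the row-major rule. This is an illustrative sketch only; the names manual, strides and general are introduced here and are not part of the answer.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
dims = X.max(0) + 1  # array([10, 7])

# 2D case: in row-major order, each step along the first axis skips a
# whole row of dims[1] elements.
manual = X[:, 0] * dims[1] + X[:, 1]

# n-dimensional case: the weight of each axis is the product of the
# sizes of all later axes, with weight 1 for the last axis.
strides = np.append(np.cumprod(dims[::-1])[::-1][1:], 1)
general = X @ strides

print(manual)
print(np.ravel_multi_index(X.T, dims))
```

Both printed arrays should agree with the Out[84] result shown above.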

Choosing dimensions for np.ravel_multi_index to form unique linear indices

Now, the idea behind treating each row of X as an indexing tuple of an n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.

Let's take another look at X -

In [77]: X
Out[77]: 
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])

Now, as discussed in the previous section, we are treating each row as an indexing tuple. Within each such indexing tuple, the first element represents the first axis of the n-dim grid, the second element the second axis, and so on until the last element of each row in X. In essence, each column represents one dimension or axis of the grid. If we are to map all elements from X onto the same n-dim grid, we need to consider the maximum stretch of each axis of such a proposed n-dim grid. Assuming we are dealing with non-negative integers in X, such a stretch would be the maximum of each column in X, plus 1. The + 1 is because Python follows 0-based indexing. So, for example, X[1,0] == 9 would map to the 10th row of the proposed grid. Similarly, X[4,1] == 6 would go to the 7th column of that grid.

So, for our sample case, we had -

In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1

In [8]: dims
Out[8]: array([10,  7])

Thus, we would need a grid of at least shape (10,7) for our sample case. Extra length along any dimension won't hurt and would still give us unique linear indices.
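A quick way to convince yourself of this is to compare the minimal grid with a padded one and with one that is too small. In this sketch, the padding of 5 and the undersized shape (9, 7) are arbitrary values chosen for illustration.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])

min_dims = X.max(0) + 1   # (10, 7), the smallest grid that fits X
big_dims = min_dims + 5   # (15, 12), arbitrary extra room

lin_min = np.ravel_multi_index(X.T, min_dims)
lin_big = np.ravel_multi_index(X.T, big_dims)

# The numbers differ between the two grids, but in both cases there is
# still exactly one distinct scalar per distinct row.
print(lin_min, np.unique(lin_min).size)
print(lin_big, np.unique(lin_big).size)

# A grid that is too small is rejected: row index 9 does not fit into
# an axis of length 9.
too_small_rejected = False
try:
    np.ravel_multi_index(X.T, (9, 7))
except ValueError:
    too_small_rejected = True
print(too_small_rejected)
```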

Concluding remarks: One important thing to note here is that if we have negative numbers in X, we need to add proper offsets along each column in X to make those indexing tuples non-negative before using np.ravel_multi_index.
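The offsetting step could look something like the sketch below. The arrays with negative entries are made-up sample data, and taking the offset from the column-wise minimum over both arrays is one possible choice, not something prescribed above.

```python
import numpy as np

# Hypothetical sample data containing negative entries.
X = np.array([[-4, 2], [9, -3], [8, 5], [3, 3], [-5, 6]])
searched_values = np.array([[-4, 2], [3, 3], [-5, 6]])

# Shift every column so that its smallest value, taken across both
# arrays, becomes 0. The same shift must be applied to both arrays.
offset = np.minimum(X.min(0), searched_values.min(0))
X_shifted = X - offset
searched_shifted = searched_values - offset

# From here on, proceed as with non-negative data.
dims = np.maximum(X_shifted.max(0), searched_shifted.max(0)) + 1
X1D = np.ravel_multi_index(X_shifted.T, dims)
searched1D = np.ravel_multi_index(searched_shifted.T, dims)

out = np.flatnonzero(np.isin(X1D, searched1D))
print(out)
```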
