Find the row indexes of several values in a numpy array


Problem description


I have an array X:

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

I want to find the row indexes of several values in this array:

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

For this example I would like a result like:

[0,3,4]

I have code that does this, but I think it is overly complicated:

import numpy as np

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

result = []

for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)

print(result)

I found this answer to a similar question, but it works only for 1D arrays.

Is there a way to do what I want in a simpler way?

Solution

Approach #1

One approach would be to use NumPy broadcasting, like so -

np.where((X==searched_values[:,None]).all(-1))[1]
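To make the broadcasting step easier to follow, here is a minimal sketch that splits the one-liner into named intermediates. The names eq and match are introduced only for illustration and are not part of the original answer.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# searched_values[:, None] has shape (3, 1, 2); comparing it with X (5, 2)
# broadcasts to a (3, 5, 2) boolean array.
eq = X == searched_values[:, None]

# Collapse the last axis: match[i, j] is True when searched_values[i] equals X[j].
match = eq.all(-1)

# np.where returns (indices into searched_values, indices into X); keep the latter.
out = np.where(match)[1]
print(out)  # [0 3 4]
```

The intermediate array holds len(searched_values) * len(X) * 2 booleans, so this approach is best suited to inputs of modest size.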

Approach #2

A memory-efficient approach is to convert each row to its linear index equivalent and then use np.in1d, like so -

dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T, dims),
                       np.ravel_multi_index(searched_values.T, dims)))[0]
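As a runnable sketch of the same idea, the version below uses np.isin in place of np.in1d, since np.in1d is deprecated in recent NumPy releases. This is a substitution, not the original answer's code. It assumes non-negative integer entries and that every row of searched_values fits inside the grid derived from X. The names X1D and searched1D are chosen here for readability.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# Grid shape large enough for every row of X (assumes non-negative integers).
dims = X.max(0) + 1

# One scalar per row; equal rows get equal scalars.
X1D = np.ravel_multi_index(X.T, dims)
searched1D = np.ravel_multi_index(searched_values.T, dims)

# Boolean mask over the rows of X, then the positions of the True entries.
out = np.where(np.isin(X1D, searched1D))[0]
print(out)  # [0 3 4]
```

Note that the result follows the row order of X, not the order of searched_values, because the mask is built over X.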

Approach #3

Another memory-efficient approach uses np.searchsorted, with the same idea of converting rows to linear index equivalents -

dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]

Please note that this np.searchsorted method assumes that every row of searched_values has a match in X.
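If that assumption might not hold, one possible workaround is to verify each candidate after the lookup. The sketch below is not part of the original answer. It uses a hypothetical searched_values with one missing row, widens dims to cover both arrays, and reports -1 for rows without a match. The -1 sentinel is an arbitrary choice.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
# The middle row [9, 6] does not occur in X (hypothetical test input).
searched_values = np.array([[4, 2], [9, 6], [5, 6]])

# Use a grid that covers both arrays so ravel_multi_index accepts every row.
dims = np.maximum(X.max(0), searched_values.max(0)) + 1
X1D = np.ravel_multi_index(X.T, dims)
searched1D = np.ravel_multi_index(searched_values.T, dims)

sidx = X1D.argsort()
pos = np.searchsorted(X1D, searched1D, sorter=sidx)

# searchsorted can return len(X1D) for values past the end; clip before indexing.
pos = np.clip(pos, 0, len(X1D) - 1)
candidates = sidx[pos]

# A candidate is a real match only if the linear indices agree.
found = X1D[candidates] == searched1D
out = np.where(found, candidates, -1)
print(out.tolist())  # [0, -1, 4]
```

Unlike the np.in1d variant, this keeps the result aligned with the order of searched_values.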


How does np.ravel_multi_index work?

This function returns linear (flat) index equivalents. It takes two arguments: an array of n-dimensional indices arranged as columns, one indexing tuple per column, and the shape of the n-dimensional grid onto which those indices are mapped. It returns the equivalent linear index for each tuple.

Let's use the inputs from the problem at hand. Take X and note its first row. We want to convert each row of X into its linear index equivalent, and np.ravel_multi_index treats each column as one indexing tuple, so we need to transpose X before passing it to the function. Since each row of X has 2 elements in this case, the n-dimensional grid to map onto is 2D. With 3 elements per row, it would be a 3D grid, and so on.

To see how this function would compute linear indices, consider the first row of X -

In [77]: X
Out[77]: 
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])

We have the shape of the n-dimensional grid as dims -

In [78]: dims
Out[78]: array([10,  7])

Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -

In [79]: out = np.zeros(dims,dtype=int)

In [80]: out
Out[80]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

Let's set the first indexing tuple from X, i.e. the first row from X into the grid -

In [81]: out[4,2] = 1

In [82]: out
Out[82]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.

In [83]: np.where(out.ravel())[0]
Out[83]: array([30])

This can also be computed by hand from the row-major ordering: the element at [4, 2] in a grid with 7 columns sits at linear position 4*7 + 2 = 30.

Let's use np.ravel_multi_index and verify those linear indices -

In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])

Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
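The row-major arithmetic behind these numbers can be checked directly. The following sketch reproduces the linear indices by hand for the 2D case and then inverts the mapping with np.unravel_index. The names manual, linear and recovered are introduced here for illustration.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
dims = X.max(0) + 1  # array([10, 7])

# Row-major (C order) linear index for a 2D grid: row * n_columns + column.
manual = X[:, 0] * dims[1] + X[:, 1]
print(manual)  # [30 66 61 24 41]

# Same values from NumPy.
linear = np.ravel_multi_index(X.T, dims)

# np.unravel_index inverts the mapping and recovers the original rows.
recovered = np.stack(np.unravel_index(linear, tuple(dims)), axis=1)
print(np.array_equal(recovered, X))  # True
```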

Choosing dimensions for np.ravel_multi_index to form unique linear indices

The idea behind treating each row of X as an indexing tuple of an n-dimensional grid and converting each such tuple to a scalar is to get a unique scalar for each unique tuple, i.e. for each unique row of X.

Let's take another look at X -

In [77]: X
Out[77]: 
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])

As discussed in the previous section, we treat each row as an indexing tuple. Within each tuple, the first element is the position along the first axis of the n-dim grid, the second element is the position along the second axis, and so on until the last element of the row. In essence, each column of X corresponds to one dimension or axis of the grid. To map all rows of X onto the same n-dim grid, we need the maximum extent of each axis of that grid. Assuming X contains only non-negative integers, that extent is the maximum of each column in X, plus 1. The + 1 is needed because indexing is 0-based. So, for example, X[1,0] == 9 maps to the 10th row of the proposed grid. Similarly, X[4,1] == 6 goes to the 7th column of that grid.

So, for our sample case, we had -

In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1

In [8]: dims
Out[8]: array([10,  7])

Thus, we need a grid of shape at least (10,7) for our sample case. Larger lengths along any dimension won't hurt and still give unique linear indices.
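A quick way to check this claim is to compare a tight grid with a roomier one and with one that is too small. In the sketch below the shapes (12, 20) and (9, 7) are arbitrary values picked for the demonstration.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])

tight = np.ravel_multi_index(X.T, (10, 7))
# A roomier grid (sizes chosen arbitrarily) changes the values but keeps them unique.
roomy = np.ravel_multi_index(X.T, (12, 20))
print(len(np.unique(tight)), len(np.unique(roomy)))  # 5 5

# A grid that is too small is rejected with the default mode='raise'.
try:
    np.ravel_multi_index(X.T, (9, 7))
    too_small_ok = True
except ValueError:
    too_small_ok = False
print(too_small_ok)  # False
```

The same error appears if searched_values contains an entry larger than the corresponding column maximum of X, so dims may need to cover both arrays.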

Concluding remarks: One important thing to note is that if X contains negative numbers, we need to add a suitable offset to each column of X so that every entry of the indexing tuples is non-negative before using np.ravel_multi_index.
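One way to apply such an offset is sketched below, under the assumption that the column minimum taken over both arrays is a suitable shift. The data is hypothetical, and np.isin is used for the membership test. The names both and offset are introduced here for illustration.

```python
import numpy as np

# Hypothetical data containing negative entries.
X = np.array([[-4, 2], [9, -3], [8, 5], [3, 3], [-5, 6]])
searched_values = np.array([[9, -3], [-5, 6]])

# Shift every column so its minimum over both arrays becomes 0.
both = np.vstack([X, searched_values])
offset = both.min(0)
dims = both.max(0) - offset + 1

X1D = np.ravel_multi_index((X - offset).T, dims)
searched1D = np.ravel_multi_index((searched_values - offset).T, dims)

out = np.where(np.isin(X1D, searched1D))[0]
print(out)  # [1 4]
```

The same offset must be applied to X and searched_values, otherwise equal rows would no longer map to equal linear indices.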
