Find indices of numpy array based on values in another numpy array

Question

I want to find the indices in a larger array where its values match the values of a different, smaller array. Something like new_array below:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])
new_array = np.where(summed_rows == common_sums)

However, this returns:

__main__:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future. 
>>>new_array 
(array([], dtype=int64),)

The closest I've gotten is:

new_array = [np.array(np.where(summed_rows==important_sum)) for important_sum in common_sums]

This gives me a list with three numpy arrays (one for each 'important sum'), but each has a different length, which causes further downstream problems with concatenation and vstacking. To be clear, I do not want to use the line above. I want to use numpy to index into summed_rows. I've looked at various answers using numpy.where, numpy.argwhere, and numpy.intersect1d, but am having trouble putting the ideas together. I figured I'm missing something simple and it would be faster to ask.

Thanks in advance for any recommendations!

Recommended answer

Taking into account the options proposed in the comments, and adding extra ones that use numpy's in1d:

>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0]   # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1]   # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0]   # Option of @jdehesa

>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True
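
A small deterministic example may make the difference between these options easier to see. This is only an illustrative sketch: the six-element summed_rows below is made up for the demonstration and is not taken from the question, and the names mask, ind_any, ind_where and ind_isin are likewise assumptions.

```python
import numpy as np

# Tiny made-up input (an assumption, for illustration only)
summed_rows = np.array([7, 3, 10, 13, 5, 7])
common_sums = np.array([7, 10, 13])

# Broadcasting (3, 1) against (6,) gives a (3, 6) boolean matrix:
# one row per value in common_sums, one column per element of summed_rows.
mask = summed_rows == common_sums[:, None]

# Collapsing the rows with any() yields the matching indices in ascending order.
ind_any = mask.any(axis=0).nonzero()[0]

# np.where on the 2-D mask walks it row by row, so the column indices
# come out grouped by matched value rather than sorted.
ind_where = np.where(mask)[1]

# np.isin builds a 1-D membership mask directly.
ind_isin = np.where(np.isin(summed_rows, common_sums))[0]

print(ind_any)    # [0 2 3 5]
print(ind_where)  # [0 5 2 3]
print(ind_isin)   # [0 2 3 5]
```

The differing order of ind_where is why the equality checks above sort both sides before comparing.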

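If you time them, @Brenlla's option is the fastest (52.7 usec per loop). The two np.where variants built on in1d and isin are close behind (63 and 67.1 usec), while the np.arange indexing variant (103 usec) and @Ravi Sharma's 2-D np.where (191 usec) are noticeably slower: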

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
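
For anyone who prefers to rerun the comparison from a script rather than the command line, the following is a rough sketch using the standard timeit module. The candidates dictionary, its labels, and the results variable are assumptions introduced here; absolute numbers will depend on the machine and the NumPy version, so they may not match the figures above.

```python
import timeit

import numpy as np

np.random.seed(0)
a = np.random.randint(low=1, high=14, size=9999)
b = np.array([7, 10, 13])

# Hypothetical labels mapping to three of the expressions timed above
candidates = {
    "any+nonzero": lambda: (a == b[:, None]).any(0).nonzero()[0],
    "where on 2-D mask": lambda: np.where(a == b[:, None])[1],
    "where+isin": lambda: np.where(np.isin(a, b))[0],
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, 1000 calls each, converted to microseconds per call
    best = min(timeit.repeat(func, number=1000, repeat=3))
    results[name] = best / 1000 * 1e6
    print(f"{name}: {results[name]:.1f} usec per loop")
```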
