numpy数组:有效地找到匹配的索引 [英] Numpy Array: Efficiently find matching indices

查看:193
本文介绍了numpy数组:有效地找到匹配的索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个列表,其中一个列表非常庞大(数以百万计的元素),另外几千个.我想做以下

I have two lists, one of which is massive (millions of elements), the other several thousand. I want to do the following

bigArray=[0,1,0,2,3,2,,.....]

smallArray=[0,1,2,3,4]

for i in len(smallArray):
  pts=np.where(bigArray==smallArray[i])
  #Do stuff with pts...

以上方法有效,但速度较慢.有什么方法可以更有效地做到这一点,而无需诉诸C语言?

The above works, but is slow. Is there any way to do this more efficiently without resorting to writing something in C?

推荐答案

在您的情况下,您可能会受益于对大型数组进行预排序.这是演示如何将时间从〜45秒减少到2秒(在我的笔记本电脑上)的示例(对于阵列5e6与1e3的一组特定长度).显然,如果阵列大小完全不同,则解决方案将不是最佳解决方案.例如.使用默认解决方案时,复杂度为O(bigN * smallN),但对于我建议的解决方案,复杂度为O((bigN + smallN)* log(bigN))

In your case you may benefit from presorting your big array. Here is the example demonstrating how you can reduce the time from ~ 45 seconds to 2 seconds (on my laptop)(for one particular set of lengths of the arrays 5e6 vs 1e3). Obviously the solution won't be optimal if the array sizes will be wastly different. E.g. with the default solution the complexity is O(bigN*smallN), but for my suggested solution it is O((bigN+smallN)*log(bigN))

import numpy as np, numpy.random as nprand, time, bisect

bigN = 5e6
smallN = 1000
maxn = 1e7
nprand.seed(1)  
bigArr = nprand.randint(0, maxn, size=bigN)
smallArr = nprand.randint(0, maxn, size=smallN)

# brute force 
t1 = time.time()
for i in range(len(smallArr)):
    inds = np.where(bigArr == smallArr[i])[0]
t2 = time.time()
print "Brute", t2-t1

# not brute force (like nested loop with index scan)
t1 = time.time()
sortedind = np.argsort(bigArr)
sortedbigArr = bigArr[sortedind]
for i in range(len(smallArr)):
    i1 = bisect.bisect_left(sortedbigArr, smallArr[i])
    i2 = bisect.bisect_right(sortedbigArr, smallArr[i])
    inds = sortedind[i1:i2]
t2=time.time()
print "Non-brute", t2-t1

输出:

Brute 42.5278530121

Brute 42.5278530121

非短纤维1.57193303108

Non-brute 1.57193303108

这篇关于numpy数组:有效地找到匹配的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆