查找大于第二个列表中元素的列表索引的有效解决方案 [英] Efficient solution to find list indices greater than elements in a second list
问题描述
此问题与此相关:第一个大于x的Python列表索引?
我有一个(排序的)浮点列表,我想找到一个超出第二个列表的每个值的第一个索引
I have a (sorted) list of floats, and I want to find the first index that exceeds each value of a second list
例如
l=[0.2,0.3,0.7,0.9]
m=[0.25,0.6]
如果m是一个浮点数,我会使用它:
if m were a float I would use this:
bisect.bisect_left(l,m)
但是对于m是列表的情况,这会失败,我只能考虑采用列表理解:
But for the case where m is a list this fails, and I can only think to employ a list comprehension:
[bisect.bisect_left(l,i) for i in m]
给出:
[1, 2]
可以工作,但是在我的实际示例中,我想通过避免列表理解来加快大型列表的速度,因为我的测试表明这是瓶颈"操作(我之前曾说过我怀疑它太慢了).有没有一种方法可以使用例如numpy还是一种改进的算法(因为只需要遍历列表中的一个)?
which works, but I want to speed it up for large lists in my real example by avoiding the list comprehension as my tests showed this was the "bottleneck" operation (I earlier stated I suspected it was too slow). Is there a way to efficiently do this using a vectorized function in e.g. numpy or an improved algorithm (as only one traverse of the list is required)?
推荐答案
So I found there is a numpy function to perform this task, np.searchsorted. which is much faster than the use of list comprehensions.
1.标准列表理解:
这是我第一次尝试解决方案
this was my first attempt at a solution
python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=[bisect.bisect_left(h,i) for i in n]"
200个循环,最好是5个循环:每个循环1.61毫秒
2.缩短搜索循环
这是@lenik提供的解决方案
This was the solution kindly provided by @lenik
python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=[bisect.bisect_left(h,n[0])]" "for i in n[1:]:" " r.append(bisect.bisect_left(h,i,r[-1]))"
200个循环,最好是5个循环:每个循环1.6毫秒
与列表理解几乎没有什么不同,我对此感到有些惊讶...
Hardly different from the list comprehension which I was somewhat surprised about...
3. Numpy searchsorted
python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=np.searchsorted(h,n)"
10000次循环,最好是5次:每个循环33.6微秒
传世最快.
这篇关于查找大于第二个列表中元素的列表索引的有效解决方案的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!