加快列表之间的浮点数比较 [英] Speed up comparison of floats between lists
问题描述
我有一段代码可以执行以下操作:
I have a block of code which does the following:
- 从下面的索引为
indx
的列表b_lst
中获取浮点数 - 检查此浮点是否位于索引
i
的浮点与列表a_lst
中的下一个索引(c3>的浮点)之间 - 如果是,则将
indx
存储在第三个列表(c_lst
)的子列表中,该子列表的索引是a_lst
中左浮点的索引(即:) - 对于
b_lst
中的所有浮动重复
- take a float from a list,
b_lst
below, of indexindx
- check if this float is located between a float of index
i
and the next one (of indexi+1
) in lista_lst
- if it is, then store
indx
in a sub-list of a third list (c_lst
) where the index of that sub-list is the index of the left float ina_lst
(ie:i
) - repeat for all floats in
b_lst
这是一个MWE
,它显示代码的作用:
Here's a MWE
which shows what the code does:
import numpy as np
import timeit
def random_data(N):
# Generate some random data.
return np.random.uniform(0., 10., N).tolist()
# Data lists.
# Note that a_lst is sorted.
a_lst = np.sort(random_data(1000))
b_lst = random_data(5000)
# Fixed index value (int)
c = 25
def func():
# Create empty list with as many sub-lists as elements present
# in a_lst beyond the 'c' index.
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
# For each element in b_lst.
for indx,elem in enumerate(b_lst):
# For elements in a_lst beyond the 'c' index.
for i in range(len(a_lst[c:])-1):
# Check if 'elem' is between this a_lst element
# and the next.
if a_lst[c+i] < elem <= a_lst[c+(i+1)]:
# If it is then store the index of 'elem' ('indx')
# in the 'i' sub-list of c_lst.
c_lst[i].append(indx)
return c_lst
print func()
# time function.
func_time = timeit.timeit(func, number=10)
print func_time
此代码应能正常工作,但由于它会减慢其余代码的速度,因此我确实需要提高其性能.
This code works as it should but I really need to improve its performance since it's slowing down the rest of my code.
添加
这是基于接受答案的优化功能.这很丑陋,但是可以完成工作.
This is the optimized function based on the accepted answer. It's quite ugly but it gets the job done.
def func_opt():
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
for elem in c_opt:
if 0<elem<len(a_lst[c:]):
c_lst[elem-1] = np.where(c_opt==elem)[0].tolist()
return c_lst
在我的测试中,这比原始功能快7倍.
In my tests this is ~7x faster than the original function.
添加2
不使用np.where
的速度要快得多:
Much faster not using np.where
:
def func_opt2():
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
for indx,elem in enumerate(c_opt):
if 0<elem<len(a_lst[c:]):
c_lst[elem-1].append(indx)
return c_lst
这比原始功能快130倍.
This is ~130x faster than the original function.
添加3
按照 jtaylor 的建议,我将np.searchsorted
的结果转换为带有.tolist()
的列表:>
Following jtaylor's advice I converted the result of np.searchsorted
to a list with .tolist()
:
def func_opt3():
c_lst = [[] for _ in range(len(a_lst[c:])-1)]
c_opt = np.searchsorted(a_lst[c:], b_lst, side='left').tolist()
for indx,elem in enumerate(c_opt):
if 0<elem<len(a_lst[c:]):
c_lst[elem-1].append(indx)
return c_lst
这比原始功能快约470倍.
This is ~470x faster than the original function.
推荐答案
You want to take a look at numpy's searchsorted
. Calling
np.searchsorted(a_lst, b_lst, side='right')
将返回一个索引数组,该索引的长度与b_lst
相同,保持在a_lst
中应插入它们以保持顺序的位置.这将非常快,因为它使用二进制搜索,并且循环发生在C中.然后,您可以创建带有花式索引的子数组,例如:
will return an array of indices, the same length as b_lst
, holding before which item in a_lst
they should be inserted to preserve order. It will be very fast, as it uses binary search and the looping happens in C. You could then create your subarrays with fancy indexing, e.g.:
>>> a = np.arange(1, 10)
>>> b = np.random.rand(100) * 10
>>> c = np.searchsorted(a, b, side='right')
>>> b[c == 0]
array([ 0.54620226, 0.40043875, 0.62398925, 0.40097674, 0.58765603,
0.14045264, 0.16990249, 0.78264088, 0.51507254, 0.31808327,
0.03895417, 0.92130027])
>>> b[c == 1]
array([ 1.34599709, 1.42645778, 1.13025996, 1.20096723, 1.75724448,
1.87447058, 1.23422399, 1.37807553, 1.64118058, 1.53740299])
这篇关于加快列表之间的浮点数比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!