加快列表之间的浮点数比较 [英] Speed up comparison of floats between lists

查看:84
本文介绍了加快列表之间的浮点数比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一段代码可以执行以下操作:

I have a block of code which does the following:

  • 从下面的索引为indx的列表b_lst中获取浮点数
  • 检查此浮点是否位于索引i的浮点与列表a_lst中的下一个索引(c3>的浮点)之间
  • 如果是,则将indx存储在第三个列表(c_lst)的子列表中,该子列表的索引是a_lst中左浮点的索引(即:)
  • 对于b_lst
  • 中的所有浮动重复
  • take a float from a list, b_lst below, of index indx
  • check if this float is located between a float of index i and the next one (of index i+1) in list a_lst
  • if it is, then store indx in a sub-list of a third list (c_lst) where the index of that sub-list is the index of the left float in a_lst (ie: i)
  • repeat for all floats in b_lst

这是一个MWE,它显示代码的作用:

Here's a MWE which shows what the code does:

import numpy as np
import timeit

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N).tolist()

# Data lists.
# Note that a_lst is sorted.
a_lst = np.sort(random_data(1000))
b_lst = random_data(5000)
# Fixed index value (int)
c = 25

def func():
    # Create empty list with as many sub-lists as elements present
    # in a_lst beyond the 'c' index.
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]

    # For each element in b_lst.
    for indx,elem in enumerate(b_lst):

        # For elements in a_lst beyond the 'c' index.
        for i in range(len(a_lst[c:])-1):

            # Check if 'elem' is between this a_lst element
            # and the next.
            if a_lst[c+i] < elem <= a_lst[c+(i+1)]:

                # If it is then store the index of 'elem' ('indx')
                # in the 'i' sub-list of c_lst.
                c_lst[i].append(indx)

    return c_lst

print func()
# time function.
func_time = timeit.timeit(func, number=10)
print func_time

此代码应能正常工作,但由于它会减慢其余代码的速度,因此我确实需要提高其性能.

This code works as it should but I really need to improve its performance since it's slowing down the rest of my code.

添加

这是基于接受答案的优化功能.这很丑陋,但是可以完成工作.

This is the optimized function based on the accepted answer. It's quite ugly but it gets the job done.

def func_opt():
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
    for elem in c_opt:
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1] = np.where(c_opt==elem)[0].tolist()
    return c_lst

在我的测试中,这比原始功能快7倍.

In my tests this is ~7x faster than the original function.

添加2

不使用np.where的速度要快得多:

Much faster not using np.where:

def func_opt2():
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
    for indx,elem in enumerate(c_opt):
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1].append(indx)
    return c_lst

这比原始功能快130倍.

This is ~130x faster than the original function.

添加3

按照 jtaylor 的建议,我将np.searchsorted的结果转换为带有.tolist()的列表:

Following jtaylor's advice I converted the result of np.searchsorted to a list with .tolist():

def func_opt3():
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left').tolist()
    for indx,elem in enumerate(c_opt):
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1].append(indx)
    return c_lst

这比原始功能快约470倍.

This is ~470x faster than the original function.

推荐答案

您想看看numpy的

You want to take a look at numpy's searchsorted. Calling

np.searchsorted(a_lst, b_lst, side='right')

将返回一个索引数组,该索引的长度与b_lst相同,保持在a_lst中应插入它们以保持顺序的位置.这将非常快,因为它使用二进制搜索,并且循环发生在C中.然后,您可以创建带有花式索引的子数组,例如:

will return an array of indices, the same length as b_lst, holding before which item in a_lst they should be inserted to preserve order. It will be very fast, as it uses binary search and the looping happens in C. You could then create your subarrays with fancy indexing, e.g.:

>>> a = np.arange(1, 10)
>>> b = np.random.rand(100) * 10
>>> c = np.searchsorted(a, b, side='right')
>>> b[c == 0]
array([ 0.54620226,  0.40043875,  0.62398925,  0.40097674,  0.58765603,
        0.14045264,  0.16990249,  0.78264088,  0.51507254,  0.31808327,
        0.03895417,  0.92130027])
>>> b[c == 1]
array([ 1.34599709,  1.42645778,  1.13025996,  1.20096723,  1.75724448,
        1.87447058,  1.23422399,  1.37807553,  1.64118058,  1.53740299])

这篇关于加快列表之间的浮点数比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆