Fastest way to create strictly increasing lists in Python


Question

I would like to find the most efficient way to achieve the following in Python:

Suppose we have two lists, a and b, of equal length, each containing up to 1e7 elements. For ease of illustration, consider the following:

a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8,10,11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15]

The goal is to create a strictly increasing list a_new from a by keeping only the elements that exceed every element before them; when a value occurs more than once, only its first occurrence is kept. The indices deleted from a should also be deleted from b, so that the final result is:

a_new = [2, 3, 4, 5, 6, 7, 8, 9,10,11]
b_new = [1, 4, 5, 6, 8,10,11,12,14,15]

Of course, this can be done with an explicit for loop, but that is computationally expensive and unsuitable for this amount of data.
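
For reference, the loop-based baseline mentioned above can be sketched as follows. This is a minimal illustration of the selection rule, not a recommended solution for 1e7 elements; the function name strictly_increasing and the name running_max are made up for this sketch.

```python
def strictly_increasing(a, b):
    """Keep only the samples where `a` exceeds every earlier value of `a`."""
    if not a:
        return [], []
    a_new, b_new = [a[0]], [b[0]]
    running_max = a[0]
    for x, y in zip(a[1:], b[1:]):
        if x > running_max:
            # New maximum: keep this sample in both lists.
            running_max = x
            a_new.append(x)
            b_new.append(y)
    return a_new, b_new


a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
a_new, b_new = strictly_increasing(a, b)
print(a_new)
print(b_new)
```

The loop makes a single pass and tracks only the running maximum, so the cost lies in per-element interpreter overhead rather than in the algorithm itself.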

Any suggestions are much appreciated.

Recommended answer

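Use numba. Each function below returns the indices of the elements to keep: psi and foo use NumPy only, while f is a plain loop compiled with numba.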

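import numba
import numpy as np

def psi(A):
    # The running maximum is non-decreasing, so the first occurrence of each
    # distinct value marks a position where a new maximum is reached.
    a_cummax = np.maximum.accumulate(A)
    a_new, idx = np.unique(a_cummax, return_index=True)
    return idx

def foo(arr):
    # Flag index 0 and every position where the running maximum changes.
    aux = np.maximum.accumulate(arr)
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    return np.nonzero(flag)[0]

@numba.jit
def f(A):
    # Single pass: record the index of every element that exceeds
    # the largest value seen so far.
    m = A[0]
    a_new, idx = [m], [0]
    for i, a in enumerate(A[1:], 1):
        if a > m:
            m = a
            a_new.append(a)
            idx.append(i)
    return idx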
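
All three functions return indices rather than the filtered lists. A minimal sketch of applying those indices to both a and b, assuming the inputs are converted to non-empty NumPy arrays; first_increase_indices is a hypothetical name that repeats the logic of foo so the example is self-contained:

```python
import numpy as np


def first_increase_indices(arr):
    # Same idea as foo(): flag index 0 and every position where the
    # running maximum changes. Assumes arr is non-empty.
    running_max = np.maximum.accumulate(arr)
    is_new_max = np.concatenate(([True], running_max[1:] != running_max[:-1]))
    return np.nonzero(is_new_max)[0]


a = np.array([2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11])
b = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

idx = first_increase_indices(a)
a_new = a[idx]  # integer-array indexing keeps only the selected samples
b_new = b[idx]
print(a_new.tolist())
print(b_new.tolist())
```

Indexing both arrays with the same idx guarantees that a_new and b_new stay aligned.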


Timing

%timeit f(a)
The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.83 µs per loop

%timeit foo(a)
The slowest run took 9.41 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.35 µs per loop

%timeit psi(a)
The slowest run took 9.66 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop
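
The timings above are in the microsecond range, which suggests they were taken on a small input, so the ranking may differ at the sizes mentioned in the question. A hedged sketch of how the two NumPy variants could be compared on a larger array; the array size, the random test data, and the use of the timeit module are assumptions of this sketch, and no results are claimed here:

```python
import timeit

import numpy as np


def psi(A):
    a_cummax = np.maximum.accumulate(A)
    _, idx = np.unique(a_cummax, return_index=True)
    return idx


def foo(arr):
    aux = np.maximum.accumulate(arr)
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    return np.nonzero(flag)[0]


rng = np.random.default_rng(0)
n = 1_000_000  # hypothetical size; raise towards 1e7 if memory allows
# Random walk with upward drift, so new maxima keep appearing.
a_large = np.cumsum(rng.integers(-2, 4, size=n))

# Both variants should select the same indices before being compared.
assert np.array_equal(psi(a_large), foo(a_large))

for fn in (psi, foo):
    seconds = min(timeit.repeat(lambda: fn(a_large), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms")
```

The numba version f can be added to the loop in the same way; calling it once beforehand keeps the one-time compilation out of the measurement.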
