用平坦的子范围替换数字列表 [英] Replace a list of numbers with flat sub-ranges

查看:61
本文介绍了用平坦的子范围替换数字列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个数字列表,如下所示:

Given a list of numbers, like this:

lst = [0, 10, 15, 17]

我想要一个列表,其中包含i -> i + 3中所有ii -> i + 3元素.如果范围重叠,我希望将它们合并.

I'd like a list that has elements from i -> i + 3 for all i in lst. If there are overlapping ranges, I'd like them merged.

因此,对于上面的示例,我们首先得到:

So, for the example above, we first get:

[0, 1, 2, 3,     10, 11, 12, 13,     15, 16, 17, 18,   17, 18, 19, 20]

但是对于最后两个组,范围重叠,因此在合并它们时,您具有:

But for the last 2 groups, the ranges overlap, so upon merging them, you have:

[0, 1, 2, 3,     10, 11, 12, 13,     15, 16, 17, 18,     19, 20]

这是我想要的输出.

这就是我想到的:

from collections import OrderedDict

res = list(OrderedDict.fromkeys([y for x in lst for y in range(x, x + 4)]).keys())
print(res) = [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]

但是,这很慢(10000 loops, best of 3: 56 µs per loop).如果可能,我想要一个numpy解决方案,或者比这更快的python解决方案.

However, this is slow (10000 loops, best of 3: 56 µs per loop). I'd like a numpy solution if possible, or a python solution that's faster than this.

推荐答案

方法1:一种基于

Approach #1 : One approach based on broadcasted summation and then using np.unique to get unique numbers -

np.unique(np.asarray(lst)[:,None] + np.arange(4))

方法2::另一个基于广播的求和,然后进行掩蔽-

Approach #2 : Another based on broadcasted summation and then masking -

def mask_app(lst, interval_len = 4):
    arr = np.array(lst)
    r = np.arange(interval_len)
    ranged_vals = arr[:,None] + r
    a_diff = arr[1:] - arr[:-1]
    valid_mask = np.vstack((a_diff[:,None] > r, np.ones(interval_len,dtype=bool)))
    return ranged_vals[valid_mask]

运行时测试

原始方法-

from collections import OrderedDict
def org_app(lst):
    list(OrderedDict.fromkeys([y for x in lst for y in range(x, x + 4)]).keys())

时间-

In [409]: n = 10000

In [410]: lst = np.unique(np.random.randint(0,4*n,(n))).tolist()

In [411]: %timeit org_app(lst)
     ...: %timeit np.unique(np.asarray(lst)[:,None] + np.arange(4))
     ...: %timeit mask_app(lst, interval_len = 4)
     ...: 
10 loops, best of 3: 32.7 ms per loop
1000 loops, best of 3: 1.03 ms per loop
1000 loops, best of 3: 671 µs per loop

In [412]: n = 100000

In [413]: lst = np.unique(np.random.randint(0,4*n,(n))).tolist()

In [414]: %timeit org_app(lst)
     ...: %timeit np.unique(np.asarray(lst)[:,None] + np.arange(4))
     ...: %timeit mask_app(lst, interval_len = 4)
     ...: 
1 loop, best of 3: 350 ms per loop
100 loops, best of 3: 14.7 ms per loop
100 loops, best of 3: 9.73 ms per loop

转换为array的瓶颈似乎是两种发布方法的瓶颈,尽管此后看来回报很丰厚.只是为了了解最后一个数据集在转换上花费的时间-

The bottleneck with the two posted approaches seems like is with the conversion to array, though that seems to be paying off well afterwards. Just to give a sense of the time spent on the conversion for the last dataset -

In [415]: %timeit np.array(lst)
100 loops, best of 3: 5.6 ms per loop

这篇关于用平坦的子范围替换数字列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆