How to efficiently concatenate many arange calls in numpy?

Question

I'd like to vectorize calls like numpy.arange(0, cnt_i) over a vector of cnt values and concatenate the results like this snippet:

import numpy
cnts = [1,2,3]
numpy.concatenate([numpy.arange(cnt) for cnt in cnts])

array([0, 0, 1, 0, 1, 2])

Unfortunately the code above is very memory inefficient due to the temporary arrays and list comprehension looping.

Is there a way to do this more efficiently in numpy?

Answer

Here's a completely vectorized function:

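import numpy as np

def multirange(counts):
    counts = np.asarray(counts)
    # Remove the following line if counts is always strictly positive.
    counts = counts[counts != 0]
    if counts.size == 0:
        # Nothing to generate; also avoids incr[0] failing on an empty array.
        return np.zeros(0, dtype=int)

    counts1 = counts[:-1]
    # Output positions where each range after the first begins.
    reset_index = np.cumsum(counts1)

    # Per-step increments: +1 inside a range, and at the first element of
    # every new range an increment that brings the running sum back to 0.
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1

    # Reuse the incr array for the final result.
    incr.cumsum(out=incr)
    return incr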
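
To see why the increment trick works, it can help to look at the intermediate array for a small input. The sketch below repeats the same steps in a hypothetical helper, `multirange_trace` (not part of the original answer), and returns the increment array before the cumulative sum alongside the result. At each reset position the increment is `1 - c`, where `c` is the length of the range that just ended: the running sum has reached `c - 1` on that range's last element, so adding `1 - c` brings it back to 0.

```python
import numpy as np


def multirange_trace(counts):
    """Hypothetical helper: the same steps as multirange, but it also
    returns the increment array as it looks before the cumulative sum."""
    counts = np.asarray(counts)
    counts = counts[counts != 0]          # drop empty ranges
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)      # where each later range starts
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0                           # the first range starts at 0
    incr[reset_index] = 1 - counts1       # reset the running sum to 0
    return incr.copy(), incr.cumsum()


incr, result = multirange_trace([1, 2, 3])
print(incr)    # [ 0  0  1 -1  1  1]
print(result)  # [0 0 1 0 1 2]
```

Filtering out zero counts matters here: two consecutive resets would otherwise land on the same index, and only one of the two increments would survive the assignment.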

Here's a variation of @Developer's answer that only calls arange once:

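def multirange_loop(counts):
    counts = np.asarray(counts)
    if counts.size == 0:
        # counts.max() below would fail on an empty input.
        return np.zeros(0, dtype=int)
    ranges = np.empty(counts.sum(), dtype=int)
    # A single arange, long enough to be sliced for every count.
    seq = np.arange(counts.max())
    # Start offset of each range in the output (exclusive cumulative sum).
    starts = np.zeros(len(counts), dtype=int)
    starts[1:] = np.cumsum(counts[:-1])
    for start, count in zip(starts, counts):
        ranges[start:start + count] = seq[:count]
    return ranges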
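
For a concrete picture of what the loop does, the short sketch below computes the same `starts` offsets for `[1, 2, 3]` and records which output slice each iteration fills. The `filled` list is an illustrative addition, not part of the original function.

```python
import numpy as np

counts = np.array([1, 2, 3])
seq = np.arange(counts.max())            # [0 1 2], the only arange call
starts = np.zeros(len(counts), dtype=int)
starts[1:] = np.cumsum(counts[:-1])      # exclusive cumulative sum: [0 1 3]

ranges = np.empty(counts.sum(), dtype=int)
filled = []  # illustrative log of (slice start, slice end, values)
for start, count in zip(starts, counts):
    # Copy the first `count` values of seq into this range's slot.
    ranges[start:start + count] = seq[:count]
    filled.append((int(start), int(start + count), seq[:count].tolist()))

for lo, hi, values in filled:
    print(f"ranges[{lo}:{hi}] = {values}")
# ranges[0:1] = [0]
# ranges[1:3] = [0, 1]
# ranges[3:6] = [0, 1, 2]
print(ranges)  # [0 0 1 0 1 2]
```

The loop still runs once per count in Python, but each iteration only copies from a view of `seq` instead of allocating a fresh `arange`.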

And here's the original version, written as a function:

def multirange_original(counts):
    ranges = np.concatenate([np.arange(count) for count in counts])
    return ranges

Demo:

In [296]: multirange_original([1,2,3])
Out[296]: array([0, 0, 1, 0, 1, 2])

In [297]: multirange_loop([1,2,3])
Out[297]: array([0, 0, 1, 0, 1, 2])

In [298]: multirange([1,2,3])
Out[298]: array([0, 0, 1, 0, 1, 2])
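
Another common fully vectorized formulation, which is a different technique from the ones in this answer, subtracts each range's start offset from a single global `arange`; `np.repeat` expands the per-range offsets to one entry per output element. The sketch below uses a hypothetical name, `multirange_repeat`, and checks itself against the list-comprehension version on random inputs that include zero counts.

```python
import numpy as np


def multirange_repeat(counts):
    """Alternative sketch: concatenated aranges built with np.repeat."""
    counts = np.asarray(counts, dtype=int)
    # Start offset of each range in the output (exclusive cumulative sum).
    starts = np.cumsum(counts) - counts
    # Each output position minus the start of the range it belongs to.
    return np.arange(counts.sum()) - np.repeat(starts, counts)


# Compare against the list-comprehension version on random inputs,
# including zero counts.
rng = np.random.default_rng(0)
for _ in range(100):
    counts = rng.integers(0, 10, size=rng.integers(1, 20))
    expected = np.concatenate([np.arange(c) for c in counts])
    assert np.array_equal(multirange_repeat(counts), expected)

print(multirange_repeat([1, 2, 3]))  # [0 0 1 0 1 2]
```

This version allocates several output-sized arrays (the global `arange`, the repeated offsets, and their difference), whereas `multirange` reuses a single array for its result, so it is not necessarily the better choice when memory is the main concern.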

Compare timing using a larger array of counts:

In [299]: counts = np.random.randint(1, 50, size=50)

In [300]: %timeit multirange_original(counts)
10000 loops, best of 3: 114 µs per loop

In [301]: %timeit multirange_loop(counts)
10000 loops, best of 3: 76.2 µs per loop

In [302]: %timeit multirange(counts)
10000 loops, best of 3: 26.4 µs per loop
