Restart cumsum and get index if cumsum more than value

Views: 74 | Published: 2020/5/18 18:57:42 | Tags: python pandas numpy

Question

Say I have an array of distances x=[1,2,1,3,3,2,1,5,1,1].

I want to get the indices from x where the cumsum reaches 10; in this case, idx=[4,9].

So the cumsum restarts after the condition is met.

I can do it with a loop, but loops are slow for large arrays, and I was wondering if I could do it in a vectorized way.
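For reference, the plain loop the question alludes to might look like the sketch below. The function name `cumsum_breach_loop` is hypothetical, and the sketch assumes "reaches" means the running total is greater than or equal to the target.

```python
def cumsum_breach_loop(x, target):
    """Return the indices at which the running total reaches `target`.

    The running total is reset to zero after each hit.
    """
    indices = []
    total = 0
    for i, value in enumerate(x):
        total += value
        if total >= target:
            indices.append(i)
            total = 0  # restart the cumulative sum
    return indices


x = [1, 2, 1, 3, 3, 2, 1, 5, 1, 1]
print(cumsum_breach_loop(x, 10))  # [4, 9]
```

The first five values sum to 10 (index 4), and the next five sum to 10 again (index 9), which matches the expected idx.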

Recommended answer

Here's one with numba and array initialization:

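import numpy as np
from numba import njit

@njit
def cumsum_breach_numba2(x, target, result):
    # Write each breach index into the preallocated `result` array
    # and return how many entries were filled.
    total = 0
    iterID = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0  # restart the running sum after a breach
    return iterID

def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    # Worst case: every element breaches, so len(x) slots suffice.
    result = np.empty(len(x), dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]  # view on the filled part of the buffer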
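A short usage sketch of the two functions above, showing what the `total >= target` test implies: an overshoot also counts as a breach, and an input that never reaches the target yields an empty result. The `try`/`except` fallback is an assumption added here so the sketch also runs without numba installed; it is not part of the answer.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same loop as plain Python.
    def njit(func):
        return func


@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0
    return iterID


def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x), dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]


# Data from the question: totals hit exactly 10 at indices 4 and 9.
print(cumsum_breach_array_init([1, 2, 1, 3, 3, 2, 1, 5, 1, 1], 10).tolist())  # [4, 9]

# Overshoot: 6 + 6 = 12 >= 10, so every second element is a breach.
print(cumsum_breach_array_init([6, 6, 6, 6], 10).tolist())  # [1, 3]

# Target never reached: the result is empty.
print(cumsum_breach_array_init([1, 1, 1], 10).tolist())  # []
```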

Timings

Including @piRSquared's solutions and using the benchmarking setup from the same post:

In [58]: np.random.seed([3, 1415])
    ...: x = np.random.randint(100, size=1000000).tolist()

# @piRSquared soln1
In [59]: %timeit list(cumsum_breach(x, 10))
10 loops, best of 3: 73.2 ms per loop

# @piRSquared soln2
In [60]: %timeit cumsum_breach_numba(np.asarray(x), 10)
10 loops, best of 3: 69.2 ms per loop

# From this post
In [61]: %timeit cumsum_breach_array_init(x, 10)
10 loops, best of 3: 39.1 ms per loop

Numba: appending vs. array initialization

For a closer look at how the array initialization helps, which seems to be the big difference between the two numba implementations, let's time these on array data. Creating the array from a list was in itself heavy on runtime, and both implementations depend on it:

In [62]: x = np.array(x)

In [63]: %timeit cumsum_breach_numba(x, 10)# with appending
10 loops, best of 3: 31.5 ms per loop

In [64]: %timeit cumsum_breach_array_init(x, 10)
1000 loops, best of 3: 1.8 ms per loop
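The comparison above can be reproduced along the following lines. This is a sketch under stated assumptions: `breach_append` is a hypothetical stand-in for an appending implementation (the code of `cumsum_breach_numba` is not shown in this post), the array size and seed are arbitrary, and the `njit` fallback only exists so the script runs without numba. Unlike `cumsum_breach_array_init`, the output buffer is allocated once outside the timed call. Absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, both variants run as plain Python.
    def njit(func):
        return func


@njit
def breach_append(x, target):
    # Hypothetical appending variant: grows a list as breaches occur.
    total = 0
    result = []
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result.append(i)
            total = 0
    return result


@njit
def breach_prealloc(x, target, result):
    # Preallocated variant: fills `result` and returns the fill count.
    total = 0
    count = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[count] = i
            count += 1
            total = 0
    return count


rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=100_000)
out = np.empty(len(x), dtype=np.uint64)

# Warm-up calls so any JIT compilation is excluded from the timings.
expected = breach_append(x, 10)
n = breach_prealloc(x, 10, out)

t_append = timeit.timeit(lambda: breach_append(x, 10), number=5)
t_prealloc = timeit.timeit(lambda: breach_prealloc(x, 10, out), number=5)
print(f"append: {t_append:.4f}s  preallocated: {t_prealloc:.4f}s")
```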

To force the output to have its own memory space, we can make a copy. It won't change things in a big way though:

In [65]: %timeit cumsum_breach_array_init(x, 10).copy()
100 loops, best of 3: 2.67 ms per loop
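The reason a copy can matter: slicing a NumPy array returns a view, so `result[:idx]` keeps the full `len(x)`-sized buffer alive for as long as the result is referenced. A minimal sketch with made-up values (the buffer size and contents are illustrative only):

```python
import numpy as np

# Worst-case sized scratch array, as in cumsum_breach_array_init.
buffer = np.empty(1_000_000, dtype=np.uint64)
buffer[:3] = [4, 9, 12]

view = buffer[:3]          # slicing returns a view; no data is copied
owned = buffer[:3].copy()  # copy allocates a new 3-element array

print(view.base is buffer)               # True: the view references the big buffer
print(owned.base is None)                # True: the copy owns its data
print(np.shares_memory(owned, buffer))   # False
print(buffer.nbytes, owned.nbytes)       # 8000000 24
```

Once only `owned` is kept, the large buffer can be garbage-collected.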
