Compute the cumulative sum of a list until a zero appears

Question

I have a (long) list in which zeros and ones appear at random:

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

I want to get list_b, which contains:

  • the running sum of the list up to each point where a 0 appears
  • a 0 wherever a 0 appears in list_a

list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

I can implement this as follows:

list_b = []
for i, x in enumerate(list_a):
    if x == 0:
        list_b.append(x)
    else:
        sum_value = 0
        for j in list_a[i::-1]:
            if j != 0:
                sum_value += j
            else:
                break
        list_b.append(sum_value)
print(list_b)

However, the actual list is very long, so I want to make the code faster (even at the cost of readability).

I changed the code like this:

from itertools import takewhile
list_c = [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]
print(list_c)

But it is not fast enough. How can I do this more efficiently?

Recommended answer

You're overthinking this.

Option 1
You can just iterate over the indices and update each element in turn (computing the cumulative sum), based on whether the current value is 0 or not.

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

for i in range(1, len(data)):
    if data[i]:  # non-zero: extend the running sum; a zero stays 0 and resets it
        data[i] += data[i - 1]

That is, if the current element is non-zero, replace it with the sum of its own value and the value at the previous index. A zero is left untouched, so the running sum restarts after it.

print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

Note that this updates your list in place. If you don't want that, create a copy in advance with new_data = data.copy() and run the same loop over new_data.
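For readers who prefer not to mutate the input at all, the same idea can be sketched as a function that builds a new list from a running total. The name cumsum_reset_at_zero is a hypothetical helper chosen for illustration; it is not part of the original answer.

```python
def cumsum_reset_at_zero(values):
    """Return running sums that restart after every zero (hypothetical helper)."""
    result = []
    running = 0
    for x in values:
        # A zero resets the running total; any other value is added to it.
        running = running + x if x else 0
        result.append(running)
    return result


list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
list_b = cumsum_reset_at_zero(list_a)
print(list_b)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```

This makes a single pass over the input and leaves list_a unchanged, at the cost of allocating one extra list.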

Option 2
You can use the pandas API if you need performance. Build group labels from the positions of the 0s, then use groupby + cumsum to compute the cumulative sum within each group, similar to the above:

import pandas as pd

s = pd.Series(data)    
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()

print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
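To see why the grouping key works, it may help to mirror the two steps in plain Python. The sketch below is only an illustration of the idea using itertools, not how pandas implements it; the names keys and result are chosen here for the example.

```python
from itertools import accumulate, groupby

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# Mirror of s.eq(0).cumsum(): the key grows by one at every zero,
# so each zero opens a new group that also holds the values after it.
keys = list(accumulate(int(x == 0) for x in data))

# Mirror of groupby(...).cumsum(): a running sum restarted in each group.
result = []
for _, group in groupby(zip(keys, data), key=lambda pair: pair[0]):
    result.extend(accumulate(value for _, value in group))

print(keys)    # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]
print(result)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```

Because every group after the first begins with a zero, the per-group running sum starts at 0 and the zero is retained in the output.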


Performance

First, the setup:

data = data * 100000
s = pd.Series(data)

Next,

%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:  
        new_data[i] += new_data[i - 1]

328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Then, timing the copy separately,

%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, the copy doesn't really take much time. Finally,

%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The pandas approach is conceptually linear (just like the other approaches), but it is faster by a constant factor because of the library's implementation.
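The timings above rely on IPython's %timeit magic. Outside IPython, a rough comparison could be sketched with the standard timeit module, as below. The function names loop_version and accumulate_version are hypothetical, the itertools.accumulate variant is an alternative that does not appear in the original answer, and actual numbers will depend on the machine and the Python version.

```python
import timeit
from itertools import accumulate


def loop_version(values):
    # Copy first so the caller's list is not modified.
    out = list(values)
    for i in range(1, len(out)):
        if out[i]:
            out[i] += out[i - 1]
    return out


def accumulate_version(values):
    # The running total restarts at 0 whenever the current value is 0.
    return list(accumulate(values, lambda total, x: total + x if x else 0))


data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] * 1000

# Check that both variants agree before timing them.
assert loop_version(data) == accumulate_version(data)

runs = 5
for fn in (loop_version, accumulate_version):
    seconds = timeit.timeit(lambda: fn(data), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

Both variants are single-pass and linear in the length of the list, so any difference between them is again a constant factor.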
