Compute the cumulative sum of a list until a zero appears
Question
I have a (long) list in which zeros and ones appear at random:
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
I want to get list_b, where:
- each element is the running sum of the list up to the point where a 0 appears
- wherever a 0 appears, the 0 is retained in the list
list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
I can implement this as follows:
list_b = []
for i, x in enumerate(list_a):
    if x == 0:
        list_b.append(x)
    else:
        # Walk backwards from position i and add values until a 0 is hit.
        sum_value = 0
        for j in list_a[i::-1]:
            if j != 0:
                sum_value += j
            else:
                break
        list_b.append(sum_value)
print(list_b)
But the actual list is very long, so I want to make the code faster (even if it becomes less readable).
I changed the code like this:
from itertools import takewhile
list_c = [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]
print(list_c)
But it is not fast enough. How can I do this more efficiently?
Recommended answer
You're overthinking this.
Option 1
You can just iterate over the indices and update accordingly (computing the cumulative sum), based on whether the current value is 0 or not.
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
for i in range(1, len(data)):
    if data[i]:
        data[i] += data[i - 1]
That is, if the current element is non-zero, update the element at the current index to the sum of its current value and the value at the previous index.
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Note that this updates your list in place. If you don't want that, create a copy in advance with new_data = data.copy() and iterate over new_data in the same manner.
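As a non-mutating variant of the same idea, the reset-on-zero rule can also be expressed with itertools.accumulate from the standard library. This is only a sketch and not part of the original answer; the function name running_sum_until_zero is a hypothetical helper chosen for illustration.

```python
from itertools import accumulate


def running_sum_until_zero(values):
    """Return a new list of running sums that restart at every zero."""
    # accumulate calls the function with (previous_result, current_value);
    # the first element of the input is emitted unchanged.
    return list(accumulate(values, lambda total, x: total + x if x else 0))


data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
result = running_sum_until_zero(data)
print(result)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
print(data)    # the input list is left untouched
```

Because the result is built from an iterator, the input does not have to be copied first, and any iterable can be passed in.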
Option 2
You can use the pandas API if you need performance. Find groups based on the placement of the 0s, and use groupby + cumsum to compute group-wise cumulative sums, similar to the above:
import pandas as pd
s = pd.Series(data)
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
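To see why the grouping key works, the two pandas steps can be mimicked in plain Python. The sketch below is illustrative only and is not how pandas implements them: the names labels and result are assumptions, and itertools stands in for s.eq(0).cumsum() and the group-wise cumsum.

```python
from itertools import accumulate, groupby

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# Stand-in for s.eq(0).cumsum(): the label goes up by one at every zero,
# so each zero starts a new group together with the values that follow it.
labels = list(accumulate(int(x == 0) for x in data))

# Stand-in for groupby(labels).cumsum(): a running sum restarted per label.
result = []
for _, group in groupby(zip(labels, data), key=lambda pair: pair[0]):
    result.extend(accumulate(value for _, value in group))

print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]
print(result)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```

Every group after the first begins with a zero, which is why the running sum inside each group starts again from 0.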
Performance
First, the setup:
data = data * 100000
s = pd.Series(data)
Next,
%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:
        new_data[i] += new_data[i - 1]
328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Then, timing the copy separately,
%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So, the copy doesn't really take much time. Finally,
%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The pandas approach is conceptually linear (just like the loop in Option 1), but faster by a constant factor because of how the library is implemented.
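The timings above use IPython magics. Outside IPython, a comparable measurement of the loop could be sketched with the standard timeit module. The function name running_sum_until_zero and the reduced input size are assumptions made for illustration, and the numbers will vary by machine.

```python
import timeit


def running_sum_until_zero(values):
    """Loop-based version: copy the input, then update the copy in place."""
    out = list(values)
    for i in range(1, len(out)):
        if out[i]:
            out[i] += out[i - 1]
    return out


base = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
big = base * 1000  # much smaller than the benchmark input, to keep the run short

# Take the best of three repeats, each averaging over ten calls.
runs = timeit.repeat(lambda: running_sum_until_zero(big), number=10, repeat=3)
seconds = min(runs) / 10
print(f"{seconds * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes when comparing implementations.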