Distinct grouped cumulative sum on duplicate groups
Question
I have the following dataframe:
Title start_time Duration Match
0 Item#1 2019-12-13 00:00:00.000 819.01 True
2 Item#1 2019-12-13 00:13:39.010 1205.25 True
4 Item#1 2019-12-13 00:33:44.260 972.80 True
6 Item#1 2019-12-13 00:49:57.060 602.23 False
9 Item#2 2019-12-13 00:59:59.290 1800.00 False
14 Item#2 2019-12-13 01:29:59.290 533.79 True
17 Item#2 2019-12-13 01:38:53.080 537.11 True
20 Item#2 2019-12-13 01:47:50.190 729.10 False
24 Item#3 2019-12-13 01:59:59.290 726.97 True
26 Item#3 2019-12-13 02:12:06.260 569.01 True
28 Item#3 2019-12-13 02:21:35.270 504.02 False
32 Item#4 2019-12-13 02:29:59.290 1800.00 False
36 Item#1 2019-12-13 02:59:59.290 776.98 True
38 Item#1 2019-12-13 03:12:56.270 1045.81 True
40 Item#1 2019-12-13 03:30:22.080 988.20 True
43 Item#1 2019-12-13 03:46:50.280 789.01 False
I want to compute a cumulative sum of the Duration column. So far I'm using the following line of code:
df.groupby(['Title'])['Duration'].cumsum()
However, I don't want to group Title items that are separated in time. In the example above, the two runs of Item#1 should not be grouped together. How do I do this?
Recommended answer
I believe you need to group by consecutive runs of the same title, so that Item#1 is treated as two separate groups:
g = df['Title'].ne(df['Title'].shift()).cumsum()
df['new'] = df.groupby(g)['Duration'].cumsum()
print(df)
Title start_time Duration Match new
0 Item#1 2019-12-13 00:00:00.000 819.01 True 819.01
2 Item#1 2019-12-13 00:13:39.010 1205.25 True 2024.26
4 Item#1 2019-12-13 00:33:44.260 972.80 True 2997.06
6 Item#1 2019-12-13 00:49:57.060 602.23 False 3599.29
9 Item#2 2019-12-13 00:59:59.290 1800.00 False 1800.00
14 Item#2 2019-12-13 01:29:59.290 533.79 True 2333.79
17 Item#2 2019-12-13 01:38:53.080 537.11 True 2870.90
20 Item#2 2019-12-13 01:47:50.190 729.10 False 3600.00
24 Item#3 2019-12-13 01:59:59.290 726.97 True 726.97
26 Item#3 2019-12-13 02:12:06.260 569.01 True 1295.98
28 Item#3 2019-12-13 02:21:35.270 504.02 False 1800.00
32 Item#4 2019-12-13 02:29:59.290 1800.00 False 1800.00
36 Item#1 2019-12-13 02:59:59.290 776.98 True 776.98
38 Item#1 2019-12-13 03:12:56.270 1045.81 True 1822.79
40 Item#1 2019-12-13 03:30:22.080 988.20 True 2810.99
43 Item#1 2019-12-13 03:46:50.280 789.01 False 3600.00
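For readers who want to run the answer end to end, here is a self-contained sketch of the same two lines with the data rebuilt inline. As an assumption for brevity, only Title and Duration are included; the resulting `new` column should match the output above.

```python
import pandas as pd

# Title/Duration values and index labels copied from the question.
df = pd.DataFrame(
    {
        "Title": ["Item#1"] * 4 + ["Item#2"] * 4 + ["Item#3"] * 3
                 + ["Item#4"] + ["Item#1"] * 4,
        "Duration": [819.01, 1205.25, 972.80, 602.23,
                     1800.00, 533.79, 537.11, 729.10,
                     726.97, 569.01, 504.02,
                     1800.00,
                     776.98, 1045.81, 988.20, 789.01],
    },
    index=[0, 2, 4, 6, 9, 14, 17, 20, 24, 26, 28, 32, 36, 38, 40, 43],
)

# True wherever the title differs from the previous row; cumsum turns
# those change points into run labels 1, 2, 3, ...
g = df["Title"].ne(df["Title"].shift()).cumsum()

# Cumulative sum within each consecutive run, so the second Item#1 run restarts.
df["new"] = df.groupby(g)["Duration"].cumsum()
print(df)
```

Grouping by the run label `g` instead of Title is the whole fix; the Title column itself is no longer the group key.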
Details:
Compare the Title column with its shifted copy (Series.shift) using Series.ne (not equal), then apply Series.cumsum so that every True, i.e. every row where the title changes, starts a new group:
print(df[['Title']].assign(shifted=df['Title'].shift(),
                           not_equal=df['Title'].ne(df['Title'].shift()),
                           g=df['Title'].ne(df['Title'].shift()).cumsum()))
Title shifted not_equal g
0 Item#1 NaN True 1
2 Item#1 Item#1 False 1
4 Item#1 Item#1 False 1
6 Item#1 Item#1 False 1
9 Item#2 Item#1 True 2
14 Item#2 Item#2 False 2
17 Item#2 Item#2 False 2
20 Item#2 Item#2 False 2
24 Item#3 Item#2 True 3
26 Item#3 Item#3 False 3
28 Item#3 Item#3 False 3
32 Item#4 Item#3 True 4
36 Item#1 Item#4 True 5
38 Item#1 Item#1 False 5
40 Item#1 Item#1 False 5
43 Item#1 Item#1 False 5
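The same labelling can be cross-checked without pandas. The sketch below is a plain-Python loop (not the answer's vectorized method) that mirrors each step: a title that differs from the previous row plays the role of `ne(shift())`, incrementing the label plays the role of `cumsum`, and the running total resets at each new label. The function name `run_cumsum` is hypothetical.

```python
def run_cumsum(titles, durations):
    """Return (labels, totals): consecutive-run labels and per-run running sums."""
    labels, totals = [], []
    label, total = 0, 0.0
    prev = object()  # sentinel that never equals a title, like shift()'s NaN in row 0
    for title, dur in zip(titles, durations):
        if title != prev:   # same test as df['Title'].ne(df['Title'].shift())
            label += 1      # cumsum of the True flags gives the next run number
            total = 0.0     # a new run starts its own running total
        total += dur
        labels.append(label)
        totals.append(total)
        prev = title
    return labels, totals


titles = (["Item#1"] * 4 + ["Item#2"] * 4 + ["Item#3"] * 3
          + ["Item#4"] + ["Item#1"] * 4)
durations = [819.01, 1205.25, 972.80, 602.23,
             1800.00, 533.79, 537.11, 729.10,
             726.97, 569.01, 504.02,
             1800.00,
             776.98, 1045.81, 988.20, 789.01]

labels, totals = run_cumsum(titles, durations)
print(labels)  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5]
```

The labels match the g column above, which shows why the vectorized version splits the two Item#1 runs.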