Distinct grouped cumulative sum on duplicate groups

Problem description

I have the following data frame:

    Title             start_time     Duration  Match
0   Item#1 2019-12-13 00:00:00.000    819.01   True   
2   Item#1 2019-12-13 00:13:39.010   1205.25   True   
4   Item#1 2019-12-13 00:33:44.260    972.80   True   
6   Item#1 2019-12-13 00:49:57.060    602.23  False   
9   Item#2 2019-12-13 00:59:59.290   1800.00  False   
14  Item#2 2019-12-13 01:29:59.290    533.79   True   
17  Item#2 2019-12-13 01:38:53.080    537.11   True   
20  Item#2 2019-12-13 01:47:50.190    729.10  False   
24  Item#3 2019-12-13 01:59:59.290    726.97   True   
26  Item#3 2019-12-13 02:12:06.260    569.01   True   
28  Item#3 2019-12-13 02:21:35.270    504.02  False   
32  Item#4 2019-12-13 02:29:59.290   1800.00  False   
36  Item#1 2019-12-13 02:59:59.290    776.98   True   
38  Item#1 2019-12-13 03:12:56.270   1045.81   True   
40  Item#1 2019-12-13 03:30:22.080    988.20   True   
43  Item#1 2019-12-13 03:46:50.280    789.01  False    

I want to run a cumulative sum on the Duration column. So far I'm using the following line of code:

df.groupby(['Title'])['Duration'].cumsum()

However, I don't want to group title items that are separated in time. In the example above, I don't want the two runs of Item#1 to be combined into one group. How do I do this?

Recommended answer

I believe you need to group by consecutive runs, which means Item#1 is treated as two separate groups:

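# Label consecutive runs: the counter increases whenever Title differs from the previous row
g = df['Title'].ne(df['Title'].shift()).cumsum()
# Cumulative sum within each run instead of within each Title
df['new'] = df.groupby(g)['Duration'].cumsum()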

print (df)
     Title               start_time  Duration  Match      new
0   Item#1  2019-12-13 00:00:00.000    819.01   True   819.01
2   Item#1  2019-12-13 00:13:39.010   1205.25   True  2024.26
4   Item#1  2019-12-13 00:33:44.260    972.80   True  2997.06
6   Item#1  2019-12-13 00:49:57.060    602.23  False  3599.29
9   Item#2  2019-12-13 00:59:59.290   1800.00  False  1800.00
14  Item#2  2019-12-13 01:29:59.290    533.79   True  2333.79
17  Item#2  2019-12-13 01:38:53.080    537.11   True  2870.90
20  Item#2  2019-12-13 01:47:50.190    729.10  False  3600.00
24  Item#3  2019-12-13 01:59:59.290    726.97   True   726.97
26  Item#3  2019-12-13 02:12:06.260    569.01   True  1295.98
28  Item#3  2019-12-13 02:21:35.270    504.02  False  1800.00
32  Item#4  2019-12-13 02:29:59.290   1800.00  False  1800.00
36  Item#1  2019-12-13 02:59:59.290    776.98   True   776.98
38  Item#1  2019-12-13 03:12:56.270   1045.81   True  1822.79
40  Item#1  2019-12-13 03:30:22.080    988.20   True  2810.99
43  Item#1  2019-12-13 03:46:50.280    789.01  False  3600.00
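
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The small frame and the names by_title, run_id and by_run are made up for illustration; only the shift/ne/cumsum pattern comes from the answer above.

```python
import pandas as pd

# Hypothetical toy data: Item#1 appears in two runs separated by Item#2.
df = pd.DataFrame({
    "Title": ["Item#1", "Item#1", "Item#2", "Item#2", "Item#1", "Item#1"],
    "Duration": [10.0, 20.0, 5.0, 7.0, 1.0, 2.0],
})

# Plain groupby on Title merges both runs of Item#1 into one group.
by_title = df.groupby("Title")["Duration"].cumsum()

# A new run starts wherever Title differs from the previous row;
# the cumulative sum of those change flags gives one label per run.
run_id = df["Title"].ne(df["Title"].shift()).cumsum()
by_run = df.groupby(run_id)["Duration"].cumsum()

print(by_title.tolist())
print(by_run.tolist())
```

With this toy data, by_title keeps accumulating across the gap (the last two values are 31.0 and 33.0), while by_run restarts at the second run of Item#1 (1.0 and 3.0).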

Details:

You can compare the column with its shifted copy (Series.shift) using Series.ne (not equal), then apply Series.cumsum to turn the change points into labels for consecutive groups:

print (df[['Title']].assign(shifted = df['Title'].shift(),
                            not_equal=df['Title'].ne(df['Title'].shift()),
                            g = df['Title'].ne(df['Title'].shift()).cumsum()))
     Title shifted  not_equal  g
0   Item#1     NaN       True  1
2   Item#1  Item#1      False  1
4   Item#1  Item#1      False  1
6   Item#1  Item#1      False  1
9   Item#2  Item#1       True  2
14  Item#2  Item#2      False  2
17  Item#2  Item#2      False  2
20  Item#2  Item#2      False  2
24  Item#3  Item#2       True  3
26  Item#3  Item#3      False  3
28  Item#3  Item#3      False  3
32  Item#4  Item#3       True  4
36  Item#1  Item#4       True  5
38  Item#1  Item#1      False  5
40  Item#1  Item#1      False  5
43  Item#1  Item#1      False  5
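
The same run labels can also be reused for other per-run calculations. As a rough sketch that goes beyond the original answer, the following computes one total per run on made-up data; the names run_id and summary, and the "run" label, are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with two separate runs of Item#1.
df = pd.DataFrame({
    "Title": ["Item#1", "Item#1", "Item#2", "Item#2", "Item#1", "Item#1"],
    "Duration": [10.0, 20.0, 5.0, 7.0, 1.0, 2.0],
})

# Same run labels as in the answer, renamed so the group key
# does not share a name with the Title column.
run_id = df["Title"].ne(df["Title"].shift()).cumsum().rename("run")

# One row per run: which Title it belongs to and its total duration.
summary = df.groupby(run_id).agg(
    Title=("Title", "first"),
    total=("Duration", "sum"),
)

print(summary)
```

Here the two runs of Item#1 show up as separate rows (runs 1 and 3), which is a quick way to check that the labels split the data as intended.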
