Groupby cumulative sum in pandas based on specific condition
Question
I have a data frame as shown below.
B_ID No_Show Session slot_num Patient_count
1 0.4 S1 1 1
2 0.3 S1 2 1
3 0.8 S1 3 1
4 0.3 S1 3 2
5 0.6 S1 4 1
6 0.8 S1 5 1
7 0.9 S1 5 2
8 0.4 S1 5 3
9 0.6 S1 5 4
12 0.9 S2 1 1
13 0.5 S2 1 2
14 0.3 S2 2 1
15 0.7 S2 3 1
20 0.7 S2 4 1
16 0.6 S2 5 1
17 0.8 S2 5 2
19 0.3 S2 5 3
From the above, I would like to find the cumulative No_Show by Session:
df['Cumulative_No_show'] = df.groupby(['Session'])['No_Show'].cumsum()
Now we get:
B_ID No_Show Session slot_num Patient_count Cumulative_No_show
1 0.4 S1 1 1 0.4
2 0.3 S1 2 1 0.7
3 0.8 S1 3 1 1.5
4 0.3 S1 3 2 1.8
5 0.6 S1 4 1 2.4
6 0.8 S1 5 1 3.2
7 0.9 S1 5 2 4.1
8 0.4 S1 5 3 4.5
9 0.6 S1 5 4 5.1
12 0.9 S2 1 1 0.9
13 0.5 S2 1 2 1.4
14 0.3 S2 2 1 1.7
15 0.7 S2 3 1 2.4
20 0.7 S2 4 1 3.1
16 0.6 S2 5 1 3.7
17 0.8 S2 5 2 4.5
19 0.3 S2 5 3 4.8
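To reproduce the table above, the frame can be rebuilt by hand and the per-session running total computed with `groupby(...).cumsum()`. This is a minimal sketch: the values are copied from the question, the column name `Cumulative_No_show` follows the table header, and rounding is applied only for display.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "B_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    "No_Show": [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    "Session": ["S1"] * 9 + ["S2"] * 8,
    "slot_num": [1, 2, 3, 3, 4, 5, 5, 5, 5, 1, 1, 2, 3, 4, 5, 5, 5],
    "Patient_count": [1, 1, 1, 2, 1, 1, 2, 3, 4, 1, 2, 1, 1, 1, 1, 2, 3],
})

# The running total restarts at the first row of each Session.
df["Cumulative_No_show"] = df.groupby("Session")["No_Show"].cumsum()

# Round for display only; the stored values keep full float precision.
print(df.round(2))
```

Because the sum restarts per group, the first S2 row (B_ID 12) shows 0.9 rather than continuing from the last S1 value.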
From the above, I would like to create two new columns, named as below:
U_slot_num = Updated slot number
U_No_show = Updated cumulative no-show
Whenever the cumulative no-show is > 0.6, change the next slot_num to be the same as the current one and update U_No_show by subtracting 1, as shown in the expected output.
Expected output:
No_Show Session slot_num Patient_count Cum_No_show U_slot_num U_No_show
0.4 S1 1 1 0.4 1 0.4
0.3 S1 2 1 0.7 2 0.7
0.8 S1 3 1 1.5 2 0.5
0.3 S1 3 2 1.8 3 0.8
0.6 S1 4 1 2.4 3 0.4
0.8 S1 5 1 3.2 4 1.2
0.9 S1 5 2 4.1 4 0.2
0.4 S1 5 3 4.5 5 0.6
0.6 S1 5 4 5.1 6 1.2
0.9 S2 1 1 0.9 1 0.9
0.5 S2 1 2 1.4 1 0.4
0.3 S2 2 1 1.7 2 0.7
0.7 S2 3 1 2.4 2 0.4
0.7 S2 4 1 3.1 3 1.1
0.6 S2 5 1 3.7 3 0.7
0.8 S2 5 2 4.5 3 0.5
0.3 S2 5 3 4.8 4 0.8
Recommended answer
So, similar to your later question, I think you need to create a function that returns your two columns and then pass it to groupby.apply. And if I understand correctly how you want to increment U_slot_num, then you can do:
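import numpy as np
import pandas as pd

def create_u_columns(ser):
    # copy so the in-place updates below do not touch the caller's data
    arr_ns = ser.to_numpy(copy=True)
    arr_sn = np.ones(len(ser))
    for i in range(len(arr_ns) - 1):
        if arr_ns[i] > 0.6:
            # subtract 1 from u_no_show for all following rows
            arr_ns[i+1:] -= 1
        else:
            # increment u_slot_num for all following rows
            arr_sn[i+1:] += 1
    # return a dataframe with both columns, aligned on the original index
    return pd.DataFrame({'U_slot_num': arr_sn, 'U_No_show': arr_ns}, index=ser.index)

# group_keys=False keeps the original row index so the result aligns with df
df[['U_slot_num', 'U_No_show']] = (
    df.groupby('Session', group_keys=False)['Cumulative_No_show'].apply(create_u_columns)
)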
and you get:
print(df)
B_ID No_Show Session slot_num Patient_count Cumulative_No_show \
0 1 0.4 S1 1 1 0.4
1 2 0.3 S1 2 1 0.7
2 3 0.8 S1 3 1 1.5
3 4 0.3 S1 3 2 1.8
4 5 0.6 S1 4 1 2.4
5 6 0.8 S1 5 1 3.2
6 7 0.9 S1 5 2 4.1
7 8 0.4 S1 5 3 4.5
8 9 0.6 S1 5 4 5.1
9 12 0.9 S2 1 1 0.9
10 13 0.5 S2 1 2 1.4
11 14 0.3 S2 2 1 1.7
12 15 0.7 S2 3 1 2.4
13 20 0.7 S2 4 1 3.1
14 16 0.6 S2 5 1 3.7
15 17 0.8 S2 5 2 4.5
16 19 0.3 S2 5 3 4.8
U_slot_num U_No_show
0 1.0 0.4
1 2.0 0.7
2 2.0 0.5
3 3.0 0.8
4 3.0 0.4
5 4.0 1.2
6 4.0 1.1
7 4.0 0.5
8 5.0 1.1
9 1.0 0.9
10 1.0 0.4
11 2.0 0.7
12 2.0 0.4
13 3.0 1.1
14 3.0 0.7
15 3.0 0.5
16 4.0 0.8
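The loop in the answer rewrites the tail of both arrays on every step, which is quadratic in the group length. A possible single-pass variant, sketched here under the same interpretation of the rule, carries two running values instead. The names `u_columns_single_pass`, `THRESHOLD`, `offset` and `slot` are illustrative and not part of the original answer.

```python
import numpy as np

THRESHOLD = 0.6  # cut-off taken from the question


def u_columns_single_pass(cum):
    """Return (U_slot_num, U_No_show) arrays for one session's running total."""
    cum = np.asarray(cum, dtype=float)
    u_no_show = np.empty(len(cum))
    u_slot = np.empty(len(cum), dtype=int)
    offset = 0  # how many times 1 has been subtracted so far
    slot = 1    # slot number assigned to the current row
    for i, value in enumerate(cum):
        u_no_show[i] = value - offset
        u_slot[i] = slot
        if u_no_show[i] > THRESHOLD:
            offset += 1  # following rows lose one more unit; the slot is kept
        else:
            slot += 1    # following rows move on to the next slot
    return u_slot, u_no_show


# Running totals of session S1, copied from the Cumulative_No_show column.
s1 = [0.4, 0.7, 1.5, 1.8, 2.4, 3.2, 4.1, 4.5, 5.1]
slots, no_show = u_columns_single_pass(s1)
print(slots.tolist())                 # [1, 2, 2, 3, 3, 4, 4, 4, 5]
print(np.round(no_show, 1).tolist())  # [0.4, 0.7, 0.5, 0.8, 0.4, 1.2, 1.1, 0.5, 1.1]
```

For the S1 running totals this reproduces the U_slot_num and U_No_show values printed above for rows 0 to 8. To cover every session, the function could be called once per group, for example while iterating over `df.groupby('Session')`.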