Groupby cumulative sum in pandas based on specific condition


Question

I have a data frame as shown below.

B_ID   No_Show   Session  slot_num   Patient_count
    1     0.4       S1        1          1
    2     0.3       S1        2          1
    3     0.8       S1        3          1
    4     0.3       S1        3          2
    5     0.6       S1        4          1
    6     0.8       S1        5          1
    7     0.9       S1        5          2
    8     0.4       S1        5          3
    9     0.6       S1        5          4
    12    0.9       S2        1          1
    13    0.5       S2        1          2
    14    0.3       S2        2          1
    15    0.7       S2        3          1
    20    0.7       S2        4          1
    16    0.6       S2        5          1
    17    0.8       S2        5          2
    19    0.3       S2        5          3

From the above, I would like to find the cumulative No_Show within each Session:

df['Cumulative_No_show'] = df.groupby(['Session'])['No_Show'].cumsum()
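For anyone who wants to reproduce this step, here is a minimal, self-contained sketch. It rebuilds the sample data from the question (B_ID and Patient_count are left out for brevity, which is my simplification) and names the result column Cumulative_No_show, as in the output table.

```python
import pandas as pd

# Sample data copied from the question (B_ID and Patient_count omitted).
df = pd.DataFrame({
    "No_Show": [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    "Session": ["S1"] * 9 + ["S2"] * 8,
    "slot_num": [1, 2, 3, 3, 4, 5, 5, 5, 5, 1, 1, 2, 3, 4, 5, 5, 5],
})

# cumsum() on a grouped column restarts at the first row of each Session.
df["Cumulative_No_show"] = df.groupby("Session")["No_Show"].cumsum()

# Round only for display; floating-point sums are not exact.
print(df.round(1))
```

The running total restarts at the first S2 row, so that row shows 0.9 rather than continuing from 5.1.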

Now we get:

B_ID   No_Show   Session  slot_num   Patient_count  Cumulative_No_show
    1     0.4       S1        1          1          0.4
    2     0.3       S1        2          1          0.7
    3     0.8       S1        3          1          1.5
    4     0.3       S1        3          2          1.8
    5     0.6       S1        4          1          2.4
    6     0.8       S1        5          1          3.2
    7     0.9       S1        5          2          4.1
    8     0.4       S1        5          3          4.5
    9     0.6       S1        5          4          5.1
    12    0.9       S2        1          1          0.9
    13    0.5       S2        1          2          1.4
    14    0.3       S2        2          1          1.7
    15    0.7       S2        3          1          2.4
    20    0.7       S2        4          1          3.1
    16    0.6       S2        5          1          3.7
    17    0.8       S2        5          2          4.5
    19    0.3       S2        5          3          4.8

From the above, I would like to create two new columns, named as below:

U_slot_num = Updated slot number

U_No_show = Updated cumulative no show

Whenever the cumulative no-show is > 0.6, set the next slot_num to the same value as the current one and subtract 1 from U_No_show, as shown in the expected output.

Expected output:

No_Show  Session slot_num Patient_count Cum_No_show U_slot_num  U_No_show
 0.4       S1        1          1          0.4         1         0.4
 0.3       S1        2          1          0.7         2         0.7
 0.8       S1        3          1          1.5         2         0.5
 0.3       S1        3          2          1.8         3         0.8      
 0.6       S1        4          1          2.4         3         0.4
 0.8       S1        5          1          3.2         4         1.2
 0.9       S1        5          2          4.1         4         0.2
 0.4       S1        5          3          4.5         5         0.6
 0.6       S1        5          4          5.1         6         1.2
 0.9       S2        1          1          0.9         1         0.9
 0.5       S2        1          2          1.4         1         0.4
 0.3       S2        2          1          1.7         2         0.7
 0.7       S2        3          1          2.4         2         0.4
 0.7       S2        4          1          3.1         3         1.1
 0.6       S2        5          1          3.7         3         0.7
 0.8       S2        5          2          4.5         3         0.5
 0.3       S2        5          3          4.8         4         0.8

Recommended answer

As with your later question, I think you need to write a function that returns your two columns and then use groupby.apply. If I understand correctly how you want to increment U_slot_num, you can do:

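import numpy as np
import pandas as pd

def create_u_columns(ser):
    # work on a copy so the in-place updates below never touch df
    arr_ns = ser.to_numpy(copy=True)
    arr_sn = np.ones(len(ser))
    for i in range(len(arr_ns) - 1):
        if arr_ns[i] > 0.6:
            # subtract 1 from u_no_show for all following rows
            arr_ns[i+1:] -= 1
        else:
            # increment u_slot_num for all following rows
            arr_sn[i+1:] += 1
    # return a dataframe with both columns, aligned on the original index
    return pd.DataFrame({'U_slot_num': arr_sn, 'U_No_show': arr_ns}, index=ser.index)

# group_keys=False keeps the original row index so the assignment aligns
df[['U_slot_num', 'U_No_show']] = (
    df.groupby('Session', group_keys=False)['Cumulative_No_show'].apply(create_u_columns)
)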

You get:

print (df)
    B_ID  No_Show Session  slot_num  Patient_count  Cumulative_No_show  \
0      1      0.4      S1         1              1                 0.4   
1      2      0.3      S1         2              1                 0.7   
2      3      0.8      S1         3              1                 1.5   
3      4      0.3      S1         3              2                 1.8   
4      5      0.6      S1         4              1                 2.4   
5      6      0.8      S1         5              1                 3.2   
6      7      0.9      S1         5              2                 4.1   
7      8      0.4      S1         5              3                 4.5   
8      9      0.6      S1         5              4                 5.1   
9     12      0.9      S2         1              1                 0.9   
10    13      0.5      S2         1              2                 1.4   
11    14      0.3      S2         2              1                 1.7   
12    15      0.7      S2         3              1                 2.4   
13    20      0.7      S2         4              1                 3.1   
14    16      0.6      S2         5              1                 3.7   
15    17      0.8      S2         5              2                 4.5   
16    19      0.3      S2         5              3                 4.8   

    U_slot_num  U_No_show  
0          1.0        0.4  
1          2.0        0.7  
2          2.0        0.5  
3          3.0        0.8  
4          3.0        0.4  
5          4.0        1.2  
6          4.0        1.1  
7          4.0        0.5  
8          5.0        1.1  
9          1.0        0.9  
10         1.0        0.4  
11         2.0        0.7  
12         2.0        0.4  
13         3.0        1.1  
14         3.0        0.7  
15         3.0        0.5  
16         4.0        0.8 
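The slice updates in the answer rewrite every remaining row each time the loop advances. The same rule can be sketched as a single pass with a running offset. This is my own reformulation rather than part of the answer: the function name u_columns_single_pass is hypothetical, B_ID and Patient_count are omitted, and the slot numbers come out as integers instead of floats.

```python
import pandas as pd

def u_columns_single_pass(cum, threshold=0.6):
    """Hypothetical helper: same rule as the answer, one pass per group."""
    offset = 0.0   # total amount subtracted so far
    slot = 1       # current updated slot number
    slots, values = [], []
    for value in cum:
        current = value - offset
        slots.append(slot)
        values.append(current)
        if current > threshold:
            offset += 1.0   # later rows lose 1; the slot number is kept
        else:
            slot += 1       # later rows move on to the next slot
    return slots, values

df = pd.DataFrame({
    "No_Show": [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    "Session": ["S1"] * 9 + ["S2"] * 8,
})
df["Cumulative_No_show"] = df.groupby("Session")["No_Show"].cumsum()

# Build the two columns group by group, keeping each group's original index.
parts = []
for _, grp in df.groupby("Session"):
    slots, values = u_columns_single_pass(grp["Cumulative_No_show"])
    parts.append(pd.DataFrame({"U_slot_num": slots, "U_No_show": values},
                              index=grp.index))

df = df.join(pd.concat(parts))
print(df.round(1))
```

On the sample data this should give the same U_slot_num and U_No_show values as the output printed above, up to floating-point rounding.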
