Divide a column based on groupby or looping conditions in pandas


Problem description

I have a data frame as shown below:

   B_ID   No_Show   Session  slot_num   Patient_count
    1     0.2       S1        1          1
    2     0.3       S1        2          1
    3     0.8       S1        3          1
    4     0.3       S1        3          2
    5     0.6       S1        4          1
    6     0.8       S1        5          1
    7     0.9       S1        5          2
    8     0.4       S1        5          3
    9     0.6       S1        5          4
    12    0.9       S2        1          1
    13    0.5       S2        1          2
    14    0.3       S2        2          1
    15    0.7       S2        3          1
    20    0.7       S2        4          1
    16    0.6       S2        5          1
    17    0.8       S2        5          2
    19    0.3       S2        5          3

where

No_Show = probability of a no-show

Assume

threshold probability = 0.2
duration of each slot = 30 (minutes)

From the above, I would like to calculate the data frame below.

Step 1

Sort the dataframe by Session, slot_num and Patient_count.

df = df.sort_values(['Session', 'slot_num', 'Patient_count'])

Step 2

Calculate the cut-off using the conditions below.

If Patient_count = 1, divide No_Show by the threshold probability.

Example for B_ID = 3, Patient_count = 1, cut_off = 0.8/0.2 = 4

Else if Patient_count = 2, multiply the previous 1 No_Show value by the current No_Show and divide by the threshold.

Example for B_ID = 4, Patient_count = 2, cut_off = (0.3*0.8)/0.2 = 1.2

Else if Patient_count = 3, multiply the previous 2 No_Show values by the current No_Show and divide by the threshold.

Example for B_ID = 8, Patient_count = 3, cut_off = (0.4*0.9*0.8)/0.2 = 1.44

And so on.

Expected output:
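The rule above amounts to a running product of No_Show within each (Session, slot_num) group, divided by the threshold. A minimal loop-based sketch of that reading follows; the helper name `cut_off_loop` and the small `sample` frame (a subset of the S1 rows from the question) are illustrative assumptions, not part of the original post.

```python
import pandas as pd

THRESHOLD = 0.2  # threshold probability stated in the question


def cut_off_loop(df: pd.DataFrame, threshold: float = THRESHOLD) -> pd.Series:
    """Hypothetical helper: running product of No_Show within each
    (Session, slot_num) group, divided by the threshold."""
    # Process rows in ascending order so Patient_count 1 comes first in each slot.
    ordered = df.sort_values(["Session", "slot_num", "Patient_count"])
    values = {}
    previous_key = None
    running = 1.0
    for idx, row in ordered.iterrows():
        key = (row["Session"], row["slot_num"])
        if key != previous_key:
            # A new slot starts: reset the running product.
            running = 1.0
            previous_key = key
        running *= row["No_Show"]
        values[idx] = running / threshold
    # Return the result aligned with the caller's original row order.
    return pd.Series(values, name="Cut_off").reindex(df.index)


# Illustrative subset of the question's data (slots 3 and 5 of session S1).
sample = pd.DataFrame({
    "B_ID": [3, 4, 6, 7, 8, 9],
    "No_Show": [0.8, 0.3, 0.8, 0.9, 0.4, 0.6],
    "Session": ["S1"] * 6,
    "slot_num": [3, 3, 5, 5, 5, 5],
    "Patient_count": [1, 2, 1, 2, 3, 4],
})
sample["Cut_off"] = cut_off_loop(sample)
print(sample)
```

The loop is mainly useful for checking the logic by hand; on larger frames a vectorised groupby is usually preferable.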

      B_ID   No_Show   Session  slot_num   Patient_count  Cut_off
        1     0.2       S1        1          1             1
        2     0.3       S1        2          1             1.5
        3     0.8       S1        3          1             4
        4     0.3       S1        3          2             1.2
        5     0.6       S1        4          1             3
        6     0.8       S1        5          1             4
        7     0.9       S1        5          2             3.6
        8     0.4       S1        5          3             1.44
        9     0.6       S1        5          4             0.864
        12    0.9       S2        1          1             4.5
        13    0.5       S2        1          2             2.25
        14    0.3       S2        2          1             1.5
        15    0.7       S2        3          1             3.5
        20    0.7       S2        4          1             3.5
        16    0.6       S2        5          1             3
        17    0.8       S2        5          2             2.4
        19    0.3       S2        5          3             0.72

Recommended answer

Use GroupBy.cumprod and divide by the probability with Series.div:

probability = 0.2
# Running product of No_Show within each (Session, slot_num) group, divided by the threshold
df['new'] = df.groupby(['Session','slot_num'])['No_Show'].cumprod().div(probability)
print(df)
    B_ID  No_Show Session  slot_num  Patient_count    new
0      1      0.2      S1         1              1  1.000
1      2      0.3      S1         2              1  1.500
2      3      0.8      S1         3              1  4.000
3      4      0.3      S1         3              2  1.200
4      5      0.6      S1         4              1  3.000
5      6      0.8      S1         5              1  4.000
6      7      0.9      S1         5              2  3.600
7      8      0.4      S1         5              3  1.440
8      9      0.6      S1         5              4  0.864
9     12      0.9      S2         1              1  4.500
10    13      0.5      S2         1              2  2.250
11    14      0.3      S2         2              1  1.500
12    15      0.7      S2         3              1  3.500
13    20      0.7      S2         4              1  3.500
14    16      0.6      S2         5              1  3.000
15    17      0.8      S2         5              2  2.400
16    19      0.3      S2         5              3  0.720
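To reproduce the answer end to end, the following sketch rebuilds the question's data frame by hand, applies the same cumprod-and-divide step, and compares the result with the expected Cut_off values from the question. Naming the output column `Cut_off` (the answer uses `new`) and the `np.allclose` check are choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "B_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    "No_Show": [0.2, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    "Session": ["S1"] * 9 + ["S2"] * 8,
    "slot_num": [1, 2, 3, 3, 4, 5, 5, 5, 5, 1, 1, 2, 3, 4, 5, 5, 5],
    "Patient_count": [1, 1, 1, 2, 1, 1, 2, 3, 4, 1, 2, 1, 1, 1, 1, 2, 3],
})

probability = 0.2

# cumprod follows row order, so sort ascending first to make sure
# Patient_count 1 precedes 2, 3, ... inside every slot.
df = df.sort_values(["Session", "slot_num", "Patient_count"])
df["Cut_off"] = (
    df.groupby(["Session", "slot_num"])["No_Show"].cumprod().div(probability)
)

# Expected values as listed in the question.
expected = [1, 1.5, 4, 1.2, 3, 4, 3.6, 1.44, 0.864,
            4.5, 2.25, 1.5, 3.5, 3.5, 3, 2.4, 0.72]
matches = np.allclose(df["Cut_off"], expected)
print(df)
print("matches expected output:", matches)
```

Because the cumulative product depends on row order, sorting in descending order before this step would multiply the values in the opposite sequence and give different cut-offs for slots with more than one patient.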
