Pandas/Python复杂的条件和 [英] Pandas/Python complex, conditional sum

查看:88
本文介绍了Pandas/Python复杂的条件和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

df_have

CONTRACT ID  AMT     REL_NUM HDR_NUM
1         1   0.00    0      1    
1         2   33.85   1      2    
1         3   0.72    2      2    
1         4   0.87    1      1    
1         5   1.67    1      2  

df_want

CONTRACT ID  AMT     REL_NUM HDR_NUM CALCULATION
1         1   0.00    0      1        (0.00+33.85+0.87+1.67)
1         2   33.85   1      2        (33.85+0.72)
1         3   0.72    2      2        (33.85+0.72)
1         4   0.87    1      1        (0.00+33.85+0.87+1.67)
1         5   1.67    1      2        (33.85+0.72)

df_getting

df_getting

CONTRACT ID  AMT     REL_NUM HDR_NUM CALCULATION
1         1   0.00    0      1        21.75
1         2   33.85   1      2        2.00
1         3   0.72    2      2        19.75
1         4   0.87    1      1        33.85
1         5   1.67    1      2        0.00

尝试创建新列"CALCULATION",但逻辑有些棘手.计算应为AMT字段的总和,具体取决于CONTRACT,ID,REL_NUM和HDR_NUM字段.

Trying to create a new column 'CALCULATION' but the logic is a bit tricky. The calculation should be the SUM of the AMT field dependent on the CONTRACT, ID, REL_NUM and HDR_NUM fields.

第1步-检查HDR_NUM字段并获取ID = HDR_NUM并且CONTRACT字段相同的相应AMT值

Step 1- Check the HDR_NUM field and get the corresponding AMT value where ID=HDR_NUM AND the CONTRACT field is the same

第2步-在所有AMT字段中添加相同合同的REL_NUM = HDR_NUM

Step 2- Add in all AMT fields where the REL_NUM= HDR_NUM for the same CONTRACT

对于第一行,这将对ID = 2,3的AMT字段求和,对于CONTRACT = 1的AMT字段求和. 对于第6行,这将对ID = 2,4(对于CONTRACT = 2)的AMT字段求和

For the first row, this would be summing the AMT fields for ID= 2,3 and 4 for CONTRACT =1. For row 6, this would be summing the AMT fields for ID=2,4 for CONTRACT=2

一个警告是不应进行重复计数(即对于第6行,将ID = 2,4的AMT字段总计为CONTRACT = 2的,请勿重复计算ID = 2)

One caveat is that double counting should not occur (i.e. for row 6, sum the AMT fields for ID=2,4 for CONTRACT=2, do not double count ID=2)

推荐答案

IIUC,

def F(s):
    rule1   = s[['ID', 'AMT']].set_index('ID').to_dict()['AMT']
    rule2   = s[['REL_NUM', 'AMT']].groupby('REL_NUM').sum().to_dict()['AMT']
    s1 = s['HDR_NUM'].astype(int).map(rule1).fillna(0)
    s2 = s['HDR_NUM'].astype(int).map(rule2).fillna(0)
    return s1 + s2

df['CALCULATION'] = df.groupby('CONTRACT').apply(F).values.ravel()


    CONTRACT    ID  AMT     REL_NUM HDR_NUM CALCULATION
0   1           1   0.00    0       1       36.39
1   1           2   33.85   1       2       34.57
2   1           3   0.72    2       2       34.57
3   1           4   0.87    1       1       36.39
4   1           5   1.67    1       2       34.57

这篇关于Pandas/Python复杂的条件和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆