Pandas: How to sum columns based on conditional of other column values?


Question

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

     dog      A         B           C
0     dog1    0.787575  0.159330    0.053095
1     dog10   0.770698  0.169487    0.059815
2     dog11   0.792689  0.152043    0.055268
3     dog12   0.785066  0.160361    0.054573
4     dog13   0.795455  0.150464    0.054081
5     dog14   0.794873  0.150700    0.054426
..    ....
8     dog19   0.811585  0.140207    0.048208
9     dog2    0.797202  0.152033    0.050765
10    dog20   0.801607  0.145137    0.053256
11    dog21   0.792689  0.152043    0.055268
    ....

I create a new column by summing columns "A", "B", "C" as follows:

df['total_ABC'] = df[["A", "B", "C"]].sum(axis=1)

Now I would like to do this based on a condition: if "A" < 0.78, the new column df['smallA_sum'] should hold df[["A", "B", "C"]].sum(axis=1) for that row. Otherwise, the value should be zero.

How does one create conditional statements like this?

My idea was to use

df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))

However, this doesn't work, and I'm not able to specify the axis.

How do you create a column based on the values of other columns?

You could also do something similar per dog: for rows where df['dog'] == 'dog2', create a column dog2_sum, i.e.

 df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))

But my approach is incorrect.


Recommended answer

The following should work. Here we select only the rows of the df where the condition is met and sum those. Assigning the result back leaves NaN in the rows where the condition isn't met, so we call fillna on the new column:

In [67]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df

Out[67]:
          A         B         C
0  0.197334  0.707852 -0.443475
1 -1.063765 -0.914877  1.585882
2  0.899477  1.064308  1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858  0.777523 -0.453747

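In [73]:
# Assignment aligns on the index, so rows where A <= 0 get NaN
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
# Assign the result back rather than using inplace=True on a column selection
df['total'] = df['total'].fillna(0)
df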
df

Out[73]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
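
The NaN values come from index alignment: the filtered sum only has entries for the rows that passed the condition, and assigning it back to the frame leaves the other rows empty. A minimal sketch of that step, using a small made-up frame (the values are illustrative, not the data above):

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df = pd.DataFrame({"A": [0.5, -1.0, 2.0], "B": [1.0, 1.0, 1.0]})

# Sum only the rows where A > 0; the result keeps index labels 0 and 2
partial = df.loc[df["A"] > 0, ["A", "B"]].sum(axis=1)

# Assignment aligns on the index, so row 1 has no match and becomes NaN
df["total"] = partial
n_missing = int(df["total"].isna().sum())

# Replace the gaps with zero
df["total"] = df["total"].fillna(0)
print(df)
```

Because the alignment happens by index label, this works regardless of how many rows are filtered out.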

Another approach is to call where on the sum result. Its second argument (other) is the value to use where the condition isn't met:

In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df

Out[75]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
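
Applied to the question's own columns, the where pattern might look like the sketch below. It uses three rows copied from the question's printout, since the CSV itself is not available. It also shows a row-wise apply with axis=1 for comparison, on the assumption that the original attempt failed because it lacked both axis=1 and an else branch.

```python
import pandas as pd

# Three rows copied from the question's printout (the full CSV is not available)
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog2"],
    "A": [0.787575, 0.770698, 0.797202],
    "B": [0.159330, 0.169487, 0.152033],
    "C": [0.053095, 0.059815, 0.050765],
})

row_sum = df[["A", "B", "C"]].sum(axis=1)

# Keep the sum where A < 0.78, otherwise 0
df["smallA_sum"] = row_sum.where(df["A"] < 0.78, 0)

# Same idea with a condition on the "dog" column
df["dog2_sum"] = row_sum.where(df["dog"] == "dog2", 0)

# Row-wise alternative: apply needs axis=1, and the conditional
# expression needs an else branch to be valid Python
df["smallA_sum_apply"] = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["A"] < 0.78 else 0.0,
    axis=1,
)
print(df)
```

The where version is usually preferable on larger frames because it is vectorized, while apply with axis=1 calls the lambda once per row.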
