How to get multiple conditional operations after a Pandas groupby?

Problem description

Consider the following example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B' : [12,10,-2,-4,-2,5,8,7],
                   'C' : [-5,5,-20,0,1,5,4,-4]})

df
Out[12]: 
     A   B   C
0  foo  12  -5
1  bar  10   5
2  foo  -2 -20
3  bar  -4   0
4  foo  -2   1
5  bar   5   5
6  foo   8   4
7  foo   7  -4

Here I need to compute, for each group in A, the sum of elements in B conditional on C being non-negative (i.e. being >=0, a condition based on another column). And vice-versa for C.

However, my code below fails.

df.groupby('A').agg({'B': lambda x: x[x.C>0].sum(),
                     'C': lambda x: x[x.B>0].sum()})      

AttributeError: 'Series' object has no attribute 'B'

So it seems apply would be preferred (because apply sees the whole DataFrame, I think), but unfortunately I cannot use a dictionary with apply. So I am stuck. Any ideas?
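The guess about what each method sees can be checked directly. The sketch below is an illustration, not part of the original question: the helper `record_type` is a hypothetical name, and it only records the type of the object pandas passes in. With a dictionary, agg should hand the function one column of one group (a Series), while apply should hand it the group's sub-DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': [12, 10, -2, -4, -2, 5, 8, 7],
                   'C': [-5, 5, -20, 0, 1, 5, 4, -4]})

seen = []

def record_type(obj):
    # Hypothetical helper: remember the class name of what pandas passes in.
    seen.append(type(obj).__name__)
    return 0  # dummy scalar so the aggregation has something to return

df.groupby('A').agg({'B': record_type})
agg_seen = set(seen)

seen.clear()
df.groupby('A').apply(record_type)
apply_seen = set(seen)

print(agg_seen)
print(apply_seen)
```

This is why `x.C` fails inside agg: a single-column Series has no attribute for the other column.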

One not-so-pretty, not-so-efficient solution would be to create these conditional variables before running the groupby, but I am sure this solution does not use the full potential of Pandas.
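For reference, the precompute-then-group approach mentioned above might look like the following minimal sketch. The intermediate name `masked` is an assumption made for illustration; `Series.where` keeps values where the condition holds and puts NaN elsewhere.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': [12, 10, -2, -4, -2, 5, 8, 7],
                   'C': [-5, 5, -20, 0, 1, 5, 4, -4]})

# Keep B only where C >= 0, and C only where B >= 0; other rows become NaN.
masked = pd.DataFrame({'A': df['A'],
                       'B': df['B'].where(df['C'] >= 0),
                       'C': df['C'].where(df['B'] >= 0)})

# sum() skips NaN by default, so the masked rows do not contribute.
result = masked.groupby('A')[['B', 'C']].sum()
print(result)
```

Because the masked columns contain NaN, the sums come back as floats rather than integers.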

So, for instance, the expected output for the group bar and column B would be

+10 (indeed C equals 5 and is >=0)
-4 (indeed C equals 0 and is >=0)
+5 = 11

Another example: group foo and column B

NaN (indeed C equals -5 so I don't want to consider the 12 value in B)
+ NaN   (indeed C= -20)
-2    (indeed C=1 so it's positive)
+ 8
+NaN = 6

Note that I use NaNs instead of zeros because a function other than a sum (the median, for example) would give wrong results if we were to put zeros.
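A small sketch of that point, using the foo group from the example. The names `b_nan` and `b_zero` are assumptions made for illustration. The sum is the same either way, but the median changes once the excluded rows are replaced by zeros.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': [12, 10, -2, -4, -2, 5, 8, 7],
                   'C': [-5, 5, -20, 0, 1, 5, 4, -4]})

foo = df[df['A'] == 'foo']

b_nan = foo['B'].where(foo['C'] >= 0)      # excluded rows become NaN
b_zero = foo['B'].where(foo['C'] >= 0, 0)  # excluded rows become 0

print(b_nan.sum(), b_zero.sum())        # both sums agree
print(b_nan.median(), b_zero.median())  # the medians do not
```

With NaN the median is taken over the two remaining values (-2 and 8); with zeros it is taken over all five values, three of which are the artificial zeros.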

In other words, this is a simple conditional sum where the condition is based on another column. Thanks!

Solution

I think you can use:

# x is one group's column as a Series; x.index finds the matching rows of the other column in df
print(df.groupby('A').agg({'B': lambda x: x[df.loc[x.index, 'C'] >= 0].sum(),
                           'C': lambda x: x[df.loc[x.index, 'B'] >= 0].sum()}))
      B   C
A          
bar  11  10
foo   6  -5

Named functions that do the same thing as the lambdas may be easier to follow:

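def f(x):
    # x holds one group's B values; keep those whose matching C is >= 0
    mask = df.loc[x.index, 'C'] >= 0
    return x[mask].sum()

def f1(x):
    # x holds one group's C values; keep those whose matching B is >= 0
    mask = df.loc[x.index, 'B'] >= 0
    return x[mask].sum()


print(df.groupby('A').agg({'B': f, 'C': f1}))
      B   C
A          
bar  11  10
foo   6  -5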

EDIT:

root's solution is very nice, but it can be better:

def my_func(group):
    # group is the sub-DataFrame for one value of A
    b = group[group.C >= 0].B.sum()
    c = group[group.B >= 0].C.sum()
    return pd.Series({'B': b, 'C': c})

result = df.groupby('A').apply(my_func)
print(result)
      B   C
A          
bar  11  10
foo   6  -5
