Confused about meaning of groupby operation with multiple columns with Pandas in Python

Question

Here are my dataframe and function:

import pandas as pd

df = pd.DataFrame({
    'G': 'x x y y'.split(),
    'C': [1, 2, 1, 2],
    'D': [2, 2, 1, 1]})

def CD(df):
    return df['C'] * df['D']

Here is what my dataframe looks like:

   G  C  D
0  x  1  2
1  x  2  2
2  y  1  1
3  y  2  1

When I run

df.groupby('G').apply(CD)

I expected that it would sum over x and y to get

   G  C  D
0  x  3  4
1  y  3  2

Then, I expected it to multiply C and D to get

x   12
y   6

However, I got

G   
x  0    2
   1    4
y  2    1
   3    2

That new column of [2, 4, 1, 2] doesn't look any different from what I would have obtained if I had simply run

df['C'] * df['D']

Clearly, I am confused about what groupby does. What is "df.groupby('G').apply(CD)" doing in my example?

Recommended answer

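groupby does not sum anything on its own. It only splits the rows into groups, and apply(CD) then calls CD on each group separately, so you get the row-wise products back under a group index. Sum within each group first, then pass the result to your function:

>>> CD(df.groupby('G')[['C', 'D']].sum())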

G
x    12
y     6
dtype: int64
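
To make the difference concrete, here is a minimal sketch that reuses the dataframe and CD function from the question. The names per_group, sums, and result are only illustrative, and the loop is an approximation of what apply(CD) does per group, not pandas' actual implementation.

```python
import pandas as pd

df = pd.DataFrame({
    'G': 'x x y y'.split(),
    'C': [1, 2, 1, 2],
    'D': [2, 2, 1, 1]})

def CD(frame):
    # Element-wise product of columns C and D.
    return frame['C'] * frame['D']

# Roughly what apply(CD) does: call CD once on each group's rows.
# No summing happens here, so each group keeps one value per row.
per_group = {name: CD(group) for name, group in df.groupby('G')}

# Aggregate first (one row per group), then multiply.
sums = df.groupby('G')[['C', 'D']].sum()
result = CD(sums)
```

Here per_group['x'] holds the products for rows 0 and 1 and per_group['y'] those for rows 2 and 3, which is why the original output matched df['C'] * df['D']. Only result has one value per group.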
