Sum grouped Pandas dataframe by single column

Question

I have a pandas DataFrame:

import pandas as pd

test = pd.DataFrame(
    [['1', 'S1', 'S1_meta', 1],
     ['1', 'S1', 'S1_meta', 1],
     ['2', 'S2', 'S2_meta', 1]],
    columns=['GroupID', 'Sample', 'SampleMeta', 'Value'])

I want to (1) group by two columns ('GroupID' and 'Sample'), (2) sum 'Value' per group, and (3) retain only the unique value of 'SampleMeta' per group. The desired result, with 'GroupID' and 'Sample' as the index, is:

                SampleMeta  Value
GroupID Sample                       
1       S1      S1_meta      2
2       S2      S2_meta      1 

df.groupby() followed by .sum() gets close, but .sum() also concatenates the strings in the 'SampleMeta' column within each group. As a result, the 'S1_meta' value is duplicated:

g = test.groupby(['GroupID', 'Sample'])
print(g.sum())

                SampleMeta      Value
GroupID Sample                       
1       S1      S1_metaS1_meta  2
2       S2      S2_meta         1 

Is there a way to achieve the desired result using groupby() and its associated methods? Merging the summed 'Value' per group with a separate 'SampleMeta' DataFrame works, but there must be a more elegant solution.

Recommended answer

Well, you can include SampleMeta as part of the groupby:

print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum())

                           Value
GroupID Sample SampleMeta       
1       S1     S1_meta         2
2       S2     S2_meta         1

If you don't want SampleMeta to be part of the index when you are done, you can move it back into a column with reset_index:

print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum().reset_index(level=2))

               SampleMeta  Value
GroupID Sample                  
1       S1        S1_meta      2
2       S2        S2_meta      1

This only works correctly if SampleMeta does not vary within a ['GroupID','Sample'] group. If it does vary, you probably want to exclude SampleMeta from the groupby/sum entirely:

print(test.groupby(['GroupID', 'Sample'])['Value'].sum())

GroupID  Sample
1        S1        2
2        S2        1
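
Since the approaches above depend on whether SampleMeta varies within a group, it may help to check that first. The following is a minimal sketch, not part of the original answer: it counts the distinct SampleMeta values per group with nunique(), then uses agg() with a per-column mapping to keep the first SampleMeta and sum Value. The names meta_counts, has_variation, and result are only illustrative, and taking 'first' assumes any one SampleMeta per group is acceptable.

```python
import pandas as pd

# Same data as in the question, built in one call so 'Value' is an integer column.
test = pd.DataFrame({
    'GroupID': ['1', '1', '2'],
    'Sample': ['S1', 'S1', 'S2'],
    'SampleMeta': ['S1_meta', 'S1_meta', 'S2_meta'],
    'Value': [1, 1, 1],
})

g = test.groupby(['GroupID', 'Sample'])

# Number of distinct SampleMeta values per group; more than 1 means variation.
meta_counts = g['SampleMeta'].nunique()
has_variation = bool((meta_counts > 1).any())

# Assumption: keeping the first SampleMeta of each group is good enough.
# Value is summed; GroupID and Sample remain the index.
result = g.agg({'SampleMeta': 'first', 'Value': 'sum'})

print(has_variation)
print(result)
```

With this form SampleMeta never enters the index, so no reset_index step is needed. If has_variation turns out to be True, 'first' silently discards the other values, so summing only 'Value' may be the safer choice.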
