Sum grouped Pandas dataframe by single column
Question
I have a pandas DataFrame:
import pandas as pd

test = pd.DataFrame(columns=['GroupID', 'Sample', 'SampleMeta', 'Value'])
test.loc[0, :] = '1', 'S1', 'S1_meta', 1
test.loc[1, :] = '1', 'S1', 'S1_meta', 1
test.loc[2, :] = '2', 'S2', 'S2_meta', 1
I want to (1) group by two columns ('GroupID' and 'Sample'), (2) sum 'Value' per group, and (3) retain only the unique values of 'SampleMeta' per group. The desired result, with 'GroupID' and 'Sample' as the index, is:
               SampleMeta  Value
GroupID Sample
1       S1        S1_meta      2
2       S2        S2_meta      1
df.groupby() followed by .sum() gets close, but .sum() also concatenates the strings in the 'SampleMeta' column within each group. As a result, the 'S1_meta' value is duplicated:
g = test.groupby(['GroupID', 'Sample'])
print(g.sum())
                    SampleMeta  Value
GroupID Sample
1       S1      S1_metaS1_meta      2
2       S2             S2_meta      1
Is there a way to achieve the desired result using groupby() and associated methods? Merging the summed 'Value' per group with a separate 'SampleMeta' DataFrame works, but there must be a more elegant solution.
Recommended answer
Well, you can include SampleMeta as part of the groupby:
print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum())
                           Value
GroupID Sample SampleMeta
1       S1     S1_meta         2
2       S2     S2_meta         1
If you don't want SampleMeta as part of the index when done, you could modify it as follows:
print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum().reset_index(level=2))
               SampleMeta  Value
GroupID Sample
1       S1        S1_meta      2
2       S2        S2_meta      1
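As an alternative sketch (not part of the original answer), agg() accepts a dict mapping each column to its own aggregation, so 'Value' can be summed while 'SampleMeta' keeps the first value seen in each group. The frame is rebuilt here from a dict so that 'Value' is an integer column, and taking 'first' assumes SampleMeta is constant within each group:

```python
import pandas as pd

# Same data as the question, built from a dict so 'Value' is numeric.
test = pd.DataFrame({
    'GroupID': ['1', '1', '2'],
    'Sample': ['S1', 'S1', 'S2'],
    'SampleMeta': ['S1_meta', 'S1_meta', 'S2_meta'],
    'Value': [1, 1, 1],
})

# Per-column aggregation: keep the first SampleMeta, sum Value.
result = test.groupby(['GroupID', 'Sample']).agg(
    {'SampleMeta': 'first', 'Value': 'sum'}
)
print(result)
```

Here 'GroupID' and 'Sample' end up as the index and 'SampleMeta' stays an ordinary column, so no reset_index() call is needed.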
This only works correctly if SampleMeta does not vary within each ['GroupID','Sample'] group. Of course, if it does vary within a group, then you probably want to exclude SampleMeta from the groupby/sum entirely:
print(test.groupby(['GroupID', 'Sample'])['Value'].sum())
GroupID  Sample
1        S1        2
2        S2        1
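Whether SampleMeta varies within a group can be checked before choosing between the two approaches. A minimal sketch, assuming an extra row (hypothetical, not in the question's data) that introduces variation in the second group:

```python
import pandas as pd

test = pd.DataFrame({
    'GroupID': ['1', '1', '2', '2'],
    'Sample': ['S1', 'S1', 'S2', 'S2'],
    # The last SampleMeta value is hypothetical, added to create variation.
    'SampleMeta': ['S1_meta', 'S1_meta', 'S2_meta', 'S2_other'],
    'Value': [1, 1, 1, 1],
})

# Number of distinct SampleMeta values per (GroupID, Sample) group.
meta_counts = test.groupby(['GroupID', 'Sample'])['SampleMeta'].nunique()

# Groups where SampleMeta is not constant; grouping on SampleMeta
# as well would split each of these into several rows.
varying = meta_counts[meta_counts > 1]
print(varying)
```

If varying is empty, grouping on all three columns gives one row per ['GroupID','Sample'] pair; otherwise summing 'Value' alone is probably the safer choice.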