Summarizing rows in a Pandas DataFrame


Question

I have the following rows:

    ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceColumnID  SourceRowID  SourceVariationID
22        -2      -2   319276487             28  3049400354      3049400356   3049400365         3049400365
23        -2      -2   319276487             31  3049400354      3049400356   3049400365         3049400365
24        -2      -2   319276487             37  3049400354      3049400356   3049400365         3049400365
25        -2      -2   319276487             28  3049400353      3049400357   3049400365         3049400365
26        -2      -2   319276487             45  3049400353      3049400357   3049400365         3049400365
27        -2      -2   319276487             46  3049400353      3049400357   3049400365         3049400365
28        -2      -2   319276487             26  3049400353      3049400358   3049400365         3049400365
29        -2      -2   319276487             33  3049400353      3049400358   3049400365         3049400365
30        -2      -2   319276487             39  3049400353      3049400358   3049400365         3049400365
31        -2      -2   319276487             26  3049400353      3049400359   3049400365         3049400365

I want to squash this DataFrame so that it sums ResponseCount for each combination of RowID and SourceVariationID.

For example:

    ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceColumnID  SourceRowID  SourceVariationID
22        -2      -2   319276487             96  3049400354      3049400356   3049400365         3049400365
23        -2      -2   319276487            243  3049400353      3049400356   3049400365

This is what I've come up with so far. Note that sum() also adds up QuestionID and SourceColumnID, which is not what I want:

(Pdb) new_df = df.groupby(['RowID', 'SourceVariationID', 'SourceRowID']).sum()                                                                          
(Pdb) new_df['ColumnID'] = -2
(Pdb) new_df['MenuID'] = -2
(Pdb) pp new_df
                                          ColumnID  MenuID  QuestionID  ResponseCount  SourceColumnID
RowID      SourceVariationID SourceRowID                                                             
3031434948 3031434943        3031434943         -2      -2  3805083612            141     36377219262
           3031434945        3031434945         -2      -2  4439264214            237     42440089136

[2 rows x 5 columns]

Recommended answer

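You can group by the two key columns and apply a function that keeps one representative row per group and replaces its ResponseCount with the group total: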

print(df)
   ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceVariationID
0        -2      -2   319276487             28  3049400354         3049400365
1        -2      -2   319276487             31  3049400354         3049400365
2        -2      -2   319276487             37  3049400354         3049400365
3        -2      -2   319276487             28  3049400353         3049400365
4        -2      -2   319276487             45  3049400353         3049400365
5        -2      -2   319276487             46  3049400353         3049400365
6        -2      -2   319276487             26  3049400353         3049400365
7        -2      -2   319276487             33  3049400353         3049400365
8        -2      -2   319276487             39  3049400353         3049400365
9        -2      -2   319276487             26  3049400353         3049400365


def squash(group):
    # Use the first row of the group as a template for the constant columns.
    # iloc[0] (rather than iloc[1]) also works when a group has a single row.
    # errors='ignore' covers pandas versions that omit the key columns here.
    x = group.iloc[0, :].drop(['RowID', 'SourceVariationID'], errors='ignore')
    # Replace the template's count with the total for the whole group.
    x['ResponseCount'] = group['ResponseCount'].sum()
    return x

print(df.groupby(['RowID', 'SourceVariationID']).apply(squash))

                             ColumnID  MenuID  QuestionID  ResponseCount
RowID      SourceVariationID                                             
3049400353 3049400365               -2      -2   319276487            243
3049400354 3049400365               -2      -2   319276487             96
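
As an alternative sketch (not part of the original answer), the same result can probably be obtained without a custom function by using named aggregation, which is available in pandas 0.25 and later. The example below rebuilds the sample data from the answer and assumes that ColumnID, MenuID and QuestionID are constant within each group, so taking the first value is safe; if that assumption does not hold for your data, pick a different aggregation for those columns.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

# Sum ResponseCount per (RowID, SourceVariationID) group and keep the
# first value of the columns assumed to be constant within a group.
summary = (
    df.groupby(["RowID", "SourceVariationID"], as_index=False)
      .agg(
          ColumnID=("ColumnID", "first"),
          MenuID=("MenuID", "first"),
          QuestionID=("QuestionID", "first"),
          ResponseCount=("ResponseCount", "sum"),
      )
)

print(summary)
```

Passing as_index=False keeps RowID and SourceVariationID as ordinary columns instead of a MultiIndex, which is closer to the flat layout requested in the question.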
