Summarizing rows in a Pandas DataFrame
Question
I have the following rows:
    ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceColumnID  SourceRowID  SourceVariationID
22        -2      -2   319276487             28  3049400354      3049400356   3049400365         3049400365
23        -2      -2   319276487             31  3049400354      3049400356   3049400365         3049400365
24        -2      -2   319276487             37  3049400354      3049400356   3049400365         3049400365
25        -2      -2   319276487             28  3049400353      3049400357   3049400365         3049400365
26        -2      -2   319276487             45  3049400353      3049400357   3049400365         3049400365
27        -2      -2   319276487             46  3049400353      3049400357   3049400365         3049400365
28        -2      -2   319276487             26  3049400353      3049400358   3049400365         3049400365
29        -2      -2   319276487             33  3049400353      3049400358   3049400365         3049400365
30        -2      -2   319276487             39  3049400353      3049400358   3049400365         3049400365
31        -2      -2   319276487             26  3049400353      3049400359   3049400365         3049400365
I want to squash this DataFrame so that it sums ResponseCount by RowID and SourceVariationID.
For example:
ColumnID MenuID QuestionID ResponseCount RowID SourceColumnID SourceRowID SourceVariationID
22 -2 -2 319276487 96 3049400354 3049400356 3049400365 3049400365
23 -2 -2 319276487 243 3049400353 3049400356 3049400365
This is what I've come up with so far:
(Pdb) new_df = df.groupby(['RowID', 'SourceVariationID', 'SourceRowID']).sum()
(Pdb) new_df['ColumnID'] = -2
(Pdb) new_df['MenuID'] = -2
(Pdb) pp new_df
                                          ColumnID  MenuID  QuestionID  ResponseCount  SourceColumnID
RowID      SourceVariationID SourceRowID
3031434948 3031434943        3031434943         -2      -2  3805083612            141     36377219262
           3031434945        3031434945         -2      -2  4439264214            237     42440089136
[2 rows x 5 columns]
Recommended answer
You can do the following:
print(df)
   ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceVariationID
0        -2      -2   319276487             28  3049400354         3049400365
1        -2      -2   319276487             31  3049400354         3049400365
2        -2      -2   319276487             37  3049400354         3049400365
3        -2      -2   319276487             28  3049400353         3049400365
4        -2      -2   319276487             45  3049400353         3049400365
5        -2      -2   319276487             46  3049400353         3049400365
6        -2      -2   319276487             26  3049400353         3049400365
7        -2      -2   319276487             33  3049400353         3049400365
8        -2      -2   319276487             39  3049400353         3049400365
9        -2      -2   319276487             26  3049400353         3049400365
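def squash(group):
    # Use the group's first row as a template for the non-summed columns;
    # errors='ignore' covers pandas versions that omit the grouping columns.
    x = group.iloc[0].drop(['RowID', 'SourceVariationID'], errors='ignore')
    # Overwrite the template's count with the total for the whole group.
    x['ResponseCount'] = group['ResponseCount'].sum()
    return x
print(df.groupby(['RowID', 'SourceVariationID']).apply(squash))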
                              ColumnID  MenuID  QuestionID  ResponseCount
RowID      SourceVariationID
3049400353 3049400365               -2      -2   319276487            243
3049400354 3049400365               -2      -2   319276487             96
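As a possible alternative to a custom apply function, the same summary can probably be written with named aggregation, which sums ResponseCount and keeps the first value of each remaining column. The sketch below is not part of the original answer. It assumes pandas 0.25 or later, rebuilds the sample frame by hand from the rows printed above, and uses a hypothetical variable name, summary. Passing as_index=False keeps RowID and SourceVariationID as ordinary columns, which is closer to the flat layout asked for in the question.

```python
import pandas as pd

# Sample data rebuilt by hand from the rows printed in the answer (an assumption,
# not the asker's real frame).
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

# Sum ResponseCount per (RowID, SourceVariationID) group and take the first
# value of every other column; as_index=False returns a flat DataFrame.
summary = (
    df.groupby(["RowID", "SourceVariationID"], as_index=False)
      .agg(
          ColumnID=("ColumnID", "first"),
          MenuID=("MenuID", "first"),
          QuestionID=("QuestionID", "first"),
          ResponseCount=("ResponseCount", "sum"),
      )
)
print(summary)
```

Taking "first" is only safe when the non-summed columns are constant within a group, as they are in this sample; if they vary, each of them needs its own aggregation rule.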