Pandas data frame sum of column and collecting the results
Question
Given the following data frame:
import pandas as pd
p1 = {'name': 'willy', 'age': 11, 'interest': "Lego"}
p2 = {'name': 'willy', 'age': 11, 'interest': "games"}
p3 = {'name': 'zoe', 'age': 9, 'interest': "cars"}
df = pd.DataFrame([p1, p2, p3])
df
   age interest   name
0   11     Lego  willy
1   11    games  willy
2    9     cars    zoe
I want to know the sum of interests of each person and let each person only show once in the list. I do the following:
Interests = df[['age', 'name', 'interest']].groupby(['age' , 'name']).count()
Interests.reset_index(inplace=True)
Interests.sort_values('interest', ascending=False, inplace=True)  # DataFrame.sort() was removed from pandas; sort_values() replaces it
Interests
   age   name  interest
1   11  willy         2
0    9    zoe         1
This works, but I have the feeling that I'm doing it wrong. Right now I'm using the 'interest' column to display my count values, which is okay, but as I said, I expect there to be a nicer way to do this.
I saw many questions about counting/summing in Pandas, but for me the part where I leave out the 'duplicates' is key.
Recommended answer
You can use size (the length of each group) rather than count (the number of non-NaN entries in each column of the group).
In [11]: df[['age', 'name', 'interest']].groupby(['age' , 'name']).size()
Out[11]:
age  name
9    zoe      1
11   willy    2
dtype: int64
In [12]: df[['age', 'name', 'interest']].groupby(['age' , 'name']).size().reset_index(name='count')
Out[12]:
   age   name  count
0    9    zoe      1
1   11  willy      2
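The difference between size and count only shows up when a group contains missing values. The following is a minimal sketch of that, not part of the original answer: it reuses the data from the question and adds one hypothetical row for zoe whose interest is NaN (an assumption made purely for illustration).

```python
import numpy as np
import pandas as pd

# Data from the question, plus one hypothetical row with a missing interest.
rows = [
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': 'games'},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
    {'name': 'zoe', 'age': 9, 'interest': np.nan},  # assumed row, for illustration only
]
df = pd.DataFrame(rows)

grouped = df.groupby(['age', 'name'])

# size() returns the number of rows in each group, NaN rows included.
sizes = grouped.size().reset_index(name='count')

# count() returns the number of non-NaN values in the selected column.
counts = grouped['interest'].count().reset_index(name='count')

print(sizes)
print(counts)
```

With this data, size reports two rows for zoe while count reports only one, because the NaN interest is skipped. If missing interests should not be counted, count is the better fit; if every row should be counted, size is.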