Pandas data frame sum of column and collecting the results


Problem description

Given the following data frame:

import pandas as pd
p1 = {'name': 'willy', 'age': 11, 'interest': "Lego"}
p2 = {'name': 'willy', 'age': 11, 'interest': "games"}
p3 = {'name': 'zoe', 'age': 9, 'interest': "cars"}
df = pd.DataFrame([p1, p2, p3])
df

    age interest    name
0   11  Lego        willy
1   11  games       willy
2   9   cars        zoe

I want to know the number of interests each person has, with each person appearing only once in the result. I do the following:

Interests = df[['age', 'name', 'interest']].groupby(['age' , 'name']).count()
Interests.reset_index(inplace=True)
# DataFrame.sort was removed in pandas 0.20; sort_values is the replacement
Interests.sort_values('interest', ascending=False, inplace=True)
Interests

    age name    interest
1   11  willy   2
0   9   zoe     1

This works, but I have the feeling that I'm doing it wrong. Right now I'm reusing the 'interest' column to display the counts, which is okay, but I expect there to be a nicer way to do this.

I saw many questions about counting/summing in Pandas, but for me the key part is leaving out the 'duplicates'.

Recommended answer

You can use size (the length of each group) rather than count (the number of non-NaN entries in each column of the group).

In [11]: df[['age', 'name', 'interest']].groupby(['age' , 'name']).size()
Out[11]:
age  name
9    zoe      1
11   willy    2
dtype: int64

In [12]: df[['age', 'name', 'interest']].groupby(['age' , 'name']).size().reset_index(name='count')
Out[12]:
   age   name  count
0    9    zoe      1
1   11  willy      2
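
To make the difference between size and count concrete, here is a minimal sketch. It assumes a recent pandas version and adds one hypothetical row with a missing interest, which is not part of the original question; the variable names `people`, `by_size`, and `by_count` are also just for illustration.

```python
import pandas as pd

# Data from the question, plus one hypothetical row with a missing interest.
people = pd.DataFrame([
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': 'games'},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
    {'name': 'zoe', 'age': 9, 'interest': None},  # added for illustration only
])

grouped = people.groupby(['age', 'name'])

# size() counts every row in the group, including rows with missing values.
by_size = grouped.size().reset_index(name='count')

# count() counts only the non-missing values of the selected column.
by_count = grouped['interest'].count().reset_index(name='count')

# Sort descending by count, as the question does.
by_count = by_count.sort_values('count', ascending=False).reset_index(drop=True)

print(by_size)
print(by_count)
```

With the extra row, size reports two rows for zoe while count reports one interest, so which of the two fits depends on whether rows with a missing interest should be included.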
