Python: Pandas: Groupby & Pivot Tables are missing rows

Problem description

I have a dataframe composed of individuals (identified by their IDs), activities, and corresponding scores. I'm trying to get the sum of the scores when grouping by student and activity type. I can do this with either of the following:

data_detail.pivot_table(["total_scored","total_scored_omitted"], index = ["id","activity"], aggfunc="sum")

data_detail.groupby(["id","activity"]).sum()

However, when I check the results by looking at a typical student:

data_detail[data_detail["id"]== 41824840].sort_values("activity")

I see that there are some activities listed for that given student which are missing from the groupby/pivot table. How can I ensure the final groupby/pivot table is complete and isn't missing any values?

Recommended answer

The problem was that the data type of the scores wasn't consistent (they should all have been floats!).

Some of them were strings. After I converted all of the scores into floats, the missing activities showed up.
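The conversion described above might look like the following minimal sketch. The sample rows are hypothetical, and only the column names `id`, `activity`, `total_scored`, and `total_scored_omitted` come from the question. It assumes the stray strings are numeric text such as `"2.5"`; with `errors="coerce"`, anything that cannot be parsed becomes `NaN` rather than raising.

```python
import pandas as pd

# Hypothetical sample data: the score columns mix floats and strings.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 7],
    "activity": ["quiz", "quiz", "homework", "quiz"],
    "total_scored": [1.0, "2.5", "3", 4.0],
    "total_scored_omitted": [0.0, "1", 0.0, "0.5"],
})

score_cols = ["total_scored", "total_scored_omitted"]

# An object dtype is a hint that a column holds mixed Python types.
print(data_detail[score_cols].dtypes)

# Count the Python types actually present in each score column.
for col in score_cols:
    print(data_detail[col].map(type).value_counts())

# Convert every score to a float; unparseable values become NaN.
for col in score_cols:
    data_detail[col] = pd.to_numeric(data_detail[col], errors="coerce").astype(float)

# With uniform float columns, aggregate per student and activity.
totals = data_detail.groupby(["id", "activity"])[score_cols].sum()
print(totals)
```

Checking `dtypes` before aggregating is a cheap way to catch this kind of problem early, since a score column that should be numeric but reports `object` usually contains strings.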

As an added benefit, having uniform data types made the calculation much faster!
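To confirm that the grouped result is complete, one option is to compare the distinct (`id`, `activity`) pairs in the raw rows against the index of the grouped table. This is only a sketch on hypothetical data, not part of the original answer; the names `expected` and `missing` are illustrative.

```python
import pandas as pd

# Hypothetical data whose scores are already floats.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 7],
    "activity": ["quiz", "quiz", "homework", "quiz"],
    "total_scored": [1.0, 2.5, 3.0, 4.0],
})

totals = data_detail.groupby(["id", "activity"])[["total_scored"]].sum()

# Every distinct (id, activity) pair present in the raw rows...
expected = pd.MultiIndex.from_frame(
    data_detail[["id", "activity"]].drop_duplicates()
)

# ...should also appear in the grouped index; leftovers were dropped.
missing = expected.difference(totals.index)
print(missing)
```

An empty `missing` index means every student/activity combination made it into the result. Note that `groupby` excludes rows whose grouping key is `NaN` by default, so such rows would also show up as missing here.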
