Python: Pandas: Groupby & Pivot Tables are missing rows
Problem description
I have a dataframe composed of individuals (identified by their IDs), activities, and corresponding scores. I'm trying to get the sum of the scores when grouping by student and activity type. I can do this with either of the following:
data_detail.pivot_table(values=["total_scored", "total_scored_omitted"], index=["id", "activity"], aggfunc="sum")
data_detail.groupby(["id","activity"]).sum()
However, when I check the results by looking at a typical student:
data_detail[data_detail["id"] == 41824840].sort_values("activity")
I see that some activities listed for that student are missing from the groupby/pivot table. How can I ensure the final groupby/pivot table is complete and isn't missing any values?
Recommended answer
The problem was that the data type of the scores wasn't consistent (the column wasn't uniformly float!).
Some of the scores were strings. After I converted all of the scores to floats, the missing activities showed up.
As an added benefit, having uniform data types made the calculation much faster!
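The answer does not show how the conversion was done. A minimal sketch of one way to do it follows, using `pd.to_numeric`. The sample values are invented for illustration, and `score_cols` is a hypothetical helper name; only the column names and the `data_detail` variable come from the question.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
# The score columns deliberately mix strings and floats.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 7, 7],
    "activity": ["quiz", "quiz", "homework", "quiz", "homework"],
    "total_scored": [1.0, "2.5", "3", 4.0, "0.5"],
    "total_scored_omitted": ["0", 1.0, 0.0, "2", 0.0],
})

score_cols = ["total_scored", "total_scored_omitted"]  # hypothetical helper name

# Columns holding a mix of str and float values are stored with dtype "object".
print(data_detail[score_cols].dtypes)

# Convert each score column to a numeric dtype.
# errors="coerce" turns values that cannot be parsed into NaN instead of raising.
for col in score_cols:
    data_detail[col] = pd.to_numeric(data_detail[col], errors="coerce")

# With uniform numeric columns, the aggregation sums numbers rather than objects.
totals = data_detail.groupby(["id", "activity"])[score_cols].sum()
print(totals)
```

Checking `data_detail.dtypes` before aggregating is a quick way to spot this kind of problem: a score column reported as `object` instead of `float64` usually means some entries are not numbers. If silently turning unparseable values into NaN is not acceptable, `errors="raise"` (the default) surfaces them instead.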