Pandas 0.25.0: groupby on categoricals
Problem description
I'm having some difficulty using Pandas 0.25.0, which was released last month.
Consider this data frame:
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})
Say we want to group by the first two columns. The resulting data frame should contain 3 rows, since the combination (b, m) doesn't exist.
df.groupby(['A', 'B']).agg({'C': 'sum'})
In Pandas 0.24.1 and earlier, this works fine:
     C
A B
a m  1
  o  4
b o  5
However, in Pandas 0.25.0 this is broken:
       C
A B
a m  1.0
  o  4.0
b m  NaN
  o  5.0
I know I can suppress this unwanted behaviour by adding observed=True to the groupby call, but that was not necessary in the old version. I could not find anything related in the release notes.
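For reference, the two settings of observed can be compared side by side. This is a minimal sketch that reuses the data frame from the question; the variable names observed_only and all_combos are only illustrative, and observed is passed explicitly in both calls because its default has not been the same across pandas releases.

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

# observed=True keeps only the (A, B) combinations that occur in the data.
observed_only = df.groupby(['A', 'B'], observed=True).agg({'C': 'sum'})

# observed=False also emits a row for the unobserved pair (b, m).
# The value shown in that row (NaN or 0) depends on the pandas version.
all_combos = df.groupby(['A', 'B'], observed=False).agg({'C': 'sum'})

print(observed_only)
print(all_combos)
```

Spelling out observed in every groupby call on categorical columns keeps the row count the same regardless of which pandas version runs the code.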
How come? Is this a bug in pandas? Did I miss something?
Recommended answer
Thanks to ALollz's comment, I think I know what happened:
I (unknowingly) relied on a bug in 0.24, and that is why the update to 0.25 broke my code.
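To see why the 0.25 output is plausibly the intended one, it may help to look at the case where both grouping columns are categorical. The sketch below is an illustration under that assumption (column B is converted to a categorical, which the original data frame does not do) and it does not reproduce the 0.24 bug itself. With observed=False, groupby returns every combination of categories, including the unobserved pair (b, m).

```python
import pandas as pd

# Same data as in the question, except that 'B' is categorical too
# (an assumption made only for this illustration).
df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o'], dtype='category'),
    'C': pd.Series([1, 2, 3, 4]),
})

# observed=False asks for the full product of the categories of A and B.
full = df.groupby(['A', 'B'], observed=False).agg({'C': 'sum'})

# Collect the index entries as (A, B) tuples to inspect which pairs exist.
pairs = list(full.index)
print(pairs)
```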