Why doesn't pandas allow a categorical column to be used in groupby?
Question
I would like to create a custom-sorted DataFrame. To do this I used pandas.Categorical(); however, if I then use the result in a groupby, NaN values are returned.
# import the pandas module
import pandas as pd
# Create an example dataframe
raw_data = {'Date': ['2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13','2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13'],
'Portfolio': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
'Duration': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
'Yield': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],}
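df = pd.DataFrame(raw_data, columns=['Date', 'Portfolio', 'Duration', 'Yield'])
# Make 'Portfolio' categorical with a custom category order: C, B, A
df['Portfolio'] = pd.Categorical(df['Portfolio'], ['C', 'B', 'A'])
df = df.sort_values('Portfolio')
# Sum within each (Date, Portfolio) group, keeping the keys as columns
dfs = df.groupby(['Date', 'Portfolio'], as_index=False).sum()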
print(dfs)
                      Date Portfolio  Duration  Yield
Date       Portfolio
13/05/2016 C           NaN       NaN       NaN    NaN
           B           NaN       NaN       NaN    NaN
           A           NaN       NaN       NaN    NaN
Why does this happen, and how can I work around it?
A SettingWithCopyWarning is also raised. Is there a better idiom for creating a Categorical column?
Recommended answer
as_index=False is messing something up. If I run it without that argument:
dfs = df.groupby(['Date','Portfolio']).sum()
I get:
                      Duration  Yield
Date       Portfolio
2016-05-13 C                18    6.0
           B                10   10.0
           A                 6    1.8
I don't know why this happens. It may be a bug.
If you really want the result without the index, with 'Date' and 'Portfolio' as ordinary columns, use reset_index():
dfs = df.groupby(['Date','Portfolio']).sum().reset_index()
         Date Portfolio  Duration  Yield
0  2016-05-13         C        18    6.0
1  2016-05-13         B        10   10.0
2  2016-05-13         A         6    1.8
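On the second part of the question (a better idiom for Categorical), one possible approach is to convert the column with astype and an ordered CategoricalDtype, then group and reset the index as above. The following is only a sketch under some assumptions: it uses a smaller, made-up sample frame rather than the data in the question, the name portfolio_order is illustrative, and it assumes a pandas version recent enough to support the observed argument of groupby. Passing observed=True restricts the result to category combinations that actually occur in the data. A SettingWithCopyWarning normally indicates that df was sliced from another frame, so taking an explicit .copy() of the slice before assigning is the usual way to avoid it.

```python
import pandas as pd

# Small illustrative frame (not the data from the question)
df = pd.DataFrame({
    'Date': ['2016-05-13'] * 6,
    'Portfolio': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Duration': [1, 1, 2, 2, 3, 3],
    'Yield': [0.3, 0.3, 2.0, 2.0, 1.0, 1.0],
})

# Hypothetical name: an ordered categorical dtype with the custom order C, B, A
portfolio_order = pd.CategoricalDtype(['C', 'B', 'A'], ordered=True)
df['Portfolio'] = df['Portfolio'].astype(portfolio_order)

# Group without as_index=False, then move the group keys back into columns.
# observed=True keeps only the categories that appear in the data.
dfs = (
    df.groupby(['Date', 'Portfolio'], observed=True)[['Duration', 'Yield']]
      .sum()
      .reset_index()
)
print(dfs)
```

Because groupby sorts categorical keys by category order, the rows should come out as C, B, A without a separate sort_values call.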