Why doesn't pandas allow a categorical column to be used in groupby?

Views: 84 Published: 2020/5/24 2:56:31 Tags: python pandas


Question

I would like to create a DataFrame with a custom sort order. To do this I used pandas.Categorical(), but when I then use the resulting column in a groupby, NaN values are returned.

# import the pandas module
import pandas as pd

# Create an example dataframe
raw_data = {'Date': ['2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13','2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13'],
        'Portfolio': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
        'Duration': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'Yield': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],}

df = pd.DataFrame(raw_data, columns=['Date', 'Portfolio', 'Duration', 'Yield'])

# Give Portfolio a custom category order (C, B, A) and sort by it
df['Portfolio'] = pd.Categorical(df['Portfolio'], ['C', 'B', 'A'])
df = df.sort_values('Portfolio')

dfs = df.groupby(['Date', 'Portfolio'], as_index=False).sum()

print(dfs)

                        Date    Portfolio   Duration   Yield
Date        Portfolio               
13/05/2016  C           NaN     NaN         NaN        NaN
            B           NaN     NaN         NaN        NaN
            A           NaN     NaN         NaN        NaN

Why does this happen, and how can I work around it?

A SettingWithCopyWarning is also raised. Is there a better idiom for creating the Categorical column?

Recommended answer

as_index=False is what goes wrong here. If I run just:

dfs = df.groupby(['Date','Portfolio']).sum()

I get:

                      Duration  Yield
Date       Portfolio                 
2016-05-13 C                18    6.0
           B                10   10.0
           A                 6    1.8

I don't know why as_index=False causes this. It may be a bug.
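Whether this still happens depends on the installed pandas version. A quick check is to run the call with as_index=False and look for NaN in the aggregated columns. The sketch below rebuilds the question's data in compact form. The observed=True argument is an assumption added here; it exists only in later pandas releases and restricts the result to category combinations that actually occur.

```python
import pandas as pd

# Same data as in the question, written compactly
df = pd.DataFrame({
    'Date': ['2016-05-13'] * 17,
    'Portfolio': ['A'] * 6 + ['B'] * 5 + ['C'] * 6,
    'Duration': [1] * 6 + [2] * 5 + [3] * 6,
    'Yield': [0.3] * 6 + [2.0] * 5 + [1.0] * 6,
})
df['Portfolio'] = pd.Categorical(df['Portfolio'], ['C', 'B', 'A'])

keys = ['Date', 'Portfolio']

# Grouping keys kept in the index (the form that works in the answer).
# observed=True is an assumption, not part of the original code.
via_index = df.groupby(keys, observed=True).sum()

# Grouping keys returned as columns (the form that produced NaN for the asker)
via_flag = df.groupby(keys, as_index=False, observed=True).sum()

# True if the reported symptom shows up on this pandas version
has_nan = bool(via_flag[['Duration', 'Yield']].isna().any().any())
print(pd.__version__, 'as_index=False produced NaN:', has_nan)
```

If has_nan comes out True, the installed version still shows the problem and the reset_index() route is the safer one.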

If you really want the result without the index, with 'Date' and 'Portfolio' as ordinary columns, use 'reset_index()':

dfs = df.groupby(['Date','Portfolio']).sum().reset_index()

         Date Portfolio  Duration  Yield
0  2016-05-13         C        18    6.0
1  2016-05-13         B        10   10.0
2  2016-05-13         A         6    1.8
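The question also asked about a better idiom for Categorical, which the answer does not cover. The following is only a sketch under an assumption: the question does not show where the SettingWithCopyWarning came from, so this supposes df was a slice of a larger frame. The names raw, subset and order are hypothetical, and the data is a shortened version of the question's. Taking an explicit .copy() and converting with astype and a CategoricalDtype is one commonly used pattern; it is combined here with the reset_index() route from above.

```python
import pandas as pd

# Shortened, hypothetical version of the question's data
raw = pd.DataFrame({
    'Date': ['2016-05-13'] * 6,
    'Portfolio': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Duration': [1, 1, 2, 2, 3, 3],
    'Yield': [0.3, 0.3, 2.0, 2.0, 1.0, 1.0],
})

# Assumed scenario: working on a filtered slice of a larger frame.
# .copy() makes the slice an independent DataFrame, so assigning a
# column to it does not trigger SettingWithCopyWarning.
subset = raw[raw['Duration'] > 0].copy()

# Describe the custom order once as a dtype, then convert the column
order = pd.CategoricalDtype(categories=['C', 'B', 'A'], ordered=True)
subset['Portfolio'] = subset['Portfolio'].astype(order)

# Group with the keys in the index, then move them back to columns.
# observed=True (an assumption, available in later pandas releases)
# keeps only category combinations present in the data.
dfs = (
    subset.groupby(['Date', 'Portfolio'], observed=True)
          .sum()
          .reset_index()
)
print(dfs)
```

Because the grouped rows follow the category order, no separate sort_values call is needed before the groupby in this sketch.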
