Python pandas dataframe pivot only works with pivot_table() but not with set_index() and unstack()

Question

I am trying to pivot the following type of sample data in a Pandas dataframe in Python. I came across a couple of other Stack Overflow answers that discussed how to do the pivot: pivot_table No numeric types to aggregate

However, when I use pivot_table(), I am able to pivot the data, but when I use set_index() and unstack(), I get the following error:

AttributeError: 'NoneType' object has no attribute 'unstack'

Sample data:

id  responseTime    label   answers
ABC 2018-06-24  Category_1  [3]
ABC 2018-06-24  Category_2  [10]
ABC 2018-06-24  Category_3  [10]
DEF 2018-06-25  Category_1  [7]
DEF 2018-06-25  Category_8  [10]
GHI 2018-06-28  Category_3  [7]

Desired output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL

This works:

df = pdDF.pivot_table(index=['items_id','responseTime'], columns='label', values='answers', aggfunc='first')
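A minimal, runnable sketch of the working call, assuming the sample data is rebuilt by hand. Two assumptions are made for illustration only: the identifier column is named items_id to match the code (the sample table header says id), and responseTime is kept as plain strings rather than parsed dates.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data (assumed column names and
# string dates; the real data may differ).
pdDF = pd.DataFrame({
    "items_id": ["ABC", "ABC", "ABC", "DEF", "DEF", "GHI"],
    "responseTime": ["2018-06-24"] * 3 + ["2018-06-25"] * 2 + ["2018-06-28"],
    "label": ["Category_1", "Category_2", "Category_3",
              "Category_1", "Category_8", "Category_3"],
    "answers": [[3], [10], [10], [7], [10], [7]],
})

# aggfunc="first" replaces the default "mean", which is meant for numeric
# data and cannot average list-valued cells.
df = pdDF.pivot_table(index=["items_id", "responseTime"], columns="label",
                      values="answers", aggfunc="first")
print(df)  # combinations with no answer are shown as NaN
```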

This doesn't work:

pdDF.set_index(['items_id','responseTime','label'], append=True, inplace=True).unstack('label')

I also used pdDF[pdDF.isnull().any(axis=1)] to make sure I don't have any NULL data in the answers column. I also tried append=False, but the same error happened.

From other threads, it seems set_index() and unstack() are more efficient than pivot_table(). I also don't want to use pivot_table() because it requires an aggregation function and my answers column doesn't contain numeric data. I didn't want to use the default (mean()), so I ended up using first(). Any insights on why one method works and the other doesn't?
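One behavioural difference between the two approaches, separate from the inplace issue: pivot_table() aggregates rows that share the same index and column keys (aggfunc='first' keeps the first one), while set_index() followed by unstack() does no aggregation and raises a ValueError when a key combination repeats. The sketch below illustrates this with a hypothetical two-row frame that reuses the question's column names.

```python
import pandas as pd

# Hypothetical frame in which the (items_id, responseTime, label) key repeats.
dup = pd.DataFrame({
    "items_id": ["ABC", "ABC"],
    "responseTime": ["2018-06-24", "2018-06-24"],
    "label": ["Category_1", "Category_1"],
    "answers": [[3], [5]],
})

# pivot_table collapses the duplicates; "first" keeps the first row's value.
pt = dup.pivot_table(index=["items_id", "responseTime"], columns="label",
                     values="answers", aggfunc="first")
print(pt.loc[("ABC", "2018-06-24"), "Category_1"])  # [3]

# unstack cannot place two values in one cell, so it raises ValueError.
unstack_failed = False
try:
    dup.set_index(["items_id", "responseTime", "label"])["answers"].unstack("label")
except ValueError:
    unstack_failed = True
print(unstack_failed)  # True
```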

Recommended answer

AttributeError: 'NoneType' object has no attribute 'unstack'

When you use inplace=True in set_index(), it modifies the dataframe in place and returns None instead of a new DataFrame. So you can't call unstack() on the result, because the result is None.

inplace : boolean, default False

Modify the DataFrame in place (do not create a new object)
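The sketch below reproduces the error and shows two ways around it. The sample frame is a hypothetical reconstruction using the column names from the question's code; .copy() is used only so each variant starts from the same unmodified data.

```python
import pandas as pd

# Hypothetical sample frame; column names follow the question's code.
df = pd.DataFrame({
    "items_id": ["ABC", "ABC", "ABC", "DEF", "DEF", "GHI"],
    "responseTime": ["2018-06-24"] * 3 + ["2018-06-25"] * 2 + ["2018-06-28"],
    "label": ["Category_1", "Category_2", "Category_3",
              "Category_1", "Category_8", "Category_3"],
    "answers": [[3], [10], [10], [7], [10], [7]],
})
keys = ["items_id", "responseTime", "label"]

# inplace=True mutates the frame it is called on and returns None...
returned = df.copy().set_index(keys, inplace=True)
print(returned)  # None

# ...so chaining .unstack() onto that call reproduces the error.
message = ""
try:
    df.copy().set_index(keys, inplace=True).unstack("label")
except AttributeError as exc:
    message = str(exc)
print(message)  # 'NoneType' object has no attribute 'unstack'

# Fix 1: drop inplace=True and chain on the returned copy.
chained = df.set_index(keys).unstack("label")

# Fix 2: keep inplace=True, but call unstack() in a separate statement
# on the frame that was modified.
df2 = df.copy()
df2.set_index(keys, inplace=True)
two_step = df2.unstack("label")
```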

Use:

df1 = pdDF.set_index(['items_id','responseTime','label'])['answers'].unstack('label')
print(df1)

# Output (shown schematically; pandas displays missing cells as NaN rather than NULL):

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL
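A self-contained sketch of the corrected chain, assuming the same hand-built sample data (column items_id as in the question's code, responseTime as strings). It checks that the result matches the pivot_table() version cell for cell, then uses reset_index() to move items_id and responseTime back into ordinary columns, which is closer to the layout of the desired output.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data (assumed names and types).
df = pd.DataFrame({
    "items_id": ["ABC", "ABC", "ABC", "DEF", "DEF", "GHI"],
    "responseTime": ["2018-06-24"] * 3 + ["2018-06-25"] * 2 + ["2018-06-28"],
    "label": ["Category_1", "Category_2", "Category_3",
              "Category_1", "Category_8", "Category_3"],
    "answers": [[3], [10], [10], [7], [10], [7]],
})
keys = ["items_id", "responseTime", "label"]

# No inplace=True, so every step returns a new object and can be chained.
# Selecting "answers" before unstack() yields flat label columns instead of
# an ("answers", label) MultiIndex.
wide = df.set_index(keys)["answers"].unstack("label")

# The pivot_table() version from the question, for comparison.
pt = df.pivot_table(index=keys[:2], columns="label", values="answers",
                    aggfunc="first")

# NaN != NaN, so fill missing cells before comparing the two results.
same = wide.fillna("NULL").values.tolist() == pt.fillna("NULL").values.tolist()
print(same)  # True

# Move the row index back into columns and drop the "label" axis name.
flat = wide.reset_index().rename_axis(columns=None)
print(flat)
```

rename_axis(columns=None) only removes the leftover label name on the column axis; the data are unchanged.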
