pandas - pivot_table with non-numeric values? (DataError: No numeric types to aggregate)

Problem description

I'm trying to do a pivot of a table containing strings as results.

import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})

# rows/cols are the older keyword names; they were later renamed to index/columns
df1.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

But I get: DataError: No numeric types to aggregate.

This works as intended when I change result values to numbers:

df2 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': [1,0,0,1,1,0,0,1]})

df2.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

And I get what I need:

variable1   A               B    
variable2   a       b       a   b
variable3   x   y   x   y   x   y
index                            
0           1 NaN NaN NaN NaN NaN
1         NaN NaN   0 NaN NaN NaN
2         NaN NaN NaN NaN   0 NaN
3         NaN NaN NaN NaN NaN   1
4         NaN   1 NaN NaN NaN NaN
5         NaN NaN NaN NaN NaN   0
6         NaN NaN NaN NaN   0 NaN
7         NaN NaN NaN   1 NaN NaN

I know I can map the strings to numerical values and then reverse the operation, but maybe there is a more elegant solution?
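For concreteness, the workaround mentioned here could be sketched roughly as follows. This is only an illustration: the names `to_num`, `to_str`, `numeric` and `restored` are made up for the example, and it uses the newer `index`/`columns` keywords rather than `rows`/`cols`.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

# Hypothetical lookup tables for the round trip (not part of the original post).
to_num = {"on": 1, "off": 0}
to_str = {1.0: "on", 0.0: "off"}  # the default mean aggregation yields floats

# Encode the strings as numbers, pivot, then decode each column again.
numeric = df1.assign(result=df1["result"].map(to_num))
pivoted = numeric.pivot_table(values="result", index="index",
                              columns=["variable1", "variable2", "variable3"])
restored = pivoted.apply(lambda col: col.map(to_str))  # NaN cells stay NaN

print(restored)
```

It works, but it needs two lookup tables and relies on every cell holding exactly one value, which is why a more direct approach is attractive.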

Solution

My original reply was based on pandas 0.14.1. Since then, several things have changed in the pivot_table function (rows became index, cols became columns, and so on).

Additionally, it appears that the original lambda trick I posted no longer works on pandas 0.18. You have to provide a reducing function (even if it is min, max or mean). But even that seemed improper, because we are not reducing the data set, just reshaping it. So I looked harder at unstack.
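For comparison, the reducing-function route might look like the sketch below. It is not the original lambda trick (which is not reproduced here); it assumes a reasonably recent pandas with the `index`/`columns` keywords and picks `aggfunc='first'` as one reducer that accepts strings. Because each cell holds a single value in this data, "first" simply passes that value through.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

# aggfunc='first' is an assumption made for this sketch: it keeps the first
# value in each group instead of trying to average strings.
pivoted = df1.pivot_table(values='result', index='index',
                          columns=['variable1', 'variable2', 'variable3'],
                          aggfunc='first')

print(pivoted)
```

If a cell could contain several rows, "first" would silently discard all but one of them, which is another reason to prefer a pure reshape here.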

import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})

# these are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']

First, set an index on the data using the index column plus the columns you want to unstack, then call unstack with the level argument.

df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

Resulting dataframe is below.
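To reproduce the result locally, the pieces above can be put together as a single script. This is a minimal sketch; the names `wide` and `flat` are only for illustration, and the last step (dropping the outer `result` column level) is an optional extra that is not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

# These are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']

# Reshape only: move the three variable levels from the row index to the columns.
wide = df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

# Optional: select the 'result' block to drop the outermost column level.
flat = wide['result']

print(flat)
```

Note that unstack requires the combined index to be unique. If the same (index, variable1, variable2, variable3) combination occurs more than once, pandas raises an error, and an aggregation is needed after all.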
