Pivot a pandas DataFrame to be the correct format: `DataError: No numeric types to aggregate`

Problem description

Here is a pandas DataFrame I would like to manipulate:

import pandas as pd

data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2", ...],
        "labels": ["A", "B", "C", "A", "B", "C", "D", ...],
        "count": [5, 1, 8, 3, 731, 189, 9, ...]}

df = pd.DataFrame(data)

print(df)
>>>   grouping            labels       count
0        item1             A            5
1        item1             B            1
2        item1             C            8
3        item2             A            3
4        item2             B          731
5        item2             C          189
6        item2             D            9
7        ...               ...         ....

I would like to "unfold" this dataframe into the following format:

grouping    A    B    C    D
item1       5    1    8    3
item2       3    731  189  9
....        ........

How would one do this? I would think that this would work:

pd.pivot_table(df, index=["grouping", "labels"])

But I get the following error:

DataError: No numeric types to aggregate

Recommended answer

There are four idiomatic pandas ways to do this.

  • No duplicates among grouping columns. Does not require aggregation
    • pivot
    • set_index
  • Aggregate implicitly, so they also work when the grouping columns contain duplicates
    • pivot_table
    • groupby
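To make the distinction in the list concrete, here is a minimal sketch of what happens when a (grouping, labels) pair occurs more than once. The `dup` frame is hypothetical data invented for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical data: the ("item1", "A") pair appears twice.
dup = pd.DataFrame({
    "grouping": ["item1", "item1", "item2"],
    "labels": ["A", "A", "A"],
    "count": [5, 7, 3],
})

# pivot does not aggregate, so it cannot decide which value
# belongs in the (item1, A) cell and raises a ValueError.
try:
    dup.pivot(index="grouping", columns="labels", values="count")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates (the default aggregator is the mean).
table = dup.pivot_table(values="count", index="grouping", columns="labels")

print(pivot_failed)
print(table)
```

If the data might contain duplicates, one of the aggregating approaches is the safer choice, at the cost of having to pick an aggregator deliberately.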

      pivot

      df.pivot(index='grouping', columns='labels', values='count')
      

      set_index

      df.set_index(['grouping', 'labels'])['count'].unstack()
      

      pivot_table

      df.pivot_table('count', 'grouping', 'labels')
      

      groupby

      df.groupby(['grouping', 'labels'])['count'].sum().unstack()
      

      All yield

      labels      A      B      C    D
      grouping                        
      item1     5.0    1.0    8.0  NaN
      item2     3.0  731.0  189.0  9.0
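As a quick check, the following sketch builds the seven rows shown in the question (the elided `...` rows are left out, which is an assumption about the data) and confirms that the four approaches produce the same table. Keyword arguments are used for `pivot` because recent pandas versions no longer accept its arguments positionally.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

by_pivot = df.pivot(index="grouping", columns="labels", values="count")
by_set_index = df.set_index(["grouping", "labels"])["count"].unstack()
by_pivot_table = df.pivot_table(values="count", index="grouping", columns="labels")
by_groupby = df.groupby(["grouping", "labels"])["count"].sum().unstack()

# Compare values and labels; dtypes are not compared because
# they can vary between pandas versions.
for result in (by_set_index, by_pivot_table, by_groupby):
    pd.testing.assert_frame_equal(by_pivot, result, check_dtype=False)

print(by_pivot)
```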
      

      With the groupby, set_index, or pivot_table approach, you can easily fill in missing values with fill_value=0:

      df.pivot_table('count', 'grouping', 'labels', fill_value=0)
      
      df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)
      
      df.set_index(['grouping', 'labels'])['count'].unstack(fill_value=0)
      

      All yield

      labels    A    B    C  D
      grouping                
      item1     5    1    8  0
      item2     3  731  189  9
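The fill_value variants can be checked the same way. This sketch again assumes only the seven rows visible in the question. Note that with set_index there is nothing to aggregate, so `unstack` is called directly on the selected column.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

filled_pivot_table = df.pivot_table(
    values="count", index="grouping", columns="labels", fill_value=0
)
filled_groupby = (
    df.groupby(["grouping", "labels"])["count"].sum().unstack(fill_value=0)
)
filled_set_index = (
    df.set_index(["grouping", "labels"])["count"].unstack(fill_value=0)
)

# The missing (item1, D) cell is now 0 instead of NaN in every result.
for result in (filled_pivot_table, filled_set_index):
    pd.testing.assert_frame_equal(filled_groupby, result, check_dtype=False)

print(filled_groupby)
```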
      


      About groupby

      Here we don't require any aggregation, because each (grouping, labels) pair occurs only once. If we still want to use groupby, we can minimize the impact of the implicit aggregation by using a less impactful aggregator, such as max or first:

      df.groupby(['grouping', 'labels'])['count'].max().unstack()
      

      df.groupby(['grouping', 'labels'])['count'].first().unstack()
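The choice of aggregator only matters once duplicates appear. The sketch below first checks that sum, max and first agree on the question's seven visible rows, then appends one hypothetical duplicate row (invented for illustration) to show where they diverge.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

grouped = df.groupby(["grouping", "labels"])["count"]

# Every group holds exactly one row, so all three aggregators agree.
same_without_duplicates = (
    grouped.sum().equals(grouped.max()) and grouped.sum().equals(grouped.first())
)

# Hypothetical extra row that duplicates the ("item1", "A") pair.
extra = pd.DataFrame({"grouping": ["item1"], "labels": ["A"], "count": [7]})
with_dup = pd.concat([df, extra], ignore_index=True)
grouped_dup = with_dup.groupby(["grouping", "labels"])["count"]

total = grouped_dup.sum().loc[("item1", "A")]        # adds 5 and 7
largest = grouped_dup.max().loc[("item1", "A")]      # keeps the larger value
earliest = grouped_dup.first().loc[("item1", "A")]   # keeps the first row seen

print(same_without_duplicates, total, largest, earliest)
```

With unique pairs the aggregator is only a formality, but with duplicates each one answers a different question, so it is worth choosing it on purpose.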
      
