DataError: No numeric types using mean aggregate function but not sum?


Question

I was wondering if someone could help explain the behaviour below when using agg().

import numpy as np
import pandas as pd
import string

Initialise the data frame:

df = pd.DataFrame(data=[list(string.ascii_lowercase)[0:5]*2,list(range(1,11)),list(range(11,21))]).T
df.columns = ['g','c1','c2']

df.sort_values(['g']).head(5)

   g  c1  c2
0  a   1  11
5  a   6  16
1  b   2  12
6  b   7  17
2  c   3  13

As an example, I am summing and averaging across c1 and c2 while grouping by g:

f = { 'c1' : lambda g: df.loc[g.index].c2.sum() + g.sum(), 'c2' : lambda g: (df.loc[g.index].c1.sum() + g.sum())/(g.count()+df.loc[g.index].c1.count())} 
df = df.groupby('g',as_index=False).agg(f)

The following then raises a data type error:

rnm_cols = dict(sum='Sum', mean='Mean') #, std='Std')
df = df.set_index(['g']).stack().groupby('g').agg(rnm_cols.keys()).rename(columns=rnm_cols)

I get -> DataError: No numeric types to aggregate

I know I can avoid this issue if I initialise my data frame using the following:

df[['c1','c2']] = df[['c1','c2']].apply(lambda x: pd.to_numeric(x, errors='coerce'))

However, I'm trying to understand why aggregating with the mean function raises such an error.

Recommended answer

This is due to the way GroupBy objects handle the different aggregation methods. In fact, sum and mean are handled differently (see below for more details).

But the bottom line is that mean only works for numeric types, and none are present in your data frame:

>>> df.dtypes
g     object
c1    object
c2    object
dtype: object

By applying pd.to_numeric you convert the columns to a numeric type, and agg then works.
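A minimal sketch of the dtype check and the conversion, assuming a reasonably recent pandas. It rebuilds the frame from the question but aggregates c1 with a plain groupby rather than the question's stack-based pipeline; the names dtypes_before and result are only for illustration.

```python
import string

import pandas as pd

# Rebuild the frame from the question. Each row mixes strings and ints,
# so after the transpose every column is stored with dtype object.
df = pd.DataFrame(
    data=[list(string.ascii_lowercase)[0:5] * 2, list(range(1, 11)), list(range(11, 21))]
).T
df.columns = ["g", "c1", "c2"]
dtypes_before = df[["c1", "c2"]].dtypes.astype(str).tolist()

# Convert the value columns to a numeric dtype, as suggested in the question.
df[["c1", "c2"]] = df[["c1", "c2"]].apply(pd.to_numeric, errors="coerce")

# With numeric columns, both sum and mean can be aggregated.
result = df.groupby("g")["c1"].agg(["sum", "mean"])
print(dtypes_before)
print(df.dtypes)
print(result)
```

Group a holds the c1 values 1 and 6, so its row in result should show a sum of 7 and a mean of 3.5.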

But let's take a closer look:

For mean, the call dispatches to self._cython_agg_general, which checks for numeric types and raises a DataError if it doesn't find any (which is the case in your example). The call to self._cython_agg_general is wrapped in try/except, but a GroupByError is simply re-raised, and DataError inherits from GroupByError. Hence the exception.

sum is defined differently, through a wrapper function. The wrapper similarly dispatches to self._cython_agg_general inside try/except, but it has no specific clause for GroupByError (no idea why; maybe that's a good question for the developers, so they can unify the behaviour of GroupBy objects). Because self._cython_agg_general again raises the DataError, execution enters the except Exception clause, which falls back to self.aggregate. From there you can trace it through a dozen additional function calls, but in the end it simply adds up the individual items of the series (they are stored as objects, but adding them in Python is no problem since they are in fact ints).

So it all comes down to the different ways the two aggregation functions handle exceptions: mean re-raises on DataError but sum doesn't. The "why" remains an open question to me as well.
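The difference in exception handling can be illustrated with a small toy model in plain Python. This is not the pandas source: Column, cython_agg_general, agg_mean, and agg_sum are hypothetical stand-ins written for this sketch, and only the exception names GroupByError and DataError mirror the ones mentioned above.

```python
from dataclasses import dataclass
from functools import reduce
import operator


class GroupByError(Exception):
    """Local stand-in for the pandas exception of the same name."""


class DataError(GroupByError):
    """Local stand-in: raised when no numeric data is available."""


@dataclass
class Column:
    values: list
    dtype: str  # "object" or "int64" in this toy model


def cython_agg_general(col, how):
    # Hypothetical fast path: refuses anything not flagged as numeric.
    if col.dtype == "object":
        raise DataError("No numeric types to aggregate")
    total = sum(col.values)
    return total if how == "sum" else total / len(col.values)


def agg_mean(col):
    try:
        return cython_agg_general(col, "mean")
    except GroupByError:
        raise  # DataError is a GroupByError, so it propagates to the caller
    except Exception:
        return sum(col.values) / len(col.values)


def agg_sum(col):
    try:
        return cython_agg_general(col, "sum")
    except Exception:
        # No GroupByError clause: fall back to adding the items one by one.
        return reduce(operator.add, col.values)


obj_col = Column(values=[1, 6], dtype="object")
sum_result = agg_sum(obj_col)
try:
    agg_mean(obj_col)
    mean_error = None
except DataError as exc:
    mean_error = str(exc)
print(sum_result, mean_error)
```

In this model the same DataError is raised on both paths; only the except clauses decide whether the caller sees it or gets the fallback result.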

  • Inconsistencies in groupby aggregation with non-numeric types
  • SeriesGroupby.cumsum raises on object dtype
