DataError: No numeric types to aggregate when using the mean aggregate function, but not sum?
Question
I was wondering if someone could help explain the behaviour below when using agg().
import numpy as np
import pandas as pd
import string
Initialise the data frame:
df = pd.DataFrame(data=[list(string.ascii_lowercase)[0:5]*2,list(range(1,11)),list(range(11,21))]).T
df.columns = ['g','c1','c2']
df.sort_values(['g']).head(5)
   g c1  c2
0  a  1  11
5  a  6  16
1  b  2  12
6  b  7  17
2  c  3  13
As an example, I group by g while summing and averaging c1 and c2:
f = { 'c1' : lambda g: df.loc[g.index].c2.sum() + g.sum(), 'c2' : lambda g: (df.loc[g.index].c1.sum() + g.sum())/(g.count()+df.loc[g.index].c1.count())}
df = df.groupby('g',as_index=False).agg(f)
The data type error:
rnm_cols = dict(sum='Sum', mean='Mean') #, std='Std')
df = df.set_index(['g']).stack().groupby('g').agg(rnm_cols.keys()).rename(columns=rnm_cols)
I get -> DataError: No numeric types to aggregate
I know that I can avoid this issue if I initialise my data frame using the line below:
df[['c1','c2']] = df[['c1','c2']].apply(lambda x: pd.to_numeric(x, errors='coerce'))
However, I'm trying to understand why aggregating with the mean function produces this error while sum does not.
Recommended answer
This is due to the way GroupBy objects handle the different aggregation methods. In fact, sum and mean are handled differently (see below for more details).
But the bottom line is that mean only works for numeric types, and there are none in your data frame:
>>> df.dtypes
g     object
c1    object
c2    object
dtype: object
By applying pd.to_numeric you convert them to a numeric type, and agg works.
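As a minimal sketch of that fix, the snippet below rebuilds the frame from the question, converts c1 and c2 with pd.to_numeric, and then takes the group means. The variable name means is only illustrative, and the exact exception raised without the conversion may differ between pandas versions, so the sketch shows only the working path.

```python
import string

import pandas as pd

# Rebuild the frame from the question: transposing rows of mixed
# types (letters and integers) leaves every column with dtype object.
df = pd.DataFrame(
    data=[list(string.ascii_lowercase)[0:5] * 2,
          list(range(1, 11)),
          list(range(11, 21))]
).T
df.columns = ['g', 'c1', 'c2']

# Convert the two value columns to a numeric dtype.
df[['c1', 'c2']] = df[['c1', 'c2']].apply(
    lambda x: pd.to_numeric(x, errors='coerce')
)

# With numeric columns, the mean aggregation goes through.
means = df.groupby('g')[['c1', 'c2']].mean()
print(means)
```

Group a, for example, holds c1 values 1 and 6, so its mean for c1 is 3.5.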
But let's take a closer look:
For mean, the function call dispatches to self._cython_agg_general, which checks for numeric types and raises a DataError if it doesn't find any (which is the case in your example). The call to self._cython_agg_general is wrapped in try/except, but in case of a GroupByError it just re-raises, and DataError inherits from GroupByError. Hence the exception.
sum is defined in a different way, via a wrapper function. The wrapper similarly dispatches to self._cython_agg_general, wrapped in try/except, but it doesn't add a specific clause for GroupByErrors (no idea why, though; maybe that's a good question for the developers, so they can unify the behavior of GroupBy objects). Because self._cython_agg_general again raises the DataError, it enters the except Exception clause, where it falls back to self.aggregate. From there you can trace it down through a dozen additional function calls, but in the end it simply adds up the individual items of the series (which are stored as objects, but adding them in Python is no problem since they are in fact ints).
So it all comes down to the different ways the two aggregation functions handle exceptions: mean re-raises on DataError, but sum doesn't. The "why" remains an open question to me as well.
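The difference in exception handling can be modelled with a toy example. This is not the pandas source: the classes GroupByError and DataError below are local stand-ins that only mirror the inheritance described above, and fast_path, toy_mean and toy_sum are hypothetical names invented for illustration.

```python
import numpy as np


class GroupByError(Exception):
    """Local stand-in for the pandas exception of the same name."""


class DataError(GroupByError):
    """Local stand-in: raised when no numeric types are found."""


def fast_path(values, how):
    # Hypothetical analogue of the numeric-type check described above.
    if not np.issubdtype(values.dtype, np.number):
        raise DataError("No numeric types to aggregate")
    return getattr(values, how)()


def toy_mean(values):
    try:
        return fast_path(values, "mean")
    except GroupByError:
        # Explicit clause: the DataError is passed straight to the caller.
        raise


def toy_sum(values):
    try:
        return fast_path(values, "sum")
    except Exception:
        # No GroupByError clause: fall back to adding the items in Python.
        total = 0
        for item in values:
            total += item
        return total


obj_values = np.array([1, 6], dtype=object)

sum_result = toy_sum(obj_values)

try:
    toy_mean(obj_values)
    mean_outcome = "ok"
except DataError:
    mean_outcome = "DataError"

print(sum_result, mean_outcome)
```

With a numeric array such as np.array([1, 6]), both toy functions take the fast path and neither except clause is reached.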
- Inconsistencies in groupby aggregation with non-numeric types
- SeriesGroupby.cumsum raises on object dtype