Pandas fails to aggregate with a list of aggregation functions

Question

How do I specify custom aggregating functions so that they behave correctly when passed in a list to pandas.DataFrame.aggregate?

Given a two-column DataFrame in pandas ...

import pandas as pd
import numpy as np
df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

... aggregating over a list of aggregation function specs is not a problem:

def ok_mean(x):
  return x.mean()

df.aggregate(['mean', np.max, ok_mean])

            a     b
mean     13.5  -8.0
amax     27.0   1.0
ok_mean  13.5  -8.0

But when the custom function (lambda or named) calls np.mean(x) rather than x.mean(), the result is not aggregated:

def nok_mean(x):
  return np.mean(x)

df.aggregate([lambda x:  np.mean(x), nok_mean])

         a                 b
  <lambda> nok_mean <lambda> nok_mean
0      0.0      0.0      1.0      1.0
1      3.0      3.0     -1.0     -1.0
2      6.0      6.0     -3.0     -3.0
3      9.0      9.0     -5.0     -5.0
4     12.0     12.0     -7.0     -7.0
...

Mixing aggregating and non-aggregating specs leads to an error:

df.aggregate(['mean', nok_mean])

~/anaconda3/envs/tsa37_jup/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
    607         # if we are empty
    608         if not len(results):
--> 609             raise ValueError("no results")
    610 

Using the aggregating function directly (not in a list), however, gives the expected result:

df.aggregate(nok_mean)

a    13.5
b    -8.0
dtype: float64

Is this a bug, or am I missing something in the way I define aggregation functions? In my real project, I'm using more complex aggregation functions (such as this percentile one). So my question is:

How do I specify a custom aggregating function in order to work around this bug?

Note that using the custom aggregating function over a rolling, expanding or group-by window gives the expected result:

df.expanding().aggregate(['mean', nok_mean])
## returns cumulative aggregation results as expected

Pandas version: 0.23.4

Recommended answer

I found a workaround: make the aggregating function fail when it is called with a non-Series argument:

def ok_mean(x):
  return np.mean(x.values)

def ok_mean2(x):
  if not isinstance(x,pd.Series):
    raise ValueError('need Series argument')
  return np.mean(x)

df.aggregate(['mean', ok_mean, ok_mean2])

It seems that in this situation (a list argument to pandas.DataFrame.aggregate), pandas first tries to apply the aggregating function to each data point and, only when that fails, falls back to the correct behaviour (calling the function with the Series to be aggregated). Presumably np.mean(x) also accepts a single scalar, so the element-wise attempt succeeds and the fallback never happens, whereas x.mean() fails on a scalar and triggers it.
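The fallback described above can be illustrated with a toy model. This is only a sketch of the suspected "try element-wise first, then fall back to the whole Series" dispatch, not pandas's actual source; ToySeries, lenient_mean and strict_mean are hypothetical names invented for this illustration, and the set of caught exception types is an assumption.

```python
from statistics import mean


class ToySeries:
    """Hypothetical stand-in for a pandas Series (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, func):
        # Element-wise: func receives one scalar at a time.
        return [func(v) for v in self.values]

    def aggregate(self, func):
        # Assumed dispatch: try element-wise first, and only if that
        # raises, call func once with the whole series.
        try:
            return self.apply(func)
        except (ValueError, AttributeError, TypeError):
            return func(self)


def lenient_mean(x):
    # Like np.mean, this also accepts a single scalar, so the
    # element-wise attempt succeeds and nothing gets aggregated.
    data = x.values if isinstance(x, ToySeries) else [x]
    return mean(data)


def strict_mean(x):
    # Rejecting scalars forces the fallback to the whole series.
    if not isinstance(x, ToySeries):
        raise ValueError("need ToySeries argument")
    return mean(x.values)


s = ToySeries([0, 3, 6, 9])
elementwise = s.aggregate(lenient_mean)  # one value per element
aggregated = s.aggregate(strict_mean)    # a single aggregated value
print(elementwise, aggregated)
```

In this model the lenient function returns one value per element, mirroring the un-aggregated output in the question, while the strict function yields a single number.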

Using a decorator to force Series arguments:

def assert_argtype(clazz):
    def wrapping(f):
        def wrapper(s):
            if not isinstance(s,clazz):
                raise ValueError('needs %s argument' % clazz)
            return f(s)
        return wrapper
    return wrapping

@assert_argtype(pd.Series)
def nok_mean(x):
    return np.mean(x)

df.aggregate([nok_mean])
## OK now, decorator fixed it!
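One possible refinement, offered as a sketch rather than as part of the original answer: pandas appears to label each result row with the function's name (as with ok_mean and <lambda> in the outputs above), so a row produced through this decorator would presumably be labelled "wrapper". Applying functools.wraps keeps the decorated function's own name. To stay dependency-free, the example below uses list as a stand-in for pd.Series, and total is a hypothetical function.

```python
import functools


def assert_argtype(clazz):
    """Build a decorator that rejects arguments not of type clazz."""
    def wrapping(f):
        @functools.wraps(f)  # copy f's __name__ and docstring onto wrapper
        def wrapper(s):
            if not isinstance(s, clazz):
                raise ValueError('needs %s argument' % clazz.__name__)
            return f(s)
        return wrapper
    return wrapping


# list stands in for pd.Series here (assumption, to avoid a pandas import).
@assert_argtype(list)
def total(xs):
    return sum(xs)


name = total.__name__      # the original name survives the decoration
result = total([1, 2, 3])  # a list argument is passed through to sum

try:
    total(1)               # a scalar argument is rejected
    rejected = False
except ValueError:
    rejected = True

print(name, result, rejected)
```

With pandas, the same decorator would be applied as @assert_argtype(pd.Series), exactly as in the answer's version.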
