Pandas Groupby and apply method with custom function

Problem description

I built the following function with the aim of estimating an optimal exponential moving average of a pandas' DataFrame column.

from scipy import optimize
from sklearn.metrics import mean_squared_error
import pandas as pd
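## Function that finds the best alpha and uses it to create the EWMA
def find_best_ewma(series, eps=10e-5):

    def f(alpha):
        # minimize passes alpha as a 1-element array, so unpack it to a float
        ewm = series.shift().ewm(alpha=float(alpha[0]), adjust=False).mean()
        return mean_squared_error(series, ewm.fillna(0))

    # the bounds keep alpha strictly inside (0, 1)
    result = optimize.minimize(f, [.3], bounds=[(0 + eps, 1 - eps)])

    # result.x is a 1-element array; ewm expects a scalar alpha
    return series.shift().ewm(alpha=float(result.x[0]), adjust=False).mean()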
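
For reference, here is a minimal sketch of what the `shift().ewm(alpha=..., adjust=False).mean()` expression computes for a fixed alpha. The toy series and the value `alpha = 0.5` are assumptions chosen only for illustration; they are not part of the data above, and the optimizer is left out on purpose.

```python
import pandas as pd

# Hypothetical toy series and smoothing factor, for illustration only.
series = pd.Series([1.0, 2.0, 3.0, 4.0])
alpha = 0.5

# Same expression as inside find_best_ewma: shifting first means the value
# at position t is built only from observations up to t-1.
pandas_ewma = series.shift().ewm(alpha=alpha, adjust=False).mean()

# Manual version of the adjust=False recursion:
#   y_t = (1 - alpha) * y_{t-1} + alpha * x_t
manual = []
prev = None
for x in series.shift():
    if pd.isna(x):
        # Only the leading NaN created by shift() occurs in this example.
        manual.append(float("nan"))
        continue
    prev = x if prev is None else (1 - alpha) * prev + alpha * x
    manual.append(prev)
manual_ewma = pd.Series(manual, index=series.index)

print(pandas_ewma.tolist())  # [nan, 1.0, 1.5, 2.25]
print(manual_ewma.tolist())  # [nan, 1.0, 1.5, 2.25]
```

The leading NaN is why a group with a single row comes back as NaN in the outputs below.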

Now I want to apply this function to each of the groups created using pandas-groupby on the following test df:

## test
      data1     data2 key1 key2
0 -0.018442 -1.564270    a    x
1 -0.038490 -1.504290    b    x
2  0.953920 -0.283246    a    x
3 -0.231322 -0.223326    b    y
4 -0.741380  1.458798    c    z
5 -0.856434  0.443335    d    y
6 -1.416564  1.196244    c    z

To do so, I tried the following two ways:

## First way
test.groupby(["key1","key2"])["data1"].apply(find_best_ewma)
## Output
0         NaN
1         NaN
2   -0.018442
3         NaN
4         NaN
5         NaN
6   -0.741380
Name: data1, dtype: float64

## Second way
test.groupby(["key1","key2"]).apply(lambda g: find_best_ewma(g["data1"]))
## Output
key1  key2   
a     x     0         NaN
            2   -0.018442
b     x     1         NaN
      y     3         NaN
c     z     4         NaN
            6   -0.741380
d     y     5         NaN
Name: data1, dtype: float64

Both ways produce a pandas.core.series.Series but ONLY the second way provides the expected hierarchical index.

I do not understand why the first way does not produce the hierarchical index and instead returns the original DataFrame index. Could you please explain why this happens?

What am I missing?

Thanks in advance for your help.

Recommended answer

The first way creates a pandas.core.groupby.DataFrameGroupBy object, which becomes a pandas.core.groupby.SeriesGroupBy object once you select a specific column from it. The 'apply' method is called on this SeriesGroupBy object, hence a Series is returned.

test.groupby(["key1","key2"])["data1"]#.apply(find_best_ewma)
<pandas.core.groupby.SeriesGroupBy object at 0x7fce51fac790>
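
A small sketch that makes this difference visible: it prints the type of each grouped object and records what kind of object 'apply' hands to the function in each case. The helpers `record_first` and `record_second` are hypothetical names introduced here, and a plain `shift()` stands in for `find_best_ewma` so that only pandas is needed.

```python
import pandas as pd

# The test frame from the question (data2 omitted for brevity).
test = pd.DataFrame({
    "data1": [-0.018442, -0.038490, 0.953920, -0.231322,
              -0.741380, -0.856434, -1.416564],
    "key1": ["a", "b", "a", "b", "c", "d", "c"],
    "key2": ["x", "x", "x", "y", "z", "y", "z"],
})

grouped = test.groupby(["key1", "key2"])
print(type(grouped).__name__)           # DataFrameGroupBy
print(type(grouped["data1"]).__name__)  # SeriesGroupBy

# Record the type of every group that apply passes to the function.
seen_by_first, seen_by_second = set(), set()

def record_first(group):
    # First way: apply is called on the SeriesGroupBy.
    seen_by_first.add(type(group).__name__)
    return group.shift()

def record_second(group):
    # Second way: apply is called on the DataFrameGroupBy,
    # and the column is selected inside the function.
    seen_by_second.add(type(group).__name__)
    return group["data1"].shift()

grouped["data1"].apply(record_first)
grouped.apply(record_second)

print(seen_by_first)   # {'Series'}
print(seen_by_second)  # {'DataFrame'}
```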

The second way keeps the DataFrameGroupBy object. The function you pass to it selects the column, so 'find_best_ewma' still runs on that column within each group, but the 'apply' method is called on the original DataFrameGroupBy. The result is therefore assembled at the DataFrame-group level, and the 'magic' is that the group keys are hence still present in the index.
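
If you would rather not depend on this behaviour (the index that 'apply' returns for results aligned with their group may differ between pandas versions), one option is to ask for the shape you want explicitly. The sketch below is only an illustration under assumptions: `lagged` is a hypothetical stand-in for `find_best_ewma`, used so the example needs nothing beyond pandas.

```python
import pandas as pd

# The test frame from the question (data2 omitted for brevity).
test = pd.DataFrame({
    "data1": [-0.018442, -0.038490, 0.953920, -0.231322,
              -0.741380, -0.856434, -1.416564],
    "key1": ["a", "b", "a", "b", "c", "d", "c"],
    "key2": ["x", "x", "x", "y", "z", "y", "z"],
})
keys = ["key1", "key2"]

def lagged(series):
    # Hypothetical stand-in for find_best_ewma: returns a Series
    # with the same index as its input.
    return series.shift()

# transform returns a result aligned with the original rows.
flat = test.groupby(keys)["data1"].transform(lagged)

# The second way from the question: the group keys become outer index levels.
nested = test.groupby(keys).apply(lambda g: lagged(g["data1"]))

# Dropping the key levels recovers the original row labels.
back_to_flat = nested.droplevel(keys).sort_index()

print(flat.index.equals(test.index))  # True
print(nested.index.nlevels)           # 3
```

Use `transform` when the result should line up with the original rows, for example to assign it back as a new column, and the DataFrameGroupBy form when the group keys should stay in the index.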
