Apply a function on a rolling window in a DataFrame where the whole DataFrame is passed to the function

Problem description

I have a dataframe of 5 columns indexed by YearMo:

import numpy as np
import pandas as pd

# YearMo labels 200001 ... 200912 (10 years x 12 months)
yearmo = np.repeat(np.arange(2000, 2010) * 100, 12) + [x for x in range(1, 13)] * 10
rates = pd.DataFrame(data=np.random.random((120, 5)),
                     index=pd.Series(data=yearmo, name='YearMo'),
                     columns=['A', 'B', 'C', 'D', 'E'])

rates.head()
YearMo         A         B         C         D         E
200411  0.237696  0.341937  0.258713  0.569689  0.470776
200412  0.601713  0.313006  0.221821  0.720162  0.889891
200501  0.024379  0.761315  0.225032  0.293682  0.302431
200502  0.996778  0.388783  0.026448  0.056188  0.744850
200503  0.942024  0.768416  0.484236  0.102904  0.287446

What I would like to do is to be able to apply a rolling window and pass all five columns to a function – something like:

rates.rolling(window=60, min_periods=60).apply(lambda x: my_func(data=x, param=5))

but this approach applies the function to each column. Specifying axis=1 doesn't do anything either....
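To make the per-column behaviour concrete, here is a minimal sketch on a small stand-in frame. The names `demo`, `probe` and `seen_shapes` are made up for illustration, and `raw=True` assumes a pandas version that supports that argument (it is not available in very old releases). The probe only records the shape of whatever `rolling(...).apply` hands to it.

```python
import numpy as np
import pandas as pd

# Small stand-in for `rates`: 6 rows, 2 columns (hypothetical data).
demo = pd.DataFrame({'A': np.arange(6.0), 'B': np.arange(6.0) * 10})

seen_shapes = []

def probe(x):
    # Record what rolling.apply passes in, then return a scalar.
    seen_shapes.append(x.shape)
    return x.sum()

# raw=True asks pandas to pass plain NumPy arrays to the function.
result = demo.rolling(window=3, min_periods=3).apply(probe, raw=True)

print(set(seen_shapes))  # every call sees a 1-D slice of a single column
print(result)
```

Each call receives a one-dimensional window from one column only, which is why `my_func` never sees all five columns at once.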

Solution

Question: ... apply a rolling window and pass all five columns to a function

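This will do what you want: call .rolling(...) with axis=1 and min_periods=5. With axis=1 the window runs across the columns 'A':'E' (use a window of 5 or a multiple of 5), so each call to the function receives the five values of one row.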

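def f1(data=None, param=None):
    # Print the type and contents of what rolling.apply passes in
    print('f1(%s, %s) data=%s' % (str(type(data)), param, data))
    return data.sum()

# axis=1 rolls across the columns, so each call gets one row's values
subRates = rates.rolling(window=60, min_periods=5, axis=1).apply(lambda x: f1(x))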

Input:

               A         B         C         D         E
YearMo
200001  0.666744  0.569194  0.546873  0.018696  0.240783
200002  0.035888  0.853077  0.348200  0.921997  0.283177
200003  0.652761  0.076630  0.298076  0.800504  0.041231
200004  0.537397  0.968399  0.211072  0.328157  0.929783
200005  0.759506  0.702220  0.807477  0.886935  0.022587

Output:

f1(<class 'numpy.ndarray'>, None) data=[ 0.66674393  0.56919434  0.54687296  0.01869609  0.24078329]
f1(<class 'numpy.ndarray'>, None) data=[ 0.03588751  0.85307707  0.34819965  0.92199698  0.28317727]
f1(<class 'numpy.ndarray'>, None) data=[ 0.65276067  0.07663029  0.29807589  0.80050448  0.04123137]
f1(<class 'numpy.ndarray'>, None) data=[ 0.53739687  0.96839917  0.21107155  0.32815687  0.92978308]
f1(<class 'numpy.ndarray'>, None) data=[ 0.75950632  0.70222034  0.80747698  0.88693524  0.02258685]
         A   B   C   D         E
YearMo
200001 NaN NaN NaN NaN  2.042291
200002 NaN NaN NaN NaN  2.442338
200003 NaN NaN NaN NaN  1.869203
200004 NaN NaN NaN NaN  2.974808
200005 NaN NaN NaN NaN  3.178726
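
As far as I know, recent pandas releases deprecate the axis argument of rolling and suggest transposing the frame instead. The sketch below assumes that route and should reproduce the same layout (row sums in the last column, NaN elsewhere). The `demo` frame and the one-argument `f1` are simplified stand-ins, not the code from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `rates`: 4 rows, 5 columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((4, 5)), columns=list('ABCDE'))

def f1(data):
    # data is the five values of one original row
    return data.sum()

n_cols = demo.shape[1]

# Transpose so the columns become rows, roll over them, transpose back.
row_wise = (
    demo.T
    .rolling(window=n_cols, min_periods=n_cols)
    .apply(f1, raw=True)
    .T
)

print(row_wise)
```

Only column E has a complete window of five values, so that is where the result lands, just as in the output above.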

Tested with Python 3.4.2 and pandas 0.19.2.
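
If the goal is for the function to see all five columns over a 60-row window, rather than one row at a time, one possible approach is to slice the frame by position in a plain loop. This is only a sketch and not part of the answer above: `rolling_apply_frame` is a hypothetical helper, and `my_func` here is a placeholder that multiplies the window total by `param`.

```python
import numpy as np
import pandas as pd

def rolling_apply_frame(df, window, func, **kwargs):
    """Call func on each full window of rows, passing the whole sub-DataFrame."""
    out = pd.Series(np.nan, index=df.index, dtype=float)
    for end in range(window, len(df) + 1):
        # Rows end-window .. end-1, all columns
        out.iloc[end - 1] = func(df.iloc[end - window:end], **kwargs)
    return out

def my_func(data, param):
    # Placeholder: data is a (window x n_columns) DataFrame
    return data.to_numpy().sum() * param

# Hypothetical data: 6 rows of ones, 5 columns
frame = pd.DataFrame(np.ones((6, 5)), columns=list('ABCDE'))
res = rolling_apply_frame(frame, window=3, func=my_func, param=5)
print(res)
```

With the frame from the question the call would be `rolling_apply_frame(rates, window=60, func=my_func, param=5)`. The loop is slower than a built-in rolling aggregation, but it puts no restriction on what `my_func` does with the window.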
