Rolling a function on a data frame

Problem description

I have the following data frame C:

>>> C
              a    b   c
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN
2011-01-03   82   24 NaN
2011-01-04  123   36 NaN
2011-01-05  164   48 NaN
2011-01-06  205   60   2
2011-01-07  246   72   4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10
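
For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the construction (multiples of 41 and 12, a daily date index) is inferred from the printed values and is not part of the original question. Note that c prints as 2.0, 4.0, ... because the column holds floats alongside NaN.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame: a and b grow linearly,
# c is NaN for the first five rows and then 2, 4, 6, 8, 10.
index = pd.date_range("2011-01-01", periods=10, freq="D")
C = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=index,
)
print(C)
```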

I would like to add a new column, d, by applying a rolling function over a fixed window (6 here), where, for each row (or date), the value of c is held fixed. One iteration of this rolling function should be (pseudo):

              a    b   c   d
2011-01-01    0    0 NaN   a + b*2 (a,b from this row, '2' is from 'c' on 2011-01-06)
2011-01-02   41   12 NaN   a + b*2 (a,b from this row, '2' is still from 2011-01-06)
2011-01-03   82   24 NaN   a + b*2
2011-01-04  123   36 NaN   a + b*2
2011-01-05  164   48 NaN   a + b*2
2011-01-06  205   60   2   a + b*2
2011-01-07  246   72   4   
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

After this "loop", I want to take all 6 calculated rows in d and pass them to a function call, which returns a single value to be stored in another column, say e:

              a    b   c   d                               e

2011-01-01    0    0 NaN   a + b*2 ---|                   NaN
2011-01-02   41   12 NaN   a + b*2    |                   NaN
2011-01-03   82   24 NaN   a + b*2    | These values      NaN
2011-01-04  123   36 NaN   a + b*2    | are input to      NaN
2011-01-05  164   48 NaN   a + b*2    | function          NaN
2011-01-06  205   60   2   a + b*2 ---| yielding          X
2011-01-07  246   72   4                value X in
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10

This procedure would then be repeated for the next window (again 6 rows long):

              a    b   c   d             e
2011-01-01    0    0 NaN   
2011-01-02   41   12 NaN   a + b*4 (a,b from this row, '4' is from 'c' now from 2011-01-07)
2011-01-03   82   24 NaN   a + b*4 (a,b from this row, '4' is still from 2011-01-07)
2011-01-04  123   36 NaN   a + b*4
2011-01-05  164   48 NaN   a + b*4
2011-01-06  205   60   2   a + b*4       X
2011-01-07  246   72   4   a + b*4
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

              a    b   c   d                               e

2011-01-01    0    0 NaN                                  NaN
2011-01-02   41   12 NaN   a + b*4 ---|                   NaN
2011-01-03   82   24 NaN   a + b*4    | These values      NaN
2011-01-04  123   36 NaN   a + b*4    | are input to      NaN
2011-01-05  164   48 NaN   a + b*4    | function          NaN
2011-01-06  205   60   2   a + b*4    | yielding          X
2011-01-07  246   72   4   a + b*4 ---| value Y in        Y
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10
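
The procedure described above can be sketched as a plain loop over windows. This is only an illustration: rolling_fixed_c is a hypothetical helper name, the frame is rebuilt inline from the printed values, and the mean is used as a stand-in because the question does not say which function should reduce the six d values.

```python
import numpy as np
import pandas as pd

def rolling_fixed_c(df, window, func):
    """For each full window, take c from the window's last row, compute
    d = a + b*c for every row in the window, and reduce d with func."""
    out = pd.Series(np.nan, index=df.index)
    for end in range(window - 1, len(df)):
        rows = df.iloc[end - window + 1 : end + 1]
        c = df["c"].iloc[end]          # c is held fixed for the whole window
        d = rows["a"] + rows["b"] * c  # the six d values of this window
        out.iloc[end] = func(d)        # one value per window, stored in e
    return out

# Assumed reconstruction of the example data.
df = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)
df["e"] = rolling_fixed_c(df, 6, np.mean)  # np.mean is a placeholder function
print(df)
```

The first five rows of e stay NaN because no full 6-row window ends there.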

I hope this is clear.

Thanks, N

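Recommended answer

You can use pd.rolling_apply (a function from older pandas releases; it was deprecated in pandas 0.18 and later removed in favor of the .rolling(...).apply(...) method):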

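import numpy as np
import pandas as pd
df = pd.read_table('data', sep=r'\s+')

def foo(x, df):
    # x holds the row positions of the current window (passed as floats)
    pos = x.astype(int)
    window = df.iloc[pos]
    # print(window)
    # c is taken from the last row of the window and held fixed
    c = df['c'].iloc[pos[-1]]
    dvals = window['a'] + window['b']*c
    return bar(dvals)

def bar(dvals):
    # reduce the window's d values to a single number
    # print(dvals)
    return dvals.mean()

# roll over row positions rather than over a column, so foo can see whole rows
df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
print(df)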

which yields

              a    b   c       e
2011-01-01    0    0 NaN     NaN
2011-01-02   41   12 NaN     NaN
2011-01-03   82   24 NaN     NaN
2011-01-04  123   36 NaN     NaN
2011-01-05  164   48 NaN     NaN
2011-01-06  205   60   2   162.5
2011-01-07  246   72   4   311.5
2011-01-08  287   84   6   508.5
2011-01-09  328   96   8   753.5
2011-01-10  369  108  10  1046.5
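
On current pandas releases, where pd.rolling_apply is no longer available, the same idea can be sketched with Series.rolling(...).apply(...). The assumptions here are that the frame is rebuilt inline from the printed values instead of being read from the 'data' file, and that raw=True is passed so that foo receives a NumPy array of row positions.

```python
import numpy as np
import pandas as pd

def bar(dvals):
    # reduce the window's d values to a single number
    return dvals.mean()

def foo(x, df):
    # x is a float array holding the row positions of the current window
    pos = x.astype(int)
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]  # c from the last row of the window
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Assumed reconstruction of the example data.
df = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

# Roll over the row positions; each window of positions is handed to foo.
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```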

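Note that the args and kwargs parameters were added to rolling_apply only after its initial release, so older pandas versions may not accept them.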

Since df is a global variable in my example above, it is not strictly necessary to pass it to foo as an argument. You could simply remove df from the def foo line and omit args=(df,) from the call to rolling_apply.

However, there are times when df might not be defined in a scope accessible to foo. In that case, a simple workaround is to make a closure:

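def foo(df):
    # inner_foo keeps a reference to df, so no extra arguments are needed
    def inner_foo(x):
        pos = x.astype(int)  # row positions arrive as floats
        window = df.iloc[pos]
        # print(window)
        c = df['c'].iloc[pos[-1]]
        dvals = window['a'] + window['b']*c
        return bar(dvals)
    return inner_foo

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))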
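
As an alternative to writing the closure by hand, functools.partial can bind df to the two-argument foo. This is a different technique from the one in the answer, sketched here against the current Series.rolling(...).apply(...) API, with the example frame rebuilt inline as an assumption.

```python
from functools import partial

import numpy as np
import pandas as pd

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    pos = x.astype(int)        # row positions arrive as floats
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]  # c from the last row of the window
    return bar(window["a"] + window["b"] * c)

# Assumed reconstruction of the example data.
df = pd.DataFrame(
    {
        "a": 41 * np.arange(10),
        "b": 12 * np.arange(10),
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

# partial(foo, df=df) behaves like the closure: it takes only x.
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(partial(foo, df=df), raw=True)
print(df)
```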
