Rolling a function on a data frame


Problem description

I have the following data frame C.

>>> C
              a    b   c
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN
2011-01-03   82   24 NaN
2011-01-04  123   36 NaN
2011-01-05  164   48 NaN
2011-01-06  205   60   2
2011-01-07  246   72   4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10

I would like to add a new column, d, by applying a rolling function over a fixed window (6 rows here), where for each window I somehow fix the value of c to the one in the window's last row. One pass of this rolling function should look like this (pseudocode):

              a    b   c   d
2011-01-01    0    0 NaN   a + b*2 (a,b from this row, '2' is from 'c' on 2011-01-06)
2011-01-02   41   12 NaN   a + b*2 (a,b from this row, '2' is still from 2011-01-06)
2011-01-03   82   24 NaN   a + b*2
2011-01-04  123   36 NaN   a + b*2
2011-01-05  164   48 NaN   a + b*2
2011-01-06  205   60   2   a + b*2
2011-01-07  246   72   4   
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

After this "loop" I want to take all 6 of these calculated rows in d and pass them to a function call that returns a single value, which should be stored in another column, say e:

              a    b   c   d                               e

2011-01-01    0    0 NaN   a + b*2 ---|                   NaN
2011-01-02   41   12 NaN   a + b*2    |                   NaN
2011-01-03   82   24 NaN   a + b*2    | These values      NaN
2011-01-04  123   36 NaN   a + b*2    | are input to      NaN
2011-01-05  164   48 NaN   a + b*2    | function          NaN
2011-01-06  205   60   2   a + b*2 ---| yielding          X
2011-01-07  246   72   4                value X in
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10

This procedure would then be repeated on the next window (again 6 rows long), like:

              a    b   c   d             e
2011-01-01    0    0 NaN   
2011-01-02   41   12 NaN   a + b*4 (a,b from this row, '4' is from 'c' now from 2011-01-07)
2011-01-03   82   24 NaN   a + b*4 (a,b from this row, '4' is still from 2011-01-07)
2011-01-04  123   36 NaN   a + b*4
2011-01-05  164   48 NaN   a + b*4
2011-01-06  205   60   2   a + b*4       X
2011-01-07  246   72   4   a + b*4
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

              a    b   c   d                               e

2011-01-01    0    0 NaN                                  NaN
2011-01-02   41   12 NaN   a + b*4 ---|                   NaN
2011-01-03   82   24 NaN   a + b*4    | These values      NaN
2011-01-04  123   36 NaN   a + b*4    | are input to      NaN
2011-01-05  164   48 NaN   a + b*4    | function          NaN
2011-01-06  205   60   2   a + b*4    | yielding          X
2011-01-07  246   72   4   a + b*4 ---| value Y in        Y
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10

Hopefully this is clear enough,

Thanks, N
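
For reference, the procedure described above can be written as an explicit loop. This is only a sketch under stated assumptions: the data frame is rebuilt inline from the table in the question, and `summarize` is a hypothetical placeholder (here simply the mean) standing in for whatever function is meant to turn the 6 values of d into one value.

```python
import numpy as np
import pandas as pd

# Rebuild the data frame C from the question (values copied from the table).
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

def summarize(dvals):
    # Hypothetical placeholder for the function applied to each window of d.
    return dvals.mean()

window = 6
e = np.full(len(df), np.nan)  # rows without a full window stay NaN
for end in range(window - 1, len(df)):
    rows = df.iloc[end - window + 1 : end + 1]  # the 6 rows of this window
    c = df["c"].iloc[end]                       # c is fixed to the last row's value
    d = rows["a"] + rows["b"] * c               # d = a + b*c for every row in the window
    e[end] = summarize(d)                       # one value per window, stored in e

df["e"] = e
print(df)
```

The loop makes the two steps explicit (compute d for the window, then reduce it to a single value), which is what any rolling solution has to reproduce.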

Solution

You could use pd.rolling_apply:

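import numpy as np
import pandas as pd
# 'data' is a whitespace-separated file holding the table shown in the question
df = pd.read_table('data', sep=r'\s+')

def foo(x, df):
    # x holds the row positions of the current window
    window = df.iloc[x]
    # print(window)
    # take c from the last row of the window
    c = df.ix[int(x[-1]), 'c']
    dvals = window['a'] + window['b']*c
    return bar(dvals)

def bar(dvals):
    # reduce the window's d values to a single number
    # print(dvals)
    return dvals.mean()

# roll over row positions rather than values, so foo can look up whole rows
df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
print(df)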

yields

              a    b   c       e
2011-01-01    0    0 NaN     NaN
2011-01-02   41   12 NaN     NaN
2011-01-03   82   24 NaN     NaN
2011-01-04  123   36 NaN     NaN
2011-01-05  164   48 NaN     NaN
2011-01-06  205   60   2   162.5
2011-01-07  246   72   4   311.5
2011-01-08  287   84   6   508.5
2011-01-09  328   96   8   753.5
2011-01-10  369  108  10  1046.5
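
Because bar in this example is just the mean, and the mean is linear, these numbers can be cross-checked without a custom rolling function: over a window where c is constant, mean(a + b*c) equals mean(a) + c*mean(b). The following is a small sketch of that check, assuming a pandas version that provides Series.rolling and with the data frame rebuilt inline; the shortcut does not carry over to an arbitrary bar.

```python
import numpy as np
import pandas as pd

# Rebuild the example data frame (values copied from the question).
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

# mean(a + b*c) over the window == mean(a) + c * mean(b),
# with c taken from the window's last row (the current row).
df["e"] = df["a"].rolling(6).mean() + df["c"] * df["b"].rolling(6).mean()
print(df)
```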


The args and kwargs parameters were added to rolling_apply in Pandas version 0.14.0.
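
A note for readers on newer pandas: pd.rolling_apply was deprecated in pandas 0.18 and later removed, and the .ix indexer was removed in pandas 1.0, so the code above only runs on old versions. Below is a minimal sketch of the same idea using Series.rolling(...).apply, assuming a recent pandas. The data frame is rebuilt inline instead of being read from the 'data' file so that the snippet is self-contained, and `positions` is a name introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example data frame (values copied from the question).
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    # With raw=True, x arrives as a float ndarray of row positions.
    pos = x.astype(int)
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]  # c from the last row of the window
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Roll over row positions so that foo can look up whole rows of df.
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```

Rolling over positions rather than over a data column is the same trick as in the original answer: the rolling machinery only supplies the window boundaries, and foo does the actual row lookup.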

Since in my example above df is a global variable, it is not really necessary to pass it to foo as an argument. You could simply remove df from the def foo line and also omit the args=(df,) in the call to rolling_apply.

However, there are times when df might not be defined in a scope accessible by foo. In that case, there is a simple workaround -- make a closure:

def foo(df):
    def inner_foo(x):
        window = df.iloc[x]
        # print(window)
        c = df.ix[int(x[-1]), 'c']
        dvals = window['a'] + window['b']*c
        return bar(dvals)
    return inner_foo

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))
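
As an alternative to writing the closure by hand, functools.partial from the standard library can bind df to foo. The sketch below is not part of the original answer; it assumes a recent pandas (Series.rolling(...).apply instead of the removed pd.rolling_apply), rebuilds the data frame inline, and uses `positions` as an illustrative name.

```python
from functools import partial

import numpy as np
import pandas as pd

# Rebuild the example data frame (values copied from the question).
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    pos = x.astype(int)        # raw=True passes positions as floats
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]  # c from the last row of the window
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

positions = pd.Series(np.arange(len(df)), index=df.index)
# partial(foo, df=df) behaves like inner_foo: a one-argument function of x.
df["e"] = positions.rolling(6).apply(partial(foo, df=df), raw=True)
print(df)
```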
