Rolling a function on a data frame

Views: 121 Posted: 2017/3/26 0:16:24 Tags: python pandas dataframe apply


Question

I have the following data frame C.

>>> C
              a    b   c
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN
2011-01-03   82   24 NaN
2011-01-04  123   36 NaN
2011-01-05  164   48 NaN
2011-01-06  205   60   2
2011-01-07  246   72   4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10
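
For readers who want to reproduce the example, the frame above can be rebuilt as follows. This is only a sketch: the construction (multiples of 41 and 12, a daily date index) is inferred from the printed values and is not part of the original question.

```python
import numpy as np
import pandas as pd

# Assumption: a daily DatetimeIndex starting 2011-01-01, matching the printout.
index = pd.date_range("2011-01-01", periods=10, freq="D")
steps = np.arange(10)

C = pd.DataFrame(
    {
        "a": 41 * steps,                                  # 0, 41, ..., 369
        "b": 12 * steps,                                  # 0, 12, ..., 108
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],   # NaN for the first five rows
    },
    index=index,
)
print(C)
```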

I would like to add a new column, d, by applying a rolling function over a fixed window (6 rows here), where, for each row (or date) at which the window ends, the value of c is held fixed at that row's value. One iteration of this rolling function should look like this (pseudocode):

              a    b   c   d
2011-01-01    0    0 NaN   a + b*2 (a,b from this row, '2' is from 'c' on 2011-01-06)
2011-01-02   41   12 NaN   a + b*2 (a,b from this row, '2' is still from 2011-01-06)
2011-01-03   82   24 NaN   a + b*2
2011-01-04  123   36 NaN   a + b*2
2011-01-05  164   48 NaN   a + b*2
2011-01-06  205   60   2   a + b*2
2011-01-07  246   72   4   
2011-01-08  287   84   6   
2011-01-09  328   96   8   
2011-01-10  369  108  10

After this "loop", I want to take all six calculated rows in d and pass them to a function that returns a single value, which should be stored in another column, say e:

              a    b   c   d                               e

2011-01-01    0    0 NaN   a + b*2 ---|                   NaN
2011-01-02   41   12 NaN   a + b*2    |                   NaN
2011-01-03   82   24 NaN   a + b*2    | These values      NaN
2011-01-04  123   36 NaN   a + b*2    | are input to      NaN
2011-01-05  164   48 NaN   a + b*2    | function          NaN
2011-01-06  205   60   2   a + b*2 ---| yielding          X
2011-01-07  246   72   4                value X in
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8   
2011-01-10  369  108  10

This procedure would then be repeated for the next window (again 6 rows long), like this:

              a    b   c   d          e
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN   a + b*4 (a,b from this row, '4' is from 'c' now from 2011-01-07)
2011-01-03   82   24 NaN   a + b*4 (a,b from this row, '4' is still from 2011-01-07)
2011-01-04  123   36 NaN   a + b*4
2011-01-05  164   48 NaN   a + b*4
2011-01-06  205   60   2   a + b*4    X
2011-01-07  246   72   4   a + b*4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10

              a    b   c   d                               e
2011-01-01    0    0 NaN                                  NaN
2011-01-02   41   12 NaN   a + b*4 ---|                   NaN
2011-01-03   82   24 NaN   a + b*4    | These values      NaN
2011-01-04  123   36 NaN   a + b*4    | are input to      NaN
2011-01-05  164   48 NaN   a + b*4    | function          NaN
2011-01-06  205   60   2   a + b*4    | yielding          X
2011-01-07  246   72   4   a + b*4 ---| value Y in        Y
2011-01-08  287   84   6                column 'e'
2011-01-09  328   96   8
2011-01-10  369  108  10
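
The procedure described above can be written as a plain loop, which may help pin down the intended semantics before reaching for a rolling helper. This is a minimal sketch, not the asker's code: `summarize` is a hypothetical stand-in for the unspecified function applied to the six d values (a mean is used purely for illustration), and the frame is rebuilt from the printed values.

```python
import numpy as np
import pandas as pd

def summarize(dvals):
    # Hypothetical placeholder for the real function; the question does not say what it is.
    return dvals.mean()

steps = np.arange(10)
df = pd.DataFrame(
    {"a": 41 * steps, "b": 12 * steps,
     "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0]},
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

window = 6
e = np.full(len(df), np.nan)               # rows without a full window stay NaN
for end in range(window - 1, len(df)):
    rows = df.iloc[end - window + 1 : end + 1]   # the six rows of this window
    c_fixed = df["c"].iloc[end]                  # c taken from the window's last row
    d = rows["a"] + rows["b"] * c_fixed          # the temporary column d for this window
    e[end] = summarize(d)                        # one value per window, stored at its last row

df["e"] = e
print(df)
```

Note that d is different for every window, so it is kept as a temporary Series inside the loop rather than stored as a single column.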

Hope this is clear.

Thanks, N

Recommended answer

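You can use pd.rolling_apply (a top-level function in the pandas versions of the time; newer releases expose the same operation as the .rolling(...).apply(...) method). The trick is to roll over the row positions rather than over the data itself, so that foo can look up whole rows of df for each window: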

import numpy as np
import pandas as pd

df = pd.read_table('data', sep='\s+')

def foo(x, df):
    window = df.iloc[x]
    # print(window)
    c = df.ix[int(x[-1]), 'c']
    dvals = window['a'] + window['b']*c
    return bar(dvals)

def bar(dvals):
    # print(dvals)
    return dvals.mean()

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
print(df)

yields

              a    b   c       e
2011-01-01    0    0 NaN     NaN
2011-01-02   41   12 NaN     NaN
2011-01-03   82   24 NaN     NaN
2011-01-04  123   36 NaN     NaN
2011-01-05  164   48 NaN     NaN
2011-01-06  205   60   2   162.5
2011-01-07  246   72   4   311.5
2011-01-08  287   84   6   508.5
2011-01-09  328   96   8   753.5
2011-01-10  369  108  10  1046.5
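
On recent pandas releases neither pd.rolling_apply nor the .ix indexer is available, so the code above will not run unchanged. The following is a sketch of the same idea using the Rolling.apply method; it assumes a pandas version where Rolling.apply accepts raw and args, and it rebuilds the frame inline instead of reading the 'data' file.

```python
import numpy as np
import pandas as pd

steps = np.arange(10)
df = pd.DataFrame(
    {"a": 41 * steps, "b": 12 * steps,
     "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0]},
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    # With raw=True, x is a float ndarray holding the row positions of one window.
    pos = x.astype(int)
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]            # c from the last row of the window
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Roll over row positions, as in the answer, but via the method-based API.
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```

With bar defined as the mean, this should reproduce the e column shown above.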

Note: the args and kwargs parameters are an addition to rolling_apply, so the call above requires a pandas version that includes them.

Since df is a global variable in my example above, it is not really necessary to pass it to foo as an argument. You could simply remove df from the def foo line and omit args=(df,) from the call to rolling_apply.

However, there are times when df might not be defined in a scope accessible to foo. In that case, a simple workaround is to make a closure:

def foo(df):
    def inner_foo(x):
        window = df.iloc[x]
        # print(window)
        c = df.ix[int(x[-1]), 'c']
        dvals = window['a'] + window['b']*c
        return bar(dvals)
    return inner_foo

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))
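
An alternative to writing the closure by hand is functools.partial from the standard library, which binds df to foo in the same way. This is a sketch rather than part of the original answer: it uses the method-based Rolling.apply API of newer pandas versions, and add_e is a hypothetical helper name introduced only to show df living in a local scope.

```python
from functools import partial

import numpy as np
import pandas as pd

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    pos = x.astype(int)                   # window row positions arrive as floats
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]
    return bar(window["a"] + window["b"] * c)

def add_e(df, size=6):
    # Hypothetical helper: df is local here, so foo cannot rely on a global.
    positions = pd.Series(np.arange(len(df)), index=df.index)
    out = df.copy()
    out["e"] = positions.rolling(size).apply(partial(foo, df=df), raw=True)
    return out

steps = np.arange(10)
frame = pd.DataFrame(
    {"a": 41 * steps, "b": 12 * steps,
     "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0]},
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)
result = add_e(frame)
print(result)
```

Both approaches do the same job; partial avoids the nested function definition, while the closure keeps everything in one place.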
