Rolling a function on a data frame
I have the following data frame C:

    >>> C
                  a    b    c
    2011-01-01    0    0  NaN
    2011-01-02   41   12  NaN
    2011-01-03   82   24  NaN
    2011-01-04  123   36  NaN
    2011-01-05  164   48  NaN
    2011-01-06  205   60    2
    2011-01-07  246   72    4
    2011-01-08  287   84    6
    2011-01-09  328   96    8
    2011-01-10  369  108   10
I would like to add a new column, d, by applying a rolling function on a fixed window (6 here) in which, for each row (or date), the value of c is somehow held fixed. One loop of this rolling function should be (pseudo):

                  a    b    c  d
    2011-01-01    0    0  NaN  a + b*2  (a,b from this row, '2' is from 'c' on 2011-01-06)
    2011-01-02   41   12  NaN  a + b*2  (a,b from this row, '2' is still from 2011-01-06)
    2011-01-03   82   24  NaN  a + b*2
    2011-01-04  123   36  NaN  a + b*2
    2011-01-05  164   48  NaN  a + b*2
    2011-01-06  205   60    2  a + b*2
    2011-01-07  246   72    4
    2011-01-08  287   84    6
    2011-01-09  328   96    8
    2011-01-10  369  108   10

After this "loop" I want to take all 6 of these calculated rows in d and run a function call, which in turn returns one value that should be stored in another column, e, say:

                  a    b    c  d                             e
    2011-01-01    0    0  NaN  a + b*2  ---|               NaN
    2011-01-02   41   12  NaN  a + b*2     |               NaN
    2011-01-03   82   24  NaN  a + b*2     | These values  NaN
    2011-01-04  123   36  NaN  a + b*2     | are input to  NaN
    2011-01-05  164   48  NaN  a + b*2     | function      NaN
    2011-01-06  205   60    2  a + b*2  ---| yielding        X
    2011-01-07  246   72    4                value X in
    2011-01-08  287   84    6                column 'e'
    2011-01-09  328   96    8
    2011-01-10  369  108   10

This procedure would then be iterated onto the next window (again 6 long) like:

                  a    b    c  d                                                                  e
    2011-01-01    0    0  NaN
    2011-01-02   41   12  NaN  a + b*4  (a,b from this row, '4' is from 'c' now from 2011-01-07)
    2011-01-03   82   24  NaN  a + b*4  (a,b from this row, '4' is still from 2011-01-07)
    2011-01-04  123   36  NaN  a + b*4
    2011-01-05  164   48  NaN  a + b*4
    2011-01-06  205   60    2  a + b*4                                                            X
    2011-01-07  246   72    4  a + b*4
    2011-01-08  287   84    6
    2011-01-09  328   96    8
    2011-01-10  369  108   10

                  a    b    c  d                             e
    2011-01-01    0    0  NaN                              NaN
    2011-01-02   41   12  NaN  a + b*4  ---|               NaN
    2011-01-03   82   24  NaN  a + b*4     | These values  NaN
    2011-01-04  123   36  NaN  a + b*4     | are input to  NaN
    2011-01-05  164   48  NaN  a + b*4     | function      NaN
    2011-01-06  205   60    2  a + b*4     | yielding        X
    2011-01-07  246   72    4  a + b*4  ---| value Y in      Y
    2011-01-08  287   84    6                column 'e'
    2011-01-09  328   96    8
    2011-01-10  369  108   10

Hopefully this is clear enough,
Thanks, N
You could use pd.rolling_apply:
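
    import numpy as np
    import pandas as pd

    # Written for older pandas: pd.rolling_apply and .ix were removed
    # in later releases (0.23 and 1.0 respectively).
    df = pd.read_table('data', sep=r'\s+')

    def foo(x, df):
        # x holds the integer positions of the rows in the current window
        window = df.iloc[x]
        # print(window)
        # fix c to the value in the window's last row
        c = df.ix[int(x[-1]), 'c']
        dvals = window['a'] + window['b']*c
        return bar(dvals)

    def bar(dvals):
        # print(dvals)
        # reduce the window's d values to one number (the mean, as an example)
        return dvals.mean()

    # roll over row positions rather than data, so foo can look up whole rows
    df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
    print(df)
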
yields

                  a    b    c       e
    2011-01-01    0    0  NaN     NaN
    2011-01-02   41   12  NaN     NaN
    2011-01-03   82   24  NaN     NaN
    2011-01-04  123   36  NaN     NaN
    2011-01-05  164   48  NaN     NaN
    2011-01-06  205   60    2   162.5
    2011-01-07  246   72    4   311.5
    2011-01-08  287   84    6   508.5
    2011-01-09  328   96    8   753.5
    2011-01-10  369  108   10  1046.5

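As a cross-check on those numbers, the procedure from the question can also be written as an explicit loop over windows. This is only a sketch: the frame is rebuilt by hand from the table in the question instead of being read from the 'data' file, window_size is a name introduced here, and the mean stands in for whatever reducing function is really needed.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame by hand (values copied from the question's table).
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2, 4, 6, 8, 10],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

window_size = 6  # hypothetical name for the fixed window length
e = np.full(len(df), np.nan)

for end in range(window_size - 1, len(df)):
    window = df.iloc[end - window_size + 1 : end + 1]
    c = window["c"].iloc[-1]            # c is fixed to the window's last row
    d = window["a"] + window["b"] * c   # the temporary d values for this window
    e[end] = d.mean()                   # stand-in for the real reducing function

df["e"] = e
print(df)
```

Because d is recomputed for every window, it never becomes a single stored column; only the reduced value e is kept per row.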
The args and kwargs parameters were added to rolling_apply in Pandas version 0.14.0.
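In later pandas releases the top-level pd.rolling_apply function and the .ix indexer no longer exist. A rough equivalent of the same idea, assuming a recent pandas where Series.rolling(...).apply accepts raw and args, might look like the sketch below. The frame is built by hand from the question's table, and positions and pos are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame by hand (values copied from the question's table).
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2, 4, 6, 8, 10],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

def bar(dvals):
    return dvals.mean()

def foo(x, df):
    # With raw=True, x arrives as a float ndarray of row positions,
    # so it is cast back to integers before positional indexing.
    pos = x.astype(int)
    window = df.iloc[pos]
    c = df["c"].iloc[pos[-1]]  # c taken from the window's last row
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Roll over row positions (not data) so foo can look up whole rows.
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```

The trick is unchanged from the original answer: the rolling window runs over integer row positions, and the function uses those positions to pull the matching rows out of the frame.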
Since in my example above df is a global variable, it is not really necessary to pass it to foo as an argument. You could simply remove df from the def foo line and also omit the args=(df,) in the call to rolling_apply.
However, there are times when df might not be defined in a scope accessible by foo. In that case, there is a simple workaround, which is to make a closure:

    def foo(df):
        def inner_foo(x):
            window = df.iloc[x]
            # print(window)
            c = df.ix[int(x[-1]), 'c']
            dvals = window['a'] + window['b']*c
            return bar(dvals)
        return inner_foo

    df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))
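
The closure pattern carries over to newer pandas as well. Here is a minimal sketch, assuming a recent pandas with Series.rolling(...).apply; make_foo, add_e, frame and size are hypothetical names chosen for this example, and the frame is built by hand from the question's table. The data frame only ever exists as a function argument, which is the situation the closure is meant for.

```python
import numpy as np
import pandas as pd

def bar(dvals):
    return dvals.mean()

def make_foo(frame):
    # inner_foo closes over frame, so no extra args need to be passed along.
    def inner_foo(x):
        pos = x.astype(int)  # raw=True delivers positions as floats
        window = frame.iloc[pos]
        c = frame["c"].iloc[pos[-1]]
        return bar(window["a"] + window["b"] * c)
    return inner_foo

def add_e(frame, size=6):
    # frame is local here; the closure is what gives inner_foo access to it.
    positions = pd.Series(np.arange(len(frame)), index=frame.index)
    out = frame.copy()
    out["e"] = positions.rolling(size).apply(make_foo(frame), raw=True)
    return out

result = add_e(
    pd.DataFrame(
        {
            "a": np.arange(10) * 41,
            "b": np.arange(10) * 12,
            "c": [np.nan] * 5 + [2, 4, 6, 8, 10],
        },
        index=pd.date_range("2011-01-01", periods=10),
    )
)
print(result)
```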