Python custom function using rolling_apply for pandas
Question
I would like to use the pandas.rolling_apply function to apply my own custom function on a rolling window basis.
However, my function requires two arguments and also has two outputs. Is this possible?
Below is a minimal reproducible example:
import pandas as pd
import numpy as np
import random

tmp = pd.DataFrame(np.random.randn(2000,2)/10000,
                   index=pd.date_range('2001-01-01',periods=2000),
                   columns=['A','B'])

def gm(df,p):
    # cumulative compounded return, scaled by p
    v =(((df+1).cumprod())-1)*p
    return v.iloc[-1]

# an example output when subsetting for just 2001
gm(tmp.loc['2001'],5)

# the aim is to do this on a rolling basis over a 50-day window,
# getting both outputs while also being able to pass the parameter p=5
# (or any other value I want p to be)
pd.rolling_apply(tmp,50,gm)
This leads to an error, since gm takes two arguments.
Any help would be greatly appreciated.
Edit
Following Jeff's comment I have made progress, but I am still struggling with outputs of two or more columns. If I instead write a new function (below) that simply returns two random numbers (unconnected to the previous calculation) rather than the last row of v, I get the error TypeError: only length-1 arrays can be converted to Python scalars.
def gm2(df,p):
    df = pd.DataFrame(df)
    v =(((df+1).cumprod())-1)*p
    return np.random.rand(2)

pd.rolling_apply(tmp,50,lambda x: gm2(x,5)).tail(20)
This function works if the 2 in np.random.rand(2) is changed to 1.
Recommended answer
rolling_apply passes numpy arrays to the applied function (at the moment); by 0.14 it should pass a frame. The issue is here
So redefine your function to work on a numpy array. (You can of course construct a DataFrame inside it, but your index and column names won't be the same.)
In [9]: def gm(df,p):
   ...:     v = ((np.cumprod(df+1))-1)*p
   ...:     return v[-1]
   ...:
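As a quick sanity check, the array-based function can be compared with the original DataFrame-based one on a single 50-row window. This is only a sketch: the seeded data and the names gm_frame and gm_array are illustrative assumptions, not part of the original post, and no rolling function is called.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): same layout as the question, but seeded and smaller.
rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((200, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

def gm_frame(df, p):
    """Question's version: takes a DataFrame, returns the last row as a Series."""
    v = ((df + 1).cumprod() - 1) * p
    return v.iloc[-1]

def gm_array(arr, p):
    """Answer's version: takes a 1-D numpy array, returns a single scalar."""
    v = (np.cumprod(arr + 1) - 1) * p
    return v[-1]

# Apply both versions to the first 50 rows, one column at a time for the array version.
window = tmp.iloc[:50]
expected = gm_frame(window, 5)
actual = pd.Series({col: gm_array(window[col].to_numpy(), 5)
                    for col in window.columns})
print(expected)
print(actual)
```

Note that np.cumprod flattens a 2-D input when no axis is given, so the array version is only equivalent when it receives one column at a time.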
If you want to use more pandas functions inside your custom function, do this instead (note that the indices of the calling frame are not passed at the moment).
def gm(arr,p):
    # wrap the incoming numpy array so pandas methods are available
    df = pd.DataFrame(arr)
    v =(((df+1).cumprod())-1)*p
    return v.iloc[-1]
Pass the extra argument through a lambda:
In [11]: pd.rolling_apply(tmp,50,lambda x: gm(x,5)).tail(20)
Out[11]:
A B
2006-06-04 0.004207 -0.002112
2006-06-05 0.003880 -0.001598
2006-06-06 0.003809 -0.002228
2006-06-07 0.002840 -0.003938
2006-06-08 0.002855 -0.004921
2006-06-09 0.002450 -0.004614
2006-06-10 0.001809 -0.004409
2006-06-11 0.001445 -0.005959
2006-06-12 0.001297 -0.006831
2006-06-13 0.000869 -0.007878
2006-06-14 0.000359 -0.008102
2006-06-15 -0.000885 -0.007996
2006-06-16 -0.001838 -0.008230
2006-06-17 -0.003036 -0.008658
2006-06-18 -0.002280 -0.008552
2006-06-19 -0.001398 -0.007831
2006-06-20 -0.000648 -0.007828
2006-06-21 -0.000799 -0.007616
2006-06-22 -0.001096 -0.006740
2006-06-23 -0.001160 -0.006004
[20 rows x 2 columns]
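Newer pandas releases no longer provide the top-level pd.rolling_apply; the same computation is usually written with the .rolling(...).apply(...) method. The sketch below assumes a reasonably recent pandas and uses seeded, smaller data than the question (both are assumptions for illustration). It passes p through the args parameter instead of a lambda.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): seeded and smaller than the frame in the question.
rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((200, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

def gm(arr, p):
    """Compounded return over one window, scaled by p; arr is a 1-D numpy array."""
    v = (np.cumprod(arr + 1) - 1) * p
    return v[-1]

# raw=True hands each window to gm as a numpy array, one column at a time;
# args=(5,) supplies the extra positional argument p.
result = tmp.rolling(50).apply(gm, raw=True, args=(5,))
print(result.tail())
```

The applied function still has to return one scalar per window, which is why returning a length-2 array (as gm2 does in the question's edit) raises an error; the two output columns come from the function being applied to columns A and B separately.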