Python pandas rolling function with two arguments

Question

My beginner's love for python is undergoing a hard trial...

I need to calculate a function in a rolling window of a fixed length (let's say: 5). The function requires two parameters. I am well aware of the answer here which is nearly identical, but I keep getting errors.

My code is quite simple:

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats

df = pd.DataFrame( {'A' : np.arange(20), 'B' : np.random.randint(0,20,20)})

def my_tau2(idx):
    x = df.loc[idx, 'A'].astype('float')
    y = df.loc[idx, 'B'].astype('float')
    return scipy.stats.mstats.kendalltau(x, y)[0] ## breaks without this [0]

pd.rolling_apply(np.arange(len(df), dtype = np.dtype('int16')), 5, my_tau2)

I keep getting the following error:

File "<ipython-input-6-d6cbc608d2f0>", line 7, in <module>
pd.rolling_apply(np.arange(len(df), dtype = np.dtype('int16')), 5, my_tau2)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply
kwargs=kwargs)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat
result = getattr(r, name)(*args, **kwds)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 863, in apply
return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 621, in apply
center=False)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 560, in _apply
result = calc(values)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 555, in calc
return func(x, window, min_periods=self.min_periods)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 618, in f
kwargs)
File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51581)
TypeError: a float is required

I've been struggling with that and I'm going bonkers. My module versions are:

  • numpy: 1.11.0
  • scipy: 0.17.1
  • pandas: 0.18.1
  • python: 3.5.1

Any hints w.r.t. how to fix or calculate this in another way are wholeheartedly welcome.

Recommended answer

I am not familiar with the Kendall tau coefficient, but according to the post linked above, you should probably rewrite your tau function so that it returns a single value. Judging by the link you provided, I would write it as follows (still not very flexible, in my opinion, since it relies on df and on hardcoded column names from the outer scope):

def my_tau2(idx):
    df_tau = df[["A","B"]].iloc[idx]
    return scipy.stats.mstats.kendalltau(df_tau["A"], df_tau["B"])[0]
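The trailing [0] matters because kendalltau returns a pair (the correlation and a p-value), while a rolling apply expects a single number per window. A minimal sketch of that, assuming scipy.stats.kendalltau rather than the mstats variant used above, with made-up sample data:

```python
from scipy import stats

# Hypothetical sample data, not taken from the question.
x = [1, 2, 3, 4, 5]
y_same_order = [10, 20, 30, 40, 50]
y_reversed = [50, 40, 30, 20, 10]

# kendalltau returns two values: the tau coefficient and a p-value.
tau_same, p_same = stats.kendalltau(x, y_same_order)

# Indexing with [0] keeps only the coefficient, a single scalar.
tau_reversed = stats.kendalltau(x, y_reversed)[0]

print(tau_same, tau_reversed)
```

Identical ordering gives a tau of 1 and fully reversed ordering gives -1, so the scalar is easy to sanity-check.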

With that version of my_tau2, rolling_apply runs, and the result can be saved into the dataframe (which you did not seem to have done):

df["tau"] = pd.rolling_apply(np.arange(len(df)), 5, my_tau2)

Running this produced the following result:

     A   B       tau
0    0   0       NaN
1    1  11       NaN
2    2   2       NaN
3    3  11       NaN
4    4  17  0.737865
5    5   9  0.105409
6    6   5  0.000000
7    7   9 -0.527046
8    8  15 -0.105409
9    9  11  0.527046
10  10   4  0.000000
11  11   6 -0.400000
12  12  14 -0.200000
13  13  19  0.600000
14  14   0  0.200000
15  15  19  0.316228
16  16   9 -0.105409
17  17   1 -0.316228
18  18  13  0.200000
19  19  16  0.000000
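Note that pd.rolling_apply was already deprecated in pandas 0.18 and has been removed in later releases. On a newer pandas, the same index trick can probably be written with Series.rolling(...).apply. The following is only a sketch under that assumption: rolling_tau is a hypothetical helper name, the random data is seeded differently from the run above (so the numbers will not match the table), and it uses scipy.stats.kendalltau instead of the mstats variant.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
df = pd.DataFrame({"A": np.arange(20), "B": rng.integers(0, 20, 20)})


def rolling_tau(frame, col_x, col_y, window):
    """Kendall tau of two columns over a rolling window (hypothetical helper)."""
    x_values = frame[col_x].to_numpy(dtype=float)
    y_values = frame[col_y].to_numpy(dtype=float)

    # Roll over row positions instead of data, so each window can be used
    # to look up both columns at once.
    positions = pd.Series(np.arange(len(frame), dtype=float), index=frame.index)

    def tau_for_window(pos):
        idx = pos.astype(int)  # raw=True hands us a float ndarray of positions
        return stats.kendalltau(x_values[idx], y_values[idx])[0]

    return positions.rolling(window).apply(tau_for_window, raw=True)


df["tau"] = rolling_tau(df, "A", "B", 5)
print(df)
```

Passing the column names and the frame as arguments also avoids the hardcoded names from the outer scope mentioned earlier. The first window - 1 rows stay NaN, as in the table above.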
