Using rolling_apply with a function that requires 2 arguments in Pandas


Question

I'm trying to use rolling_apply with a function that requires 2 arguments. To my knowledge, the only way to calculate the Kendall tau correlation with the standard tie correction included (unless you write the formula from scratch) is:

>>> from scipy import stats
>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> z = [1.65, 2.64, 2.64, 6.95]
>>> print(round(stats.kendalltau(x, y)[0], 6))
0.333333
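
For reference, the "from scratch" route mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the tie-corrected statistic meant here is Kendall's tau-b (the default variant of scipy.stats.kendalltau); the name kendall_tau_b is made up for this example and the pair-counting loop is O(n^2), so it is not a replacement for SciPy.

```python
import itertools
import math


def kendall_tau_b(x, y):
    """Kendall tau-b from pair counts (hypothetical helper, illustration only)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in itertools.combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to no term
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    # Tie correction: each factor counts the pairs that are not tied in one variable.
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom else float("nan")


x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
z = [1.65, 2.64, 2.64, 6.95]  # contains one tied pair

tau_xy = kendall_tau_b(x, y)  # no ties: (4 - 2) / 6
tau_xz = kendall_tau_b(x, z)  # one tie in z: (1 - 4) / sqrt(5 * 6)
```

Without ties the denominator is just the number of pairs, so tau_xy reduces to the plain (concordant - discordant) / pairs ratio; the tie in z shrinks one factor of the denominator for tau_xz.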

I'm also aware of the problem with rolling_apply and functions that take two arguments, as documented here:

  • Related Question 1
  • Github Issue
  • Related Question 2

Still, I'm struggling to find a way to do the kendalltau calculation on a rolling basis over a DataFrame with multiple columns.

My DataFrame looks something like this:

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

I'm trying to create a function that does this:

In [1]: function(A, 3)  # A is the DataFrame, 3 is the rolling window
Out[1]:
   A  B  C     AB     AC     BC  
1  1  5  2    NaN    NaN    NaN
2  2  4  4    NaN    NaN    NaN
3  3  3  1  -0.99  -0.33   0.33
4  4  2  2  -0.99  -0.33   0.33
5  5  1  4  -0.99   0.99  -0.99

As a very preliminary approach, I considered defining the function like this:

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats  # makes sp.stats available

def tau1(x):
    y = np.array(A['A'])  # keep one column fixed and run it against the other two
    tau, p_value = sp.stats.kendalltau(x, y)
    return tau

A['AB'] = pd.rolling_apply(A['B'], 3, tau1)

It didn't work. I get:

ValueError: all keys need to be the same shape

I understand this is not a trivial problem. I appreciate any input.

Recommended answer

As of Pandas 0.14, rolling_apply passes only NumPy arrays to the function. A possible workaround is to pass np.arange(len(A)) as the first argument to rolling_apply, so that the tau function receives the positional indices of the rows in each window rather than the data itself. Then, within the tau function,

B = A[[col1, col2]].iloc[idx]

returns a DataFrame containing all the rows required.

import numpy as np
import pandas as pd
import scipy.stats as stats
import itertools as IT

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]], 
                 columns=['A', 'B', 'C'], index = [1, 2, 3, 4, 5])

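for col1, col2 in IT.combinations(A.columns, 2):
    def tau(idx):
        # idx holds the row positions of the current window, not the data
        B = A[[col1, col2]].iloc[idx]
        return stats.kendalltau(B[col1], B[col2])[0]
    # Roll over the positions 0..len(A)-1 so tau can look up both columns
    A[col1+col2] = pd.rolling_apply(np.arange(len(A)), 3, tau)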

print(A)    

yields

   A  B  C  AB        AC        BC
1  1  5  2 NaN       NaN       NaN
2  2  4  4 NaN       NaN       NaN
3  3  3  1  -1 -0.333333  0.333333
4  4  2  2  -1 -0.333333  0.333333
5  5  1  4  -1  1.000000 -1.000000
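
The top-level pd.rolling_apply function was deprecated in pandas 0.18 and removed in later releases. The same index trick can be sketched with the Series.rolling(...).apply(...) API instead. This is a minimal sketch under stated assumptions rather than a definitive port: the wrapper name rolling_kendalltau is made up for this example, raw=True is used so the function receives a NumPy array, and the positions are cast back to integers because the rolling machinery hands them over as floats.

```python
import itertools

import numpy as np
import pandas as pd
from scipy import stats

A = pd.DataFrame(
    [[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
    columns=["A", "B", "C"],
    index=[1, 2, 3, 4, 5],
)


def rolling_kendalltau(df, window):
    """Hypothetical helper: add one rolling Kendall tau column per column pair."""
    out = df.copy()
    # Roll over row positions (0..n-1) rather than the data itself.
    positions = pd.Series(np.arange(len(df)), index=df.index)
    for col1, col2 in itertools.combinations(df.columns, 2):
        def tau(idx, col1=col1, col2=col2):
            # With raw=True, idx arrives as a float array; cast before using iloc.
            rows = df.iloc[idx.astype(int)]
            return stats.kendalltau(rows[col1], rows[col2])[0]
        out[col1 + col2] = positions.rolling(window).apply(tau, raw=True)
    return out


result = rolling_kendalltau(A, 3)
print(result)
```

The first window - 1 rows of each new column are NaN, and the remaining values should agree with the table above. Binding col1 and col2 as default arguments is a defensive choice so each tau keeps its own column pair even if it were called after the loop has moved on.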
