Is it possible to do running correlation with one fixed series in Python?


Problem description

I'm wondering if there is a fast way to do a running correlation in Python against one fixed series. I've tried Pandas, for example df1.rolling(4).corr(df2). However, it requires the two DataFrames to have the same length. Is there a way to do something similar to the Pandas example above, but with one DataFrame held fixed?

To clarify, I want to calculate the correlation coefficient between df2 below and each consecutive window of values in df1.

Example: the first correlation is between df2 and df1.loc[0:3], the second is between df2 and df1.loc[1:4], etc.

I've managed to do this with a loop. However, I find it inefficient when working with larger DataFrames.

df1 = pd.DataFrame([1,3,2,4,5,6,3,4])
df2 = pd.DataFrame([1,2,3,2])
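
For reference, a loop of the kind described above might look like the following minimal sketch. The question does not show the original loop, so the names fixed, series, and loop_corr and the use of np.corrcoef are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 3, 2, 4, 5, 6, 3, 4])
df2 = pd.DataFrame([1, 2, 3, 2])

# Hypothetical names: the fixed series and the series being scanned.
fixed = df2[0].to_numpy()
series = df1[0].to_numpy()
n = len(fixed)

# One Pearson coefficient per window of length n, computed one window at a time.
loop_corr = [
    np.corrcoef(series[start:start + n], fixed)[0, 1]
    for start in range(len(series) - n + 1)
]
```

Each iteration recomputes means and variances from scratch in Python, which is why this approach slows down as df1 grows.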

Solution

You can use pandas.DataFrame.rolling, which returns a pandas.core.window.Rolling object that has an apply method. You can then pass apply() any function that calculates the correlation you want.

Example

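import pandas as pd
from scipy.stats import pearsonr


df1 = pd.DataFrame([1,3,2,4,5,6,3,4,1,2,3,2,2,3,2,5,1,2,1,2,8,8,8,8,8,8,8])
df2 = pd.DataFrame([1,2,3,2])

# The fixed series that every window of df1 is compared against.
CORR_VALS = df2[0].values

def get_correlation(vals):
    # pearsonr returns (coefficient, p-value); keep only the coefficient.
    return pearsonr(vals, CORR_VALS)[0]

# Roll over column 0 with a window as long as the fixed series.
# raw=True passes each window to the function as a NumPy array.
df1['correlation'] = df1[0].rolling(window=len(CORR_VALS)).apply(get_correlation, raw=True)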

  • Note that the window argument to rolling() must equal the length of the array you are calculating the correlation against.

This outputs:

In [5]: df1['correlation'].values
Out[5]:
array([        nan,         nan,         nan,  0.31622777,  0.31622777,
        0.71713717,  0.63245553, -0.63245553, -0.39223227, -0.63245553,
       -0.63245553,  1.        ,  0.        , -0.70710678,  0.81649658,
        0.        ,  0.47809144, -0.23570226, -0.64699664,  0.        ,
        0.        ,  0.7570333 ,  0.76509206,  0.11043153, -0.77302068,
       -0.11043153,  0.86164044])
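
Since the question asks for speed, here is a hedged sketch of a fully vectorized alternative. It is not part of the original answer. It assumes NumPy 1.20 or later (for sliding_window_view), and the function name rolling_corr_fixed is hypothetical. It computes the Pearson coefficient for all windows at once instead of calling a Python function per window.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_corr_fixed(values, fixed):
    """Pearson correlation of every window of `values` with `fixed` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    fixed = np.asarray(fixed, dtype=float)
    n = len(fixed)

    # View of shape (len(values) - n + 1, n); no data is copied here.
    windows = sliding_window_view(values, n)

    # Center each window and the fixed series around their means.
    win_c = windows - windows.mean(axis=1, keepdims=True)
    fix_c = fixed - fixed.mean()

    # Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), one value per window.
    num = win_c @ fix_c
    den = np.sqrt((win_c ** 2).sum(axis=1) * (fix_c ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = num / den  # a constant window gives 0/0, which becomes NaN

    # Pad the front with NaN so the result lines up with the input index.
    out = np.full(len(values), np.nan)
    out[n - 1:] = corr
    return out


df1 = pd.DataFrame([1, 3, 2, 4, 5, 6, 3, 4])
df2 = pd.DataFrame([1, 2, 3, 2])
df1["correlation"] = rolling_corr_fixed(df1[0], df2[0])
```

The result is aligned the same way as the rolling version: the first len(fixed) - 1 entries are NaN, and each later entry is the correlation for the window ending at that row.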

