Numpy/Pandas correlate 2 arrays of different length

Problem description

I'm trying to calculate the correlation coefficient for two datasets that are not the same length. The code below works only for equal-length arrays.

import numpy as np
from scipy.stats import pearsonr

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]

# Fails here: pearsonr requires both inputs to have the same length
print(pearsonr(a, b))

In my case, the length of the b vector can vary from 50 to 100 data points, whereas the function I want to match is standard. An image of a is attached. Are there any other preferred modules for matching such patterns?

Solution

A little late to the party, but since this is a top Google result, I'll offer a possible answer to this problem:

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

a = [0, 0.4, 0.2, 0.4, 0.2, 0.45, 0.2, 0.52, 0.52, 0.4, 0.21, 0.2, 0.4, 0.51]
b = [0.4, 0.2, 0.5]

df = pd.DataFrame(dict(x=a))

CORR_VALS = np.array(b)

def get_correlation(vals):
    # Pearson r between the current window and the shorter array
    return pearsonr(vals, CORR_VALS)[0]

# Slide a window of len(b) over column x; the first len(b) - 1 rows are NaN
df['correlation'] = df['x'].rolling(window=len(CORR_VALS)).apply(get_correlation, raw=True)

Explanation

pandas DataFrames (and Series) have a rolling() method that takes the window length (window) as an argument. The object returned by rolling() has an apply() method that takes a function as an argument and calls it once per window. You can use it to calculate, for example, the Pearson correlation coefficient with pearsonr from scipy.stats.
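To make the windowing explicit, here is a rough NumPy-only sketch of what the rolling correlation computes. The function name rolling_pearson is made up for illustration, and np.corrcoef stands in for pearsonr, so treat this as an equivalent under those assumptions rather than what pandas does internally.

```python
import numpy as np

def rolling_pearson(series, pattern):
    """Correlate `pattern` with every same-length window of `series`.

    Hypothetical helper for illustration; not part of pandas or SciPy.
    """
    series = np.asarray(series, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    w = len(pattern)
    out = np.full(len(series), np.nan)  # first w - 1 entries stay NaN
    for end in range(w, len(series) + 1):
        window = series[end - w:end]
        # np.corrcoef returns a 2x2 correlation matrix; [0, 1] is Pearson r
        out[end - 1] = np.corrcoef(window, pattern)[0, 1]
    return out

a = [0, 0.4, 0.2, 0.4, 0.2, 0.45, 0.2, 0.52, 0.52, 0.4, 0.21, 0.2, 0.4, 0.51]
b = [0.4, 0.2, 0.5]
result = rolling_pearson(a, b)
print(result)
```

Each value is stored at the index where its window ends, which is why the leading len(b) - 1 entries are NaN, just as with rolling().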

Example output

In [2]: df['correlation'].values
Out[2]:
array([        nan,         nan, -0.65465367,  0.94491118, -0.94491118,
        0.98974332, -0.94491118,  0.9923356 , -0.18898224, -0.75592895,
       -0.44673396,  0.1452278 ,  0.78423011,  0.16661846])

With the example data in the question

In [1]: df
Out[1]:
     x  correlation
0  0.0          NaN
1  0.4          NaN
2  0.2          NaN
3  0.4          NaN
4  0.2          NaN
5  0.4     0.527932
6  0.2    -0.159167
7  0.5     0.189482
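If the goal is to find where the shorter array fits the longer one best, one possible follow-up is to take the window with the highest coefficient. The sketch below is an assumption about that use case, not part of the original answer: best_match is a hypothetical name, and it relies on numpy.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def best_match(series, pattern):
    """Return (start index of the best-correlated window, all coefficients).

    Hypothetical helper for illustration only.
    """
    series = np.asarray(series, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    # One row per window: shape (len(series) - len(pattern) + 1, len(pattern))
    windows = sliding_window_view(series, len(pattern))
    wc = windows - windows.mean(axis=1, keepdims=True)  # centered windows
    pc = pattern - pattern.mean()                       # centered pattern
    denom = np.sqrt((wc ** 2).sum(axis=1) * (pc ** 2).sum())
    # Constant windows have zero variance; their coefficient becomes NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (wc @ pc) / denom
    # nanargmax skips NaN entries (it raises ValueError if every entry is NaN)
    return int(np.nanargmax(r)), r

a = [0, 0.4, 0.2, 0.4, 0.2, 0.45, 0.2, 0.52, 0.52, 0.4, 0.21, 0.2, 0.4, 0.51]
b = [0.4, 0.2, 0.5]
start, r = best_match(a, b)
print(start, r[start])
```

Here r is indexed by window start rather than window end, so it has len(a) - len(b) + 1 entries and no leading NaN values. Because Pearson correlation ignores scale and offset, this only matches shape, not magnitude.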
