List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?


Question

How do you find the top correlations in a correlation matrix with pandas? There are many answers on how to do this in R ("Show correlations as an ordered list, not as a large matrix" and "Efficient way to get highly correlated pairs from large data set in Python or R"), but I would like to know how to do it with pandas. In my case the matrix is 4460x4460, so inspecting it visually is not an option.

Recommended answer

You can use DataFrame.values to get a NumPy array of the data and then use NumPy functions such as argsort() to get the most correlated pairs.
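The answer does not spell out the NumPy route, so the following is only a minimal sketch of one way it could look. The smaller shape (500 rows, 200 columns), the fixed seed, the planted pair (columns 10 and 20), and the names `top_n` and `top_pairs` are assumptions made for illustration. It uses np.triu_indices_from to look only at the part of the matrix above the diagonal, so self-correlations and mirrored duplicates never enter the ranking.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed sizes, much smaller than the 4460 columns in the question)
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 200))
data[:, 10] += data[:, 20]  # plant one correlated pair
df = pd.DataFrame(data)

# Absolute correlation matrix as a plain NumPy array (DataFrame.values works too)
corr = df.corr().abs().to_numpy()

# Row/column positions of the strictly upper triangle (k=1 skips the diagonal)
rows, cols = np.triu_indices_from(corr, k=1)
vals = corr[rows, cols]

# argsort sorts ascending, so reverse it and keep the first top_n positions
top_n = 5
order = np.argsort(vals)[::-1][:top_n]

# Map positions back to column labels
top_pairs = [(df.columns[rows[i]], df.columns[cols[i]], vals[i]) for i in order]
for a, b, v in top_pairs:
    print(a, b, round(float(v), 4))
```

For very large matrices, np.argpartition could replace the full argsort when only a handful of top values is needed, at the cost of having to sort that small subset afterwards.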

But if you want to do this in pandas, you can unstack the correlation matrix and sort the resulting Series:

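import numpy as np
import pandas as pd

# 50 observations of 4460 variables, giving a 4460x4460 correlation matrix
shape = (50, 4460)
data = np.random.normal(size=shape)

# Plant one genuinely correlated pair so there is something to find
data[:, 1000] += data[:, 2000]

df = pd.DataFrame(data)

# Absolute values, so strong negative correlations rank high as well
c = df.corr().abs()

# Flatten the matrix into a Series indexed by pairs of labels, then sort ascending
s = c.unstack()
so = s.sort_values(kind="quicksort")

# The last 4460 entries are the diagonal (each column with itself, about 1.0),
# so the ten entries just before them are the strongest off-diagonal values
print(so[-4470:-4460])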

Here is the output (the data is random, so the exact values differ from run to run):

2192  1522    0.636198
1522  2192    0.636198
3677  2027    0.641817
2027  3677    0.641817
242   130     0.646760
130   242     0.646760
1171  2733    0.670048
2733  1171    0.670048
1000  2000    0.742340
2000  1000    0.742340
dtype: float64
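Each pair appears twice in the output above because the correlation matrix is symmetric. The following is a hedged sketch, not part of the original answer, of one way to drop the mirrored entries and the diagonal in a single step before sorting. The smaller shape, the fixed seed, the planted pair (columns 10 and 20), and the names `first`, `second`, `pairs`, and `top` are assumptions for illustration, and the comparison `first < second` assumes the column labels can be ordered.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed sizes, smaller than in the answer)
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 200))
data[:, 10] += data[:, 20]  # plant one correlated pair
df = pd.DataFrame(data)

c = df.corr().abs()
s = c.unstack()  # Series with a two-level index of label pairs

# Keep a pair only when its first label is smaller than its second:
# this removes the diagonal and one of each pair of mirrored entries
first = s.index.get_level_values(0)
second = s.index.get_level_values(1)
pairs = s[first < second]

# Strongest correlations first
top = pairs.sort_values(ascending=False).head(10)
print(top)
```

With this filter there is no need to skip the diagonal by slicing, so `head(10)` returns ten distinct pairs directly.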
