How to calculate correlation between all columns and remove highly correlated ones using python or pandas

Views: 1312 | Published: 2020/5/18 19:37:32 | Tags: python numpy pandas scipy


Question

I have a huge dataset, and before machine-learning modeling it is always suggested that you first remove highly correlated descriptors (columns). How can I calculate the column-wise correlation and remove columns above a threshold, say remove all the columns or descriptors with >0.8 correlation? The reduced data should also retain the headers.
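For reference, pandas' `DataFrame.corr()` uses the Pearson coefficient by default (`method='pearson'`). For two columns X and Y with n rows it is computed as below, and under the question's rule a column becomes a removal candidate when the absolute value of this coefficient against another column exceeds the chosen threshold (0.8 here). Taking the absolute value is an assumption on my part: a strong negative correlation is usually treated as just as redundant as a strong positive one.

```latex
r_{XY} = \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}
              {\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2}\,\sqrt{\sum_{k=1}^{n} (y_k - \bar{y})^2}},
\qquad \text{drop } Y \text{ if } |r_{XY}| > 0.8
```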

Sample dataset

 GA      PN       PC     MBP      GR     AP   
0.033   6.652   6.681   0.194   0.874   3.177    
0.034   9.039   6.224   0.194   1.137   3.4      
0.035   10.936  10.304  1.015   0.911   4.9      
0.022   10.11   9.603   1.374   0.848   4.566    
0.035   2.963   17.156  0.599   0.823   9.406    
0.033   10.872  10.244  1.015   0.574   4.871     
0.035   21.694  22.389  1.015   0.859   9.259     
0.035   10.936  10.304  1.015   0.911   4.5
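Before deleting anything, it can help to see which pairs in the sample actually cross the threshold. The following is a minimal sketch, assuming pandas is installed and the sample values are typed in by hand as shown; `df`, `threshold` and `high_pairs` are illustrative names, not part of the question. It builds the DataFrame, computes the pairwise Pearson correlation matrix, and lists every column pair whose absolute correlation exceeds 0.8.

```python
import pandas as pd

# The sample dataset from the question, typed in by hand.
df = pd.DataFrame({
    "GA":  [0.033, 0.034, 0.035, 0.022, 0.035, 0.033, 0.035, 0.035],
    "PN":  [6.652, 9.039, 10.936, 10.11, 2.963, 10.872, 21.694, 10.936],
    "PC":  [6.681, 6.224, 10.304, 9.603, 17.156, 10.244, 22.389, 10.304],
    "MBP": [0.194, 0.194, 1.015, 1.374, 0.599, 1.015, 1.015, 1.015],
    "GR":  [0.874, 1.137, 0.911, 0.848, 0.823, 0.574, 0.859, 0.911],
    "AP":  [3.177, 3.4, 4.9, 4.566, 9.406, 4.871, 9.259, 4.5],
})

# Pairwise Pearson correlation between all columns (pandas' default method).
corr = df.corr()

threshold = 0.8
cols = corr.columns
# Walk the upper triangle (j > i) so each unordered pair is reported once;
# abs() treats a strong negative correlation as "highly correlated" too.
high_pairs = [
    (cols[i], cols[j], corr.iloc[i, j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > threshold
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

On these eight rows only one pair, PC and AP, should be reported; every other pair stays well below 0.8 in absolute value.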

Please help.

Recommended answer

Here is the approach I have used:

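import pandas as pd

def correlation(dataset, threshold):
    # Assumes every column is numeric; returns a reduced copy instead of printing it.
    dataset = dataset.copy()  # leave the caller's DataFrame unmodified
    col_corr = set()  # Set of all the names of deleted columns
    corr_matrix = dataset.corr().abs()  # abs() so strong negative correlation counts too
    for i in range(len(corr_matrix.columns)):
        for j in range(i):  # lower triangle only: each pair is checked once
            # drop column i if it is highly correlated with a column j that is still kept
            if (corr_matrix.iloc[i, j] >= threshold) and (corr_matrix.columns[j] not in col_corr):
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)
                if colname in dataset.columns:
                    del dataset[colname]  # deleting the column from the copy

    return dataset  # the reduced data keeps its column headers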

Hope this helps!
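A common vectorized alternative (not the answer author's code; `drop_correlated` and the demo data are hypothetical) masks the correlation matrix down to its strict upper triangle with `numpy.triu` and drops every column that exceeds the threshold against any column to its left. This is a sketch under the assumption that all columns are numeric.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Hypothetical helper: drop every column whose absolute correlation
    with any column to its left exceeds `threshold`."""
    corr = df.corr().abs()
    # Boolean mask of the strict upper triangle (k=1): each pair is kept once
    # and the diagonal (every column's r = 1 with itself) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)  # entries outside the mask become NaN
    # Column `col` holds its correlations with the columns to its left;
    # NaN > threshold evaluates to False, so masked cells never trigger a drop.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)  # remaining headers are preserved


# Made-up demo data: y = 2x and z = 6 - x are perfectly correlated with x,
# while w is only weakly related to x (r = 0.3).
demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],
    "w": [3.0, 1.0, 5.0, 2.0, 4.0],
})
reduced = drop_correlated(demo, threshold=0.8)
print(reduced.columns.tolist())  # ['x', 'w']
```

One design difference is worth knowing. The loop-based answer only drops a column when its partner is still kept. This version compares each column against all earlier columns, including ones that are themselves dropped. As a result, on some data it can remove more columns than the loop does.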
