How can we show ONLY features that are correlated over a certain threshold in a heatmap?


Problem description

I've got too many features in a data frame. I'm trying to plot ONLY the features which are correlated over a certain threshold, let's say over 80%, and show those in a heatmap. I put some code together, and it runs, but I still see some white lines, which have no data, and thus no correlation. Also, I'm seeing things that are well under 80% correlation. Here is the code that I tried.

import numpy as np
import matplotlib.pyplot as plt
import seaborn
c = newdf.corr()
plt.figure(figsize=(10,10))
seaborn.heatmap(c, cmap='RdYlGn_r', mask = (np.abs(c) >= 0.8))
plt.show()

When I run that, I see this.

What's wrong here?
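One likely cause, going by seaborn's documented behavior: `heatmap` does not draw cells where `mask` is True, so a mask of `np.abs(c) >= 0.8` hides exactly the strong correlations and leaves the weak ones visible. A minimal sketch of the inverted mask follows. The helper names `weak_mask` and `plot_strong` and the toy data are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

def weak_mask(corr, threshold=0.8):
    """Boolean frame that is True where a cell should be HIDDEN."""
    # seaborn.heatmap skips cells where the mask is True, so the mask
    # has to flag the weak correlations, not the strong ones.
    return corr.abs() < threshold

def plot_strong(corr, threshold=0.8):
    # Imported lazily so weak_mask works without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns
    plt.figure(figsize=(10, 10))
    sns.heatmap(corr, cmap="RdYlGn_r", mask=weak_mask(corr, threshold))
    plt.show()

# Toy data (hypothetical): "b" tracks "a" closely, "c" is unrelated noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
demo_corr = demo.corr()
demo_mask = weak_mask(demo_corr)  # True for the weak a/c and b/c cells
```

Calling `plot_strong(newdf.corr())` would then blank the weak cells. The rows and columns of features with no strong partner still take up space in the plot, so this alone does not remove them.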

I am making a small update, with some new findings.

This gets ONLY corr>.8.

import matplotlib.pyplot as plt
import seaborn as sns
corr = newdf.corr()
kot = corr[corr>=.8]
plt.figure(figsize=(12,8))
sns.heatmap(kot, cmap="Reds")

That seems to work, but it still gives me a lot of white! I thought there should be a way to include only the items that have a correlation over a certain amount. Maybe you have to copy those items with >.8 items to a new data frame and build the correlation off of that object. I'm not sure how this works.
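The white areas are expected with `corr[corr>=.8]`: every cell below the threshold becomes NaN, and seaborn leaves NaN cells blank. The comparison also drops strong negative correlations. The idea of building the plot from a reduced set of features can be sketched as below. This is one possible reading of that idea, not a definitive method: it keeps a feature only if it has at least one partner other than itself with |correlation| >= threshold. The function name `strongly_correlated_subset` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def strongly_correlated_subset(df, threshold=0.8):
    """Correlation matrix restricted to features that have at least one
    other feature with |correlation| >= threshold."""
    corr = df.corr()
    strong = (corr.abs() >= threshold).to_numpy()
    # Ignore the diagonal: every feature correlates perfectly with itself.
    off_diag = strong & ~np.eye(len(corr), dtype=bool)
    keep = corr.columns[off_diag.any(axis=0)]
    return corr.loc[keep, keep]

# Toy data (hypothetical): a/b and d/e are strongly related pairs,
# "c" is unrelated noise and should be dropped.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
d = rng.normal(size=300)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=300),
    "c": rng.normal(size=300),
    "d": d,
    "e": -d + 0.1 * rng.normal(size=300),  # strong negative correlation
})
sub = strongly_correlated_subset(demo)
```

Passing `sub` to `sns.heatmap` then shows only the retained features. Weak pairs among them (here, `a` against `d`) are still drawn, so a mask on `sub.abs() < threshold` can be added if those cells should be blank too.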

Recommended answer

The following code groups the strongly correlated features (with correlation above 0.8 in magnitude) into components and plots the correlation for each group of components individually. Please let me know if it differs from what you want.

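import matplotlib.pyplot as plt
import seaborn as sns

THRESHOLD = 0.8  # minimum |correlation| for two features to be linked

corr = newdf.corr()
components = []
visited = set()
for col in corr.columns:
    if col in visited:
        continue

    # Breadth-first search over features linked by a strong correlation.
    component = {col}
    just_visited = [col]
    visited.add(col)
    while just_visited:
        c = just_visited.pop(0)
        for idx, val in corr[c].items():
            if abs(val) > THRESHOLD and idx not in visited:
                just_visited.append(idx)
                visited.add(idx)
                component.add(idx)
    components.append(component)

for component in components:
    if len(component) < 2:
        continue  # this feature has no strongly correlated partner
    # .loc needs a list rather than a set; keep the original column order.
    cols = [name for name in corr.columns if name in component]
    plt.figure(figsize=(12,8))
    sns.heatmap(corr.loc[cols, cols], cmap="Reds")
plt.show()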
