Extracting clusters from seaborn clustermap

Question

I am using the seaborn clustermap to create clusters and visually it works great (this example produces very similar results).

However, I am having trouble figuring out how to extract the clusters programmatically. For instance, in the example link, how could I find out that 1-1 rh, 1-1 lh, 5-1 rh, 5-1 lh make a good cluster? Visually it's easy. I have tried looking through the data and the dendrograms, but with little success.

EDIT: Example code:

import pandas as pd
import seaborn as sns
sns.set(font="monospace")

df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
used_networks = [1, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

network_pal = sns.cubehelix_palette(len(used_networks),
                                    light=.9, dark=.1, reverse=True,
                                    start=1, rot=-2)
network_lut = dict(zip(map(str, used_networks), network_pal))

networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks).map(network_lut)

cmap = sns.diverging_palette(h_neg=210, h_pos=350, s=90, l=30, as_cmap=True)

result = sns.clustermap(df.corr(), row_colors=network_colors, method="average",
               col_colors=network_colors, figsize=(13, 13), cmap=cmap)

How can I pull what models are in which clusters out of result?

EDIT2: The result does carry a linkage with it, in dendrogram_col, which I think would work with fcluster. But the threshold value to pass is confusing me. I would assume that values in the heatmap that are higher than the threshold would get clustered together?

Recommended answer

While using result.dendrogram_col.linkage or result.dendrogram_row.linkage will currently work, it seems to be an implementation detail. The safest route is to first compute the linkages explicitly and pass them to the clustermap function, which has row_linkage and col_linkage parameters just for that.

Replacing the last line in your example (result = ...) with the following code gives the same result as before, but you will also have row_linkage and col_linkage variables that you can use with fcluster etc.

import numpy as np
from scipy.spatial import distance
from scipy.cluster import hierarchy

# Compute the correlation matrix once and reuse it.
correlations = df.corr()
correlations_array = np.asarray(correlations)

# Linkage over the rows (pdist treats each row as one observation).
row_linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method='average')

# Linkage over the columns.
col_linkage = hierarchy.linkage(
    distance.pdist(correlations_array.T), method='average')

# `method` is not needed here because both linkages are precomputed.
result = sns.clustermap(correlations, row_linkage=row_linkage,
                        col_linkage=col_linkage, row_colors=network_colors,
                        col_colors=network_colors, figsize=(13, 13), cmap=cmap)
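
As for the threshold: with fcluster it refers to the merge distance in the linkage (the height on the dendrogram), not to the correlation values shown in the heatmap. Here is a minimal sketch of turning a linkage into cluster memberships. It uses a small synthetic correlation matrix as a stand-in for df.corr(), so the column names (a1, a2, b1, b2), the choice of two clusters, and the "half of the largest merge distance" cut are all assumptions for illustration, not values tuned for the brain_networks data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance
from scipy.cluster import hierarchy

# Synthetic stand-in for df.corr(): two groups of strongly correlated columns.
rng = np.random.default_rng(0)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
data = pd.DataFrame({
    "a1": base_a + 0.1 * rng.normal(size=200),
    "a2": base_a + 0.1 * rng.normal(size=200),
    "b1": base_b + 0.1 * rng.normal(size=200),
    "b2": base_b + 0.1 * rng.normal(size=200),
})
correlations = data.corr()

# Same linkage computation as in the answer above.
row_linkage = hierarchy.linkage(
    distance.pdist(np.asarray(correlations)), method="average")

# Option 1: ask for at most 2 flat clusters.
labels = hierarchy.fcluster(row_linkage, t=2, criterion="maxclust")

# Map each cluster label to the names of its members.
members = {}
for name, label in zip(correlations.index, labels):
    members.setdefault(int(label), []).append(name)

# Option 2: cut the tree at a distance threshold. Column 2 of the linkage
# matrix holds the merge distances, so this cuts at half the tallest merge.
threshold = 0.5 * row_linkage[:, 2].max()
distance_labels = hierarchy.fcluster(
    row_linkage, t=threshold, criterion="distance")

print(members)
```

With the real data you would pass the row_linkage computed from df.corr() and use correlations.index (or correlations.columns) to name the members. Looking at the values in row_linkage[:, 2] is a reasonable way to pick a distance threshold.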

In this particular example, the code could be simplified more since the correlations array is symmetric and therefore row_linkage and col_linkage will be identical.
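
A quick way to convince yourself of this is to compute both linkages on a symmetric matrix and compare them. The sketch below uses a random correlation matrix built with np.corrcoef as a stand-in for df.corr(); the matrix size and the seed are arbitrary assumptions.

```python
import numpy as np
from scipy.spatial import distance
from scipy.cluster import hierarchy

# Random symmetric correlation matrix (6 variables, 50 observations each).
rng = np.random.default_rng(1)
correlations_array = np.corrcoef(rng.normal(size=(6, 50)))

# Linkage over rows and over columns, as in the answer above.
linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method="average")
linkage_t = hierarchy.linkage(
    distance.pdist(correlations_array.T), method="average")

# For a symmetric matrix the rows equal the columns, so both linkages match.
same = np.allclose(linkage, linkage_t)
print(same)
```

So for a correlation matrix it should be enough to compute one linkage and pass it as both row_linkage and col_linkage.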

Note: a previous version of this answer included a call to distance.squareform, following what the seaborn code does, but that was a bug.
