Extract rows of clusters in hierarchical clustering using seaborn clustermap

Question

I am using hierarchical clustering from seaborn.clustermap to cluster my data. This works fine to nicely visualize the clusters in a heatmap. However, now I would like to extract all row values that are assigned to the different clusters.

Here is my data:

import pandas as pd

# load DataFrame 
df = pd.read_csv('expression_data.txt', sep='\t', index_col=0)

df 

          log_HU1    log_HU2
EEF1A1    13.439499  13.746856
HSPA8     13.169191  12.983910
FTH1      13.861164  13.511200
PABPC1    12.142340  11.885885
TFRC      11.261368  10.433607
RPL26     13.837205  13.934710
NPM1      12.381585  11.956855
RPS4X     13.359880  12.588574
EEF2      11.076926  11.379336
RPS11     13.212654  13.915813
RPS2      12.910164  13.009184
RPL11     13.498649  13.453234
CA1        9.060244  13.152061
RPS3      11.243343  11.431791
YBX1      12.135316  12.100374
ACTB      11.592359  12.108637
RPL4      12.168588  12.184330
HSP90AA1  10.776370  10.550427
HSP90AB1  11.200892  11.457365
NCL       11.366145  11.060236

Then I perform the clustering using seaborn as follows:

import seaborn as sns
fig = sns.clustermap(df)  # returns a seaborn ClusterGrid

This produces the following cluster map:

For this example I may be able to manually interpret the values belonging to each cluster (e.g. that TFRC and HSP90AA1 cluster together). However, I am planning to run these clustering analyses on much bigger data sets.

So my question is: does anyone know how to get the row values belonging to each cluster?

Thanks

Recommended answer

Using the scipy.cluster.hierarchy module with fcluster lets you retrieve the cluster assignments:

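import pandas as pd
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

df = pd.read_csv('expression_data.txt', sep='\t', index_col=0)

# retrieve clusters using fcluster
d = pdist(df)  # pairwise Euclidean distances between rows
# note: sns.clustermap defaults to method='average'; use the same method in
# both places if the labels should match the heatmap's dendrogram
L = sch.linkage(d, method='complete')
# cut the tree at 20% of the largest pairwise distance; raise 0.2 for fewer,
# broader clusters or lower it for more, tighter ones
clusters = sch.fcluster(L, 0.2*d.max(), 'distance')

# cluster labels are in the same order as the rows of the original df
for i, cluster in enumerate(clusters):
    print(df.index[i], cluster)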

Output:

EEF1A1 2
HSPA8 1
FTH1 2
PABPC1 3
TFRC 5
RPL26 2
NPM1 3
RPS4X 1
EEF2 4
RPS11 2
RPS2 1
RPL11 2
CA1 6
RPS3 4
YBX1 3
ACTB 3
RPL4 3
HSP90AA1 5
HSP90AB1 4
NCL 4
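
The listing above pairs each gene with a cluster label, but the question asks for the row values themselves. A minimal sketch of one way to get them follows: attach the labels as a column and split the frame with groupby. The question's table is inlined so the example runs without expression_data.txt; the names `raw`, `labels`, `clustered` and `rows_by_cluster` are illustrative and not part of the original answer.

```python
import io

import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# The question's data, inlined (assumption: same values as expression_data.txt).
raw = """gene log_HU1 log_HU2
EEF1A1 13.439499 13.746856
HSPA8 13.169191 12.983910
FTH1 13.861164 13.511200
PABPC1 12.142340 11.885885
TFRC 11.261368 10.433607
RPL26 13.837205 13.934710
NPM1 12.381585 11.956855
RPS4X 13.359880 12.588574
EEF2 11.076926 11.379336
RPS11 13.212654 13.915813
RPS2 12.910164 13.009184
RPL11 13.498649 13.453234
CA1 9.060244 13.152061
RPS3 11.243343 11.431791
YBX1 12.135316 12.100374
ACTB 11.592359 12.108637
RPL4 12.168588 12.184330
HSP90AA1 10.776370 10.550427
HSP90AB1 11.200892 11.457365
NCL 11.366145 11.060236
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=0)

# Same steps as the answer: distances, complete linkage, cut at 20% of max.
d = pdist(df.to_numpy())
Z = linkage(d, method="complete")
labels = fcluster(Z, 0.2 * d.max(), criterion="distance")

# Attach the labels to the rows, then split the frame into one
# sub-DataFrame per cluster (label column dropped again).
clustered = df.assign(cluster=labels)
rows_by_cluster = {
    label: group.drop(columns="cluster")
    for label, group in clustered.groupby("cluster")
}

for label, group in rows_by_cluster.items():
    print(f"cluster {label}: {list(group.index)}")
```

Each value in `rows_by_cluster` is an ordinary DataFrame, so something like `rows_by_cluster[label].mean()` gives per-cluster summaries. If the labels need to match the heatmap's dendrogram exactly, the ClusterGrid returned by sns.clustermap keeps the row linkage it computed in `dendrogram_row.linkage`, and that array can be passed to fcluster in place of `Z`.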
