Extracting clusters from seaborn clustermap
Question
I am using the seaborn `clustermap` to create clusters, and visually it works great (this example produces very similar results).
However, I am having trouble figuring out how to extract the clusters programmatically. For instance, in the example link, how could I find out that 1-1 rh, 1-1 lh, 5-1 rh and 5-1 lh make a good cluster? Visually it's easy. I have tried looking through the data and the dendrograms, but with little success.
EDIT: Example code:
```python
import pandas as pd
import seaborn as sns

sns.set(font="monospace")

df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
used_networks = [1, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

network_pal = sns.cubehelix_palette(len(used_networks),
                                    light=.9, dark=.1, reverse=True,
                                    start=1, rot=-2)
network_lut = dict(zip(map(str, used_networks), network_pal))

networks = df.columns.get_level_values("network")
# Index the colors by the columns so clustermap can align them with df.corr()
network_colors = pd.Series(networks, index=df.columns).map(network_lut)

cmap = sns.diverging_palette(h_neg=210, h_pos=350, s=90, l=30, as_cmap=True)

result = sns.clustermap(df.corr(), row_colors=network_colors, method="average",
                        col_colors=network_colors, figsize=(13, 13), cmap=cmap)
```
How can I pull out of `result` which models are in which clusters?
EDIT2: The `result` does carry a linkage with it in `dendrogram_col`, which I *think* would work with `fcluster`. But the threshold value to select is confusing me. I would assume that values in the heatmap that are higher than the threshold would get clustered together?
Answer
While using `result.dendrogram_col.linkage` or `result.dendrogram_row.linkage` will currently work, it seems to be an implementation detail. The safest route is to first compute the linkages explicitly and pass them to the `clustermap` function, which has `row_linkage` and `col_linkage` parameters for exactly that purpose.
Replacing the last line in your example (`result = ...`) with the following code gives the same result as before, but you will also have `row_linkage` and `col_linkage` variables that you can use with `fcluster` etc.
```python
import numpy as np
from scipy.spatial import distance
from scipy.cluster import hierarchy

correlations = df.corr()
correlations_array = np.asarray(correlations)

# Euclidean distances between rows (and between columns), then average
# linkage: clustermap's default metric plus the method="average" used above.
row_linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method='average')

col_linkage = hierarchy.linkage(
    distance.pdist(correlations_array.T), method='average')

result = sns.clustermap(correlations, row_linkage=row_linkage, col_linkage=col_linkage,
                        row_colors=network_colors, method="average",
                        col_colors=network_colors, figsize=(13, 13), cmap=cmap)
```
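With a linkage in hand, `scipy.cluster.hierarchy.fcluster` turns it into flat cluster labels. On the threshold question from EDIT2: with `criterion="distance"`, `t` is a cutoff on the merge heights stored in column 2 of the linkage (here, average Euclidean distances between rows of the correlation matrix), not on the correlation values shown in the heatmap; leaves whose cophenetic distance is at most `t` share a label. The sketch below uses a small synthetic matrix instead of the brain_networks data, and the names `a1` to `b2` and the cutoff `t=1.0` are arbitrary, hypothetical choices for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Hypothetical stand-in for df.corr(): a1/a2 follow one signal and b1/b2
# follow another, so two clusters are expected.
rng = np.random.default_rng(0)
sig_a, sig_b = rng.normal(size=(2, 500))
noise = 0.1 * rng.normal(size=(4, 500))
data = np.vstack([sig_a, sig_a, sig_b, sig_b]) + noise  # shape (4, 500)
names = ["a1", "a2", "b1", "b2"]
corr = np.corrcoef(data)  # 4 x 4 correlation matrix (rows are variables)

# Same construction as the answer: Euclidean distances between rows of the
# correlation matrix, then average linkage.
row_linkage = hierarchy.linkage(distance.pdist(corr), method="average")

# Column 2 holds the merge heights; t is compared against these values.
print(row_linkage[:, 2])

# criterion="distance": leaves with cophenetic distance <= t share a label.
labels = hierarchy.fcluster(row_linkage, t=1.0, criterion="distance")

# Group the names by cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(int(label), []).append(name)
print(clusters)
```

For the real data, the names would come from the index of `correlations` (`correlations.index`), in the same order as the rows used to build `row_linkage`.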
In this particular example, the code could be simplified further: the correlations array is symmetric, so `row_linkage` and `col_linkage` will be identical.
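For a symmetric matrix, `pdist(corr)` and `pdist(corr.T)` give the same distances, so one linkage can serve both axes. A minimal sketch on a hypothetical random correlation matrix; it also shows `criterion="maxclust"`, which cuts the tree into at most `t` clusters instead of using a distance cutoff, useful when the number of groups is easier to pick than a threshold. The choice `t=3` is arbitrary.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Hypothetical symmetric input: correlation matrix of 6 random variables.
rng = np.random.default_rng(1)
corr = np.corrcoef(rng.normal(size=(6, 100)))

# Rows and columns yield the same pairwise distances for a symmetric matrix,
# so computing the linkage once is enough.
assert np.allclose(distance.pdist(corr), distance.pdist(corr.T))
linkage = hierarchy.linkage(distance.pdist(corr), method="average")

# In the answer's code, this single array could be passed to sns.clustermap
# as both row_linkage=linkage and col_linkage=linkage.

# criterion="maxclust": cut the tree so that at most t flat clusters form.
labels = hierarchy.fcluster(linkage, t=3, criterion="maxclust")
print(labels)
```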
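Note: A previous version of this answer included a call to `distance.squareform`, following what the seaborn code does, but that was a bug: `hierarchy.linkage` interprets a 2-D array as observations, not as a square distance matrix, so it should be given the condensed output of `pdist` directly.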