How to give sns.clustermap a precomputed distance matrix?

Question

Usually when I make dendrograms and heatmaps, I start from a distance matrix and do a bunch of SciPy work. I want to try out Seaborn, but Seaborn wants my data in rectangular form (rows = samples, columns = attributes), not as a distance matrix.

I essentially want to use seaborn as the backend to compute my dendrogram and tack it on to my heatmap. Is this possible? If not, could this become a feature in the future?

Maybe there is a parameter I can adjust so that it takes a distance matrix instead of a rectangular matrix?

Usage:

seaborn.clustermap
seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean',
 z_score=None, standard_scale=None, figsize=None, cbar_kws=None, row_cluster=True,
 col_cluster=True, row_linkage=None, col_linkage=None, row_colors=None,
 col_colors=None, mask=None, **kwargs)

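My code:

import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Rectangular data: one row per sample, one column per feature
iris = load_iris()
X, y = iris.data, iris.target
DF = pd.DataFrame(X, index = ["iris_%d" % (i) for i in range(X.shape[0])], columns = iris.feature_names)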

I don't think my approach below is correct, because I'm giving clustermap a precomputed distance matrix and NOT the rectangular data matrix it asks for. There are no examples of how to use a correlation/distance matrix with clustermap. There is one for heatmap at https://stanford.edu/~mwaskom/software/seaborn/examples/network_correlations.html, but the ordering there is not clustered because it uses the plain sns.heatmap function.

DF_corr = DF.T.corr()
DF_dism = 1 - DF_corr
sns.clustermap(DF_dism)

Recommended answer

You can compute a linkage from the precomputed distance matrix and pass that linkage to clustermap():

import pandas as pd, seaborn as sns
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
from sklearn.datasets import load_iris
sns.set(font="monospace")

iris = load_iris()
X, y = iris.data, iris.target
DF = pd.DataFrame(X, index = ["iris_%d" % (i) for i in range(X.shape[0])], columns = iris.feature_names)

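DF_corr = DF.T.corr()   # sample-by-sample correlation matrix
DF_dism = 1 - DF_corr   # distance matrix
# squareform() converts the square matrix to the condensed form that linkage() expects
linkage = hc.linkage(sp.distance.squareform(DF_dism), method='average')
# the matrix is symmetric, so the same linkage orders both rows and columns
sns.clustermap(DF_dism, row_linkage=linkage, col_linkage=linkage)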
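One caveat with the conversion step: as far as I know, squareform() validates its input by default and raises a ValueError if the matrix is not exactly symmetric or its diagonal is not exactly zero, which rounding in a correlation-derived matrix can cause. A minimal sketch of a defensive wrapper follows. The name linkage_from_square and the random test data are assumptions made up for illustration; they are not part of SciPy, seaborn, or the original answer.

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import squareform


def linkage_from_square(dist, method="average"):
    """Hypothetical helper: build a linkage from a square distance matrix.

    Cleans up floating-point noise first so that squareform()'s
    validity checks (symmetry, zero diagonal) pass.
    """
    d = np.array(dist, dtype=float)   # copy, so the caller's matrix is untouched
    d = (d + d.T) / 2.0               # enforce exact symmetry
    np.fill_diagonal(d, 0.0)          # self-distance must be exactly zero
    d = np.clip(d, 0.0, None)         # remove tiny negative values from rounding
    return hc.linkage(squareform(d), method=method)


# Made-up data: 5 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
dist = 1.0 - np.corrcoef(X)           # correlation distance between samples
Z = linkage_from_square(dist)
```

The resulting Z can be passed as row_linkage and col_linkage in the same way as the linkage above.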

If you call clustermap(distance_matrix) without passing a linkage, the linkage is calculated internally from the pairwise distances between the rows and between the columns of the distance matrix (see the note below for full details), instead of using the elements of the distance matrix directly (the correct solution). As a result, the output is somewhat different from the one in the question.
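The difference between the two linkages can be checked without any plotting. The sketch below uses made-up random data (not the iris set) and assumes Euclidean distance with average linkage, which are the clustermap defaults shown in the signature in the question. It compares a linkage built from the entries of the distance matrix with one built by treating each row of that matrix as an observation.

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import pdist, squareform

# Made-up data: 6 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Square (6 x 6) matrix of pairwise distances between samples
D = squareform(pdist(X, metric="euclidean"))

# Reference: linkage computed from the raw data
Z_ref = hc.linkage(pdist(X, metric="euclidean"), method="average")

# Correct use of a precomputed matrix: its entries are the distances
Z_direct = hc.linkage(squareform(D), method="average")

# What happens when D itself is treated as rectangular data:
# each row of D becomes a "point" and distances are recomputed
Z_rows = hc.linkage(pdist(D, metric="euclidean"), method="average")

same_as_ref = np.allclose(Z_direct, Z_ref)
heights_differ = not np.allclose(Z_direct[:, 2], Z_rows[:, 2])
```

Here same_as_ref should be true and heights_differ should be true: the distance between rows i and j of D is at least sqrt(2) times D[i, j], so the merge heights cannot coincide when the points are distinct.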

Note: if no row_linkage is passed to clustermap(), the row linkage is determined internally by considering each row a "point" (observation) and calculating the pairwise distances between the points. So the row dendrogram reflects row similarity. The same applies to col_linkage, where each column is considered a point. This explanation should probably be added to the docs. Here is the first example from the docs, modified to make the internal linkage calculation explicit:

import seaborn as sns; sns.set()
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
flights = sns.load_dataset("flights")
# keyword arguments: recent pandas versions no longer accept positional arguments to pivot()
flights = flights.pivot(index="month", columns="year", values="passengers")
# rows as points, then columns as points (Euclidean distance, average linkage)
row_linkage, col_linkage = (hc.linkage(sp.distance.pdist(x), method='average')
  for x in (flights.values, flights.values.T))
g = sns.clustermap(flights, row_linkage=row_linkage, col_linkage=col_linkage)
  # note: this produces the same plot as "sns.clustermap(flights)", where
  #  clustermap() calculates the row and column linkages internally
