Dendrogram or Other Plot from Distance Matrix

Question

I have three matrices to compare. Each of them is 5x6. I originally wanted to use hierarchical clustering to cluster the matrices, such that the most similar matrices are grouped, given a threshold of similarity.

I could not find any such function in Python, so I implemented the distance measure by hand (a p-norm with p = 2). Now I have a 3x3 distance matrix (which I believe is also a similarity matrix in this case).

I am now trying to produce a dendrogram. My code and its (incorrect) output are below. I want to produce a graph (a dendrogram if possible) that shows clusters of the most similar matrices. Of matrices 0, 1 and 2, matrices 0 and 2 are the same and should be clustered together first, while 1 is different.

The distance matrix is:

     0          1     2
0    0.0        2.0   3.85e-16
1    2.0        0.0   2.0
2    3.85e-16   2.0   0.0

Code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Square 3x3 distance matrix between matrices 0, 1 and 2
mat = np.array([[0.0, 2.0, 3.8459253727671276e-16], [2.0, 0.0, 2.0], [3.8459253727671276e-16, 2.0, 0.0]])
dist_mat = mat
linkage_matrix = linkage(dist_mat, "single")
dendrogram(linkage_matrix, color_threshold=1, labels=["0", "1", "2"], show_leaf_counts=True)
plt.title("test")
plt.show()

This is the output:

[Figure: dendrogram produced by the code above (image not available)]

What does linkage(dist_mat, 'single') mean? I expected the output graph to look something like the following, where the distance between 0 and 1 is 2.0 (for example):

[Figure: sketch of the expected dendrogram (image not available)]
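For reference, the second argument of SciPy's linkage names the linkage method, and 'single' means single linkage: the distance between two clusters is the smallest distance between any member of one and any member of the other. The sketch below is a deliberately naive re-implementation (the helper naive_single_linkage_heights is hypothetical, for illustration only) that repeatedly merges the closest pair of clusters and records the merge heights, then compares them with what linkage reports when the question's matrix is passed in condensed form.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def naive_single_linkage_heights(D):
    """Merge heights of single-linkage clustering on a square distance matrix D.

    Hypothetical helper for illustration; deliberately slow, not for real use.
    """
    clusters = [{i} for i in range(len(D))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance = closest cross-cluster pair.
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        # Replace the two merged clusters by their union.
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return heights


D = np.array([[0.0, 2.0, 3.8459253727671276e-16],
              [2.0, 0.0, 2.0],
              [3.8459253727671276e-16, 2.0, 0.0]])

naive = naive_single_linkage_heights(D)
# Column 2 of the linkage matrix holds the merge heights.
scipy_heights = linkage(squareform(D), "single")[:, 2]
print(naive)
print(scipy_heights)
```

Both lists should agree: 0 and 2 merge at about 3.85e-16, and 1 joins them at 2.0.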

Are there better ways to represent this data? Is there a function that takes several matrices (instead of points), compares them to form a distance matrix, and then clusters them? I am open to other suggestions on how to visualize the differences between these matrices.
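One way to get such a function, sketched here under the assumption that "p-norm with p = 2" means the entrywise 2-norm (the Frobenius norm of the difference), is to flatten each matrix into a row vector and hand the stack to scipy.spatial.distance.pdist, whose result is already in the condensed form linkage expects. The three 5x6 matrices below are hypothetical random stand-ins, with m2 a copy of m0 to mimic the question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-ins for the three 5x6 matrices in the question.
rng = np.random.default_rng(0)
m0 = rng.random((5, 6))
m1 = rng.random((5, 6))
m2 = m0.copy()  # identical to m0, like matrices 0 and 2 in the question

mats = np.stack([m0, m1, m2])         # shape (3, 5, 6)
flat = mats.reshape(len(mats), -1)    # shape (3, 30): one row per matrix

# Euclidean distance between flattened matrices equals the Frobenius norm
# of their difference, i.e. the entrywise p-norm with p = 2.
condensed = pdist(flat, metric="euclidean")
square = squareform(condensed)        # 3x3 square form, for inspection only

# The condensed vector can go straight into linkage().
Z = linkage(condensed, "single")
print(square)
print(Z)
```

Other entrywise p-norms can be swapped in with pdist(flat, "minkowski", p=...).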

Answer

The first argument of linkage should not be the square distance matrix; it must be the condensed distance matrix. In your case, that would be np.array([2.0, 3.8459253727671276e-16, 2]). You can convert the square distance matrix to the condensed form with scipy.spatial.distance.squareform.
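A quick check of the conversion: squareform turns the symmetric, zero-diagonal square matrix into its upper triangle read row by row (pairs (0, 1), (0, 2), (1, 2)), and applying it again restores the square matrix. This small sketch just confirms that the question's matrix yields the condensed vector given above.

```python
import numpy as np
from scipy.spatial.distance import squareform

mat = np.array([[0.0, 2.0, 3.8459253727671276e-16],
                [2.0, 0.0, 2.0],
                [3.8459253727671276e-16, 2.0, 0.0]])

# Square -> condensed: distances for pairs (0, 1), (0, 2), (1, 2).
condensed = squareform(mat)
print(condensed)

# Condensed -> square: the round trip recovers the original matrix.
back = squareform(condensed)
print(back)
```

By default squareform also checks that the square input is symmetric with a zero diagonal, so a malformed distance matrix raises an error instead of being silently misread.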

If you pass a two-dimensional array with shape (m, n) to linkage, it treats it as an array of m points in n-dimensional space and computes the distances between those points itself. That's why you didn't get an error when you passed in the square distance matrix, but you got an incorrect plot. (This is an undocumented "feature" of linkage.)
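To see the effect, the sketch below compares three calls: linkage on the square matrix, linkage on pdist of that matrix (each row treated as a point, using the Euclidean metric that both functions default to), and linkage on the condensed form. The first two should match; only the third gives the intended heights.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

mat = np.array([[0.0, 2.0, 3.8459253727671276e-16],
                [2.0, 0.0, 2.0],
                [3.8459253727671276e-16, 2.0, 0.0]])

with warnings.catch_warnings():
    # Some SciPy versions may warn that the input looks like an
    # uncondensed distance matrix; silence it for this demonstration.
    warnings.simplefilter("ignore")
    Z_square = linkage(mat, "single")

# What linkage effectively did: rows treated as points in 3-D space.
Z_points = linkage(pdist(mat), "single")
# What was intended: clustering on the condensed distance matrix.
Z_condensed = linkage(squareform(mat), "single")

print(Z_square[:, 2])     # merge heights from the square input
print(Z_condensed[:, 2])  # merge heights from the condensed input
```

With the square input, matrix 1 joins the {0, 2} cluster at a height of about sqrt(12) ≈ 3.46 (the Euclidean distance between rows [2, 0, 2] and roughly [0, 2, 0]) instead of 2.0, which is why the plot looked wrong.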

Also note that because the distance 3.8e-16 is so small, the horizontal line associated with the link between points 0 and 2 might not be visible in the plot; it lies on the x axis.
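If you would rather keep the real data than alter the tiny distance, one possible workaround (a sketch, not part of the original answer) is to draw the dendrogram on an explicit Axes and then lower the bottom y-limit slightly below zero, so the near-zero link between 0 and 2 sits just above the axis line. The limit is set after calling dendrogram, since dendrogram sets the axis limits itself; the value -0.1 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

mat = np.array([[0.0, 2.0, 3.8459253727671276e-16],
                [2.0, 0.0, 2.0],
                [3.8459253727671276e-16, 2.0, 0.0]])
Z = linkage(squareform(mat), "single")

fig, ax = plt.subplots()
dendrogram(Z, labels=["0", "1", "2"], ax=ax)
# Lower the bottom limit after plotting so the link at height ~3.8e-16
# is drawn just above the x axis instead of on top of it.
ax.set_ylim(bottom=-0.1)
ax.set_title("test")
# Use plt.show() with an interactive backend, or fig.savefig(...), to view it.
```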

Here's a modified version of your script. For this example, I've changed that tiny distance to 0.1, so the associated cluster is not obscured by the x axis.

import numpy as np

from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

import matplotlib.pyplot as plt


# Square distance matrix; the tiny 0-2 distance is replaced by 0.1 for visibility
mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])
# Convert to the condensed form that linkage() expects
dists = squareform(mat)
linkage_matrix = linkage(dists, "single")
dendrogram(linkage_matrix, labels=["0", "1", "2"])
plt.title("test")
plt.show()
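Since the question mentions grouping matrices "given a threshold of similarity", the same linkage matrix can also be cut into flat clusters with scipy.cluster.hierarchy.fcluster. This sketch uses criterion="distance", which groups observations whose cophenetic distance is at most t; the threshold 1.0 is an arbitrary example value.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])
Z = linkage(squareform(mat), "single")

# Cut the tree at a distance threshold: matrices that merge at a height
# of at most `threshold` receive the same flat-cluster id.
threshold = 1.0
labels = fcluster(Z, t=threshold, criterion="distance")
print(labels)
```

Here 0 and 2 merge at 0.1, below the threshold, so they share an id, while 1 only joins at 2.0 and stays separate. The color_threshold argument of dendrogram in the question's code plays a similar, purely visual role.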

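Here is the plot created by the script:

[Figure: dendrogram of matrices 0, 1 and 2 from the modified script (image not available)]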
