scipy.cluster.hierarchy: labels seem not to be in the right order, and confused by the values on the vertical axis
Question
I know that scipy.cluster.hierarchy is designed to work with distance matrices, but what I have is a similarity matrix. When I plot it with dendrogram, something weird happens. Here is the code:
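import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

similarityMatrix = np.array([[1, 0.75, 0.75, 0, 0, 0, 0],
                             [0.75, 1, 1, 0.25, 0, 0, 0],
                             [0.75, 1, 1, 0.25, 0, 0, 0],
                             [0, 0.25, 0.25, 1, 0.25, 0.25, 0],
                             [0, 0, 0, 0.25, 1, 1, 0.75],
                             [0, 0, 0, 0.25, 1, 1, 0.75],
                             [0, 0, 0, 0, 0.75, 0.75, 1]])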
Here is the linkage step:
Z_sim = sch.linkage(similarityMatrix)
plt.figure(1)
plt.title('similarity')
sch.dendrogram(
    Z_sim,
    labels=['1', '2', '3', '4', '5', '6', '7']
)
plt.show()
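But this is the result:
[Figure: dendrogram titled "similarity"]
My questions are:
- Why are the labels in this dendrogram not in the right order?
- I passed a similarity matrix to linkage, but I cannot fully understand what the vertical axis means. For example, the maximum similarity is 1, so why is the maximum value on the vertical axis close to 1.6?
Thanks a lot for your help!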
Recommended answer
linkage expects "distances", not "similarities". To convert your matrix to something like a distance matrix, you can subtract it from 1:
dist = 1 - similarityMatrix
linkage does not accept a square distance matrix. It expects the distance data to be in "condensed" form, which you can get using scipy.spatial.distance.squareform:
from scipy.spatial.distance import squareform
dist = 1 - similarityMatrix
condensed_dist = squareform(dist)
Z_sim = sch.linkage(condensed_dist)
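Putting the two steps above together, here is a minimal, self-contained sketch of the corrected pipeline. It assumes the similarity matrix from the question and uses dendrogram's no_plot=True option so that the leaf order can be inspected without opening a figure; the variable names result and max_height are only for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform

# Similarity matrix from the question (symmetric, ones on the diagonal).
similarityMatrix = np.array([[1, 0.75, 0.75, 0, 0, 0, 0],
                             [0.75, 1, 1, 0.25, 0, 0, 0],
                             [0.75, 1, 1, 0.25, 0, 0, 0],
                             [0, 0.25, 0.25, 1, 0.25, 0.25, 0],
                             [0, 0, 0, 0.25, 1, 1, 0.75],
                             [0, 0, 0, 0.25, 1, 1, 0.75],
                             [0, 0, 0, 0, 0.75, 0.75, 1]])

# Turn similarities into distances: identical items get distance 0.
dist = 1 - similarityMatrix

# linkage wants the condensed (upper-triangle) form of the distance matrix.
condensed_dist = squareform(dist)

# Default method is single linkage.
Z_sim = sch.linkage(condensed_dist)

# Compute the dendrogram layout without plotting; 'ivl' holds the leaf labels
# in the order they would appear along the horizontal axis.
result = sch.dendrogram(
    Z_sim,
    labels=['1', '2', '3', '4', '5', '6', '7'],
    no_plot=True,
)
print(result['ivl'])

# The third column of Z_sim holds the merge heights shown on the vertical axis.
max_height = Z_sim[:, 2].max()
print(max_height)
```

With distances built this way, the merge heights stay within the range of 1 - similarity, so the vertical axis can be read as "one minus the similarity at which two clusters join".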
(When you pass a two-dimensional array with shape (m, n) to linkage, it treats the rows as points in n-dimensional space and computes the distances between them internally. This is why the heights on the vertical axis in the question can exceed 1: they are distances between rows of the matrix, not similarities.)
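To see this behavior directly, the sketch below compares passing the square matrix itself to linkage with passing the pairwise row distances from scipy.spatial.distance.pdist. It assumes the default Euclidean metric and the question's matrix; the names points, Z_rows and Z_pdist are only for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# The question's similarity matrix, here deliberately treated as 7 points in 7-D space.
points = np.array([[1, 0.75, 0.75, 0, 0, 0, 0],
                   [0.75, 1, 1, 0.25, 0, 0, 0],
                   [0.75, 1, 1, 0.25, 0, 0, 0],
                   [0, 0.25, 0.25, 1, 0.25, 0.25, 0],
                   [0, 0, 0, 0.25, 1, 1, 0.75],
                   [0, 0, 0, 0.25, 1, 1, 0.75],
                   [0, 0, 0, 0, 0.75, 0.75, 1]])

# What the original code did: a 2-D array is interpreted as observation vectors.
Z_rows = sch.linkage(points)

# The same thing spelled out: Euclidean distances between rows, in condensed form.
Z_pdist = sch.linkage(pdist(points))

# Both calls should describe the same hierarchy.
print(np.allclose(Z_rows, Z_pdist))

# Largest merge height; it is a Euclidean distance, so it is not bounded by 1.
print(Z_rows[:, 2].max())
```

If the two linkage results agree, the heights in the original plot were Euclidean distances between rows of the similarity matrix, which explains a vertical axis that goes well above 1.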