Plotting results of hierarchical clustering on top of a matrix of data in Python

Question

How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is the following figure:

https://publishing-cdn.elifesciences.org/07103/elife-07103-fig6-figsupp1-v2.jpg

I use scipy.cluster.hierarchy.dendrogram to make my dendrogram and perform hierarchical clustering on a matrix of data. How can I then plot the data as a matrix where the rows have been reordered to reflect the clustering induced by cutting the dendrogram at a particular threshold, and have the dendrogram plotted alongside the matrix? I know how to plot the dendrogram in scipy, but not how to plot the intensity matrix of data with the right scale bar next to it.

Any help on this would be greatly appreciated.

Recommended answer

The question does not define matrix very well: "matrix of values", "matrix of data". I assume that you mean a distance matrix. In other words, element D_ij in the symmetric nonnegative N-by-N distance matrix D denotes the distance between two feature vectors, x_i and x_j. Is that correct?
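For reference, a distance matrix of this kind can be built from a set of feature vectors with scipy.spatial.distance. The following is only a minimal sketch: the data `X`, its shape, and the choice of the Euclidean metric are assumptions made for illustration, not something the question specifies.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: 5 feature vectors with 3 features each.
rng = np.random.default_rng(0)
X = rng.random((5, 3))

# pdist returns the condensed form: the N*(N-1)/2 pairwise distances
# from the upper triangle, which is what sch.linkage expects.
condensed = pdist(X, metric='euclidean')

# squareform expands the condensed vector into the full symmetric
# N-by-N matrix D, where D[i, j] is the distance between X[i] and X[j].
D = squareform(condensed)
```

The condensed form goes to the clustering routine, and the square form is what gets drawn as the heat map.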

If so, then try this (edited June 13, 2010, to reflect two different dendrograms):

import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform


# Generate random features and the pairwise distance matrix.
# (scipy.rand and scipy.zeros are no longer available in SciPy; use NumPy.)
x = np.random.rand(40)
D = np.abs(x[:, None] - x[None, :])

# linkage expects the condensed form of the distance matrix.
condensedD = squareform(D)

# Compute and plot first dendrogram (left of the matrix).
fig = plt.figure(figsize=(8,8))
ax1 = fig.add_axes([0.09,0.1,0.2,0.6])
Y1 = sch.linkage(condensedD, method='centroid')
Z1 = sch.dendrogram(Y1, orientation='left', ax=ax1)
ax1.set_xticks([])
ax1.set_yticks([])

# Compute and plot second dendrogram (above the matrix).
ax2 = fig.add_axes([0.3,0.71,0.6,0.2])
Y2 = sch.linkage(condensedD, method='single')
Z2 = sch.dendrogram(Y2, ax=ax2)
ax2.set_xticks([])
ax2.set_yticks([])

# Plot distance matrix, with rows and columns reordered to match the leaves.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
D = D[idx1,:]
D = D[:,idx2]
im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap='YlGnBu')
axmatrix.set_xticks([])
axmatrix.set_yticks([])

# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
fig.colorbar(im, cax=axcolor)
fig.savefig('dendrogram.png')
plt.show()

Good luck! Let me know if you need more help.
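The question also asks about rows reordered to reflect the clusters obtained by cutting the dendrogram at a threshold. One possible way to get those cluster labels is scipy.cluster.hierarchy.fcluster with criterion='distance'. This is a sketch under assumptions: the two-group toy data, the average linkage, and the threshold value of 5.0 are all chosen for illustration only.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: two well-separated groups of 1-D points.
rng = np.random.default_rng(0)
x = np.concatenate([rng.random(20), rng.random(20) + 10.0]).reshape(-1, 1)

condensed = pdist(x)
linkage_matrix = sch.linkage(condensed, method='average')

# Cut the tree at an assumed distance threshold to get flat cluster labels.
threshold = 5.0
labels = sch.fcluster(linkage_matrix, t=threshold, criterion='distance')

# Leaf order of the dendrogram, computed without drawing anything.
leaf_order = sch.dendrogram(linkage_matrix, no_plot=True)['leaves']

# Reorder rows and columns of the square matrix, and the labels, by leaf order.
D = squareform(condensed)
D_ordered = D[np.ix_(leaf_order, leaf_order)]
ordered_labels = labels[leaf_order]
```

With a monotonic linkage such as average, each flat cluster is a subtree, so its members sit next to each other in leaf order. `D_ordered` can then be passed to matshow as in the answer's code, and `ordered_labels` shows where one cluster ends and the next begins.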

For different colors, adjust the cmap argument passed to matshow (or imshow). See the scipy/matplotlib docs for examples. That page also describes how to create your own colormap. For convenience, I recommend using a preexisting colormap. In my example, I used YlGnBu.
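As a rough illustration of both options, the sketch below draws the same placeholder data once with a built-in colormap and once with a custom one made by LinearSegmentedColormap.from_list. The data, the colormap name 'white_to_red', and the chosen colors are assumptions for the example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Placeholder data standing in for the reordered distance matrix.
data = np.random.default_rng(0).random((10, 10))

fig, (ax_builtin, ax_custom) = plt.subplots(1, 2, figsize=(8, 4))

# Option 1: a preexisting colormap, selected by name.
im_builtin = ax_builtin.matshow(data, aspect='auto', cmap='viridis')

# Option 2: a custom colormap interpolating linearly from white to red.
white_to_red = LinearSegmentedColormap.from_list('white_to_red', ['white', 'red'])
im_custom = ax_custom.matshow(data, aspect='auto', cmap=white_to_red)
```

The returned image objects (`im_builtin`, `im_custom`) are what gets passed to colorbar, so the scale bar picks up whichever colormap was chosen.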

add_axes (see documentation here) accepts a list or tuple: (left, bottom, width, height). For example, (0.5,0,0.5,1) adds an Axes on the right half of the figure. (0,0.5,1,0.5) adds an Axes on the top half of the figure.
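The two rectangles mentioned above can be checked directly. This small sketch places both Axes and reads their positions back; the figure size is arbitrary, and all coordinates are fractions of the figure width and height.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))

# (left, bottom, width, height), all in figure-relative units from 0 to 1.
ax_right = fig.add_axes((0.5, 0, 0.5, 1))   # right half of the figure
ax_top = fig.add_axes((0, 0.5, 1, 0.5))     # top half of the figure

# get_position() returns the bounding box that was requested.
right_bounds = ax_right.get_position().bounds
top_bounds = ax_top.get_position().bounds
```

The layout in the answer's code follows the same idea: the left dendrogram, top dendrogram, matrix, and colorbar each get a rectangle whose edges are chosen so that they line up without overlapping.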

Most people probably use add_subplot for its convenience. I like add_axes for its control.

To remove the border, use add_axes([left,bottom,width,height], frame_on=False). See example here.
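A minimal sketch of the difference, using two side-by-side Axes whose rectangles are chosen arbitrarily for the example:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 3))

# Default: the rectangular frame around the Axes is drawn.
ax_framed = fig.add_axes([0.1, 0.1, 0.35, 0.8])

# frame_on=False: the frame (border and background patch) is not drawn.
ax_bare = fig.add_axes([0.55, 0.1, 0.35, 0.8], frame_on=False)

# Ticks are separate from the frame; hide them too for a clean dendrogram panel.
ax_bare.set_xticks([])
ax_bare.set_yticks([])
```

For the dendrogram panels, turning the frame off and clearing the ticks leaves only the tree lines next to the matrix.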
