Plotting results of hierarchical clustering on top of a matrix of data in Python

Question

How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is the following figure:

https://publishing-cdn.elifesciences.org/07103/elife-07103-fig6-figsupp1-v2.jpg

I use scipy.cluster.hierarchy.dendrogram to make my dendrogram and perform hierarchical clustering on a matrix of data. How can I then plot the data as a matrix whose rows have been reordered to reflect the clustering induced by cutting the dendrogram at a particular threshold, with the dendrogram plotted alongside the matrix? I know how to plot the dendrogram in scipy, but not how to plot the intensity matrix of data with the right scale bar next to it.

Any help on this would be greatly appreciated.

Solution

The question does not define matrix very well: "matrix of values", "matrix of data". I assume that you mean a distance matrix. In other words, element D_ij in the symmetric nonnegative N-by-N distance matrix D denotes the distance between two feature vectors, x_i and x_j. Is that correct?

If so, then try this (edited June 13, 2010, to reflect two different dendrograms):

import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform


# Generate random features and the pairwise distance matrix.
x = np.random.rand(40)
D = np.abs(x[:, None] - x[None, :])

# linkage expects the condensed (upper-triangular) form of the distances.
condensedD = squareform(D)

# Compute and plot first dendrogram (to the left of the matrix).
fig = plt.figure(figsize=(8, 8))
ax1 = fig.add_axes([0.09, 0.1, 0.2, 0.6])
Y1 = sch.linkage(condensedD, method='centroid')
Z1 = sch.dendrogram(Y1, orientation='left', ax=ax1)
ax1.set_xticks([])
ax1.set_yticks([])

# Compute and plot second dendrogram (above the matrix).
ax2 = fig.add_axes([0.3, 0.71, 0.6, 0.2])
Y2 = sch.linkage(condensedD, method='single')
Z2 = sch.dendrogram(Y2, ax=ax2)
ax2.set_xticks([])
ax2.set_yticks([])

# Plot distance matrix, with rows and columns reordered to match
# the leaf order of the two dendrograms.
axmatrix = fig.add_axes([0.3, 0.1, 0.6, 0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
D = D[idx1, :]
D = D[:, idx2]
im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap='YlGnBu')
axmatrix.set_xticks([])
axmatrix.set_yticks([])

# Plot colorbar.
axcolor = fig.add_axes([0.91, 0.1, 0.02, 0.6])
fig.colorbar(im, cax=axcolor)
fig.savefig('dendrogram.png')
plt.show()
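
If the matrix is a plain data matrix (rows are observations) rather than a distance matrix, the same leaf-order trick should still apply. The following is only a rough sketch under that assumption: the `data` array, the `average` linkage method, and the `threshold` value are made up for illustration and are not from the question. It reorders the rows by the dendrogram's leaf order and uses `fcluster` to get flat cluster labels from cutting the tree at a distance threshold.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical data: 6 observations (rows) with 3 features (columns).
data = np.array([
    [0.0, 0.1, 0.0],
    [5.0, 5.1, 5.0],
    [0.1, 0.0, 0.1],
    [5.1, 5.0, 5.1],
    [9.0, 9.1, 9.0],
    [9.1, 9.0, 9.1],
])

# linkage also accepts an observation matrix and computes distances itself.
Y = sch.linkage(data, method='average', metric='euclidean')

# Leaf order of the dendrogram, computed without drawing anything.
leaves = sch.dendrogram(Y, no_plot=True)['leaves']
reordered = data[leaves, :]

# Flat cluster labels from cutting the tree at an assumed distance threshold.
threshold = 1.0
labels = sch.fcluster(Y, t=threshold, criterion='distance')
```

The `reordered` array can then be passed to `matshow` in place of the reordered distance matrix, and `labels[leaves]` gives the cluster of each displayed row.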

Good luck! Let me know if you need more help.


Edit: For different colors, adjust the cmap argument (passed to matshow in the code above; imshow accepts it too). The matplotlib colormap documentation has examples and also describes how to create your own colormap. For convenience, I recommend using a preexisting colormap. In my example, I used YlGnBu.
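
As a small illustration of switching colormaps, here is a minimal sketch. The `values` array is made-up data and `viridis` is just one of matplotlib's built-in colormap names, chosen arbitrarily. The `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data to display.
values = np.random.rand(10, 10)

fig, ax = plt.subplots()
# Pass any built-in colormap name (or a Colormap object) via cmap.
im = ax.imshow(values, aspect='auto', origin='lower', cmap='viridis')
fig.colorbar(im, ax=ax)
```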


Edit: add_axes accepts a list or tuple (left, bottom, width, height), given as fractions of the figure width and height. For example, (0.5,0,0.5,1) adds an Axes on the right half of the figure. (0,0.5,1,0.5) adds an Axes on the top half of the figure.

Most people probably use add_subplot for its convenience. I like add_axes for its control.

To remove the border, use add_axes([left,bottom,width,height], frame_on=False).
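
To make the rectangle convention concrete, here is a minimal sketch. The figure size and the variable names are arbitrary choices for illustration. It places one Axes on the right half of the figure and a borderless one in the top-left quarter.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))

# Right half: starts at x=0.5, spans half the width and the full height.
ax_right = fig.add_axes([0.5, 0.0, 0.5, 1.0])

# Top-left quarter, drawn without a border.
ax_noframe = fig.add_axes([0.0, 0.5, 0.5, 0.5], frame_on=False)
```

Because every rectangle is given explicitly, panels such as a dendrogram, a matrix, and a colorbar can be lined up edge to edge, which is harder to do with add_subplot.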
