Generating a heatmap that depicts the clusters in a dataset using hierarchical clustering in R

Problem description

I have a dataset of protein-DNA interactions. I want to cluster the data and generate a heatmap in which the clusters line up along the diagonal. I can cluster the data and plot a dendrogram, but when I draw the data with R's heatmap function, the clusters are not visible. Of the three images, the first is the dendrogram I generated, the second is the heatmap I generated, and the third is an example of a clustered heatmap that shows roughly what I expect. Comparing the second and third images, the third clearly shows clusters and the second does not.

Here is a link to my dataset: http://pastebin.com/wQ9tYmjy

I am able to cluster the data and generate a dendrogram just fine in R:

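    # Command-line arguments: args[1] is the tab-separated input matrix
    args <- commandArgs(TRUE)
    matrix_a <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1)
    location <- args[2]  # read here but not used below

    # Euclidean distances between rows, then average-linkage clustering
    matrix_d <- dist(matrix_a)
    hc <- hclust(matrix_d, method = "average")

    # Make png() the default graphics device so plot() writes to mydefault.png
    mypng <- function(filename = "mydefault.png") {
      png(filename)
    }
    options(device = "mypng")

    plot(hc)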

I am also able to generate a heatmap:

    matrix_a <- read.table("Arda_list.txt.binary.matrix.txt", sep = "\t", header = TRUE, row.names = 1)

    # Center and scale each column, then plot without further scaling
    mtscaled <- as.matrix(scale(matrix_a))
    heatmap(mtscaled, Colv = FALSE, scale = "none")

I tried to follow Christopher Bare's post, http://digitheadslabnotebook.blogspot.com/2011/06/drawing-heatmaps-in-r.html, but I am missing something. Any ideas would be appreciated. I have attached images of the heatmap and the dendrogram I am getting. Image 3 was taken from Christopher Bare's post. Thanks.

Solution

It turns out I should first have generated a distance matrix from some kind of correlation on my data. I calculated similarity values on the matrix using Pearson correlation and then called the heatmap function, which made it easier to cluster the data. Once I was able to generate clusters, I made them line up on the diagonal. Above is what the result looks like now. I had to change how I called heatmap on my dataset so that the clusters line up on the axis:

    heatmap(mtscaled, Colv = TRUE, Rowv = TRUE, scale = "none", symm = TRUE)
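
The answer does not show the correlation step itself, so here is a minimal sketch of how it might look. Everything in it is an assumption for illustration: the matrix `m` is synthetic stand-in data (not the dataset from the question), the names `group`, `col_block`, `prob`, `sim`, and `hm` are hypothetical, and the default `dist`/`hclust` settings of `heatmap` are used. Note that `symm = TRUE` requires a square matrix, so the matrix passed to `heatmap` in this sketch is the row-by-row Pearson similarity matrix rather than the original rectangular data.

```r
# Sketch only: the data below are synthetic, not the asker's dataset.
set.seed(1)

# Hypothetical binary interaction matrix: 12 proteins (rows) x 30 targets
# (columns), with three planted groups of 4 proteins each.
group     <- rep(1:3, each = 4)    # planted group of each row
col_block <- rep(1:3, each = 10)   # column block favoured by each group
prob <- outer(group, col_block, function(g, b) ifelse(g == b, 0.9, 0.1))
m <- matrix(rbinom(length(prob), size = 1, prob = prob),
            nrow = nrow(prob),
            dimnames = list(paste0("protein", seq_along(group)),
                            paste0("target", seq_along(col_block))))

# Pearson correlation is undefined for constant rows, so check first.
stopifnot(all(apply(m, 1, sd) > 0))

# cor() correlates columns, so transpose to get a row-by-row similarity matrix.
sim <- cor(t(m), method = "pearson")

# symm = TRUE needs a square matrix and reuses the row ordering for the
# columns, which is what places the clusters along the diagonal.
hm <- heatmap(sim, symm = TRUE, scale = "none")
```

The value returned by `heatmap` contains the row and column orderings (`hm$rowInd`, `hm$colInd`), which can be used to check that both axes were reordered the same way. If clustering directly on correlation distance is preferred over Euclidean distance between rows of the similarity matrix, one option is to pass `distfun = function(x) as.dist(1 - x)`; whether that suits a given dataset is a judgment call.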
