Is there a way to preserve the clustering in a heatmap but reduce the number of observations?


Question

I have a data set with 90 observations (rows) across 20 columns. Using the pheatmap package, I generated a fairly neat heatmap that clusters my data into two groups. It is not entirely clean, but the two dendrogram clusters largely separate my samples into 2 distinct groups that match my conditions. Now I want to reduce this set of 90 to a stricter set of around 20-30 observations while preserving the same clustering order shown by pheatmap. Is there a way to do that? Or is there another package that reduces my observations to a minimum set that still preserves the clustering order I see now? The code for pheatmap is

pheatmap(mydata[rownames(df.90), ],
         scale = "row",
         clustering_distance_cols = "correlation",
         clustering_method = "ward.D2",
         cluster_cols = TRUE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = col, annotation = batch.annotation,
         fontsize_row = 8, fontsize_col = 8,
         border_color = NA)

Is there any R package I am missing that can handle this, or something in pheatmap itself that I can use to reduce the variables and run a kind of permutation test to find the minimum set of observations that still retains my clustering?

The data has genes in rows and their expression across patients in columns.

Recommended answer

I would like to answer my own question and get feedback. I used kmeans_k = 30 in pheatmap and obtained 29 clusters that still preserve the clustering I previously had with the 90 observations. From there I extracted the genes in their respective clusters. I selected the top 5 clusters on either side of that heatmap, since they are the ones with high SD and can still produce the heatmap I need. Throughout my pheatmap calls I have used scale = "row" and kept both the row and column dendrograms on, and I did not want to change that now. When I plot the resulting 31 genes (observations), they actually improve my row clustering further and partition the data into 2 groups more cleanly, as I wanted. The code for the k-means step and the new heatmap is

obj <- pheatmap(df.90,
                scale = "row",
                clustering_distance_cols = "correlation",
                clustering_method = "ward.D2",
                cluster_cols = TRUE,
                show_rownames = TRUE, show_colnames = TRUE,
                color = col, annotation = batch.annotation,
                fontsize_row = 6, fontsize_col = 7,
                border_color = NA,
                cellwidth = NA, cellheight = NA,
                kmeans_k = 30)

Retrieve the clusters and extract the observations/genes

obj$kmeans$cluster
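The cluster vector above is a named integer vector (gene name to cluster id), so turning it into per-cluster gene lists and picking the high-SD clusters can be sketched roughly as follows. This is only an illustration under assumptions: the toy matrix `mat`, the direct call to `kmeans()` as a stand-in for `obj$kmeans`, the ranking criterion (mean per-gene SD of the unscaled data within each cluster) and `n_top` are all hypothetical choices, not the exact procedure used above.

```r
# Toy stand-in for the real data: 90 genes (rows) x 20 patients (columns).
set.seed(1)
mat <- matrix(rnorm(90 * 20), nrow = 90,
              dimnames = list(paste0("gene", 1:90), paste0("pt", 1:20)))

# Row-scale, then run k-means on the rows. With the real heatmap object,
# replace these two lines with: cl <- obj$kmeans$cluster
scaled <- t(scale(t(mat)))
cl <- kmeans(scaled, centers = 30, iter.max = 100)$cluster

# One character vector of gene names per cluster.
genes_by_cluster <- split(names(cl), cl)

# Assumed criterion: rank clusters by the mean SD of their genes (unscaled).
gene_sd <- apply(mat, 1, sd)
cluster_sd <- tapply(gene_sd[names(cl)], cl, mean)

# Keep the genes from the n_top highest-ranked clusters (n_top is hypothetical).
n_top <- 10
top_clusters <- names(sort(cluster_sd, decreasing = TRUE))[seq_len(n_top)]
selected <- unlist(genes_by_cluster[top_clusters], use.names = FALSE)

length(selected)  # size of the reduced gene set
```

The resulting `selected` vector plays the role of `rownames(df.31)` when subsetting the expression matrix for the reduced heatmap.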

Take the top clusters and plot them with pheatmap

pheatmap(mydata[rownames(df.31), ],
         scale = "row",
         clustering_distance_cols = "correlation",
         clustering_method = "ward.D2",
         cluster_cols = TRUE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = col, annotation = batch.annotation,
         fontsize_row = 8, fontsize_col = 8,
         border_color = NA)
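To judge whether the reduced gene set really keeps the same two patient groups, one option is to cut the column dendrogram into 2 groups for both the full and the reduced matrix and cross-tabulate the memberships. The sketch below does this in base R on toy data, mimicking the settings used above (row scaling, correlation distance between columns, ward.D2). The helper `col_groups`, the toy matrix and the choice of `selected` are hypothetical; with real pheatmap objects the same comparison could use `cutree(obj$tree_col, k = 2)`, since `tree_col` is an `hclust` object.

```r
# Hypothetical helper: 2-group split of the columns, mimicking
# scale = "row", clustering_distance_cols = "correlation", ward.D2.
col_groups <- function(m, k = 2) {
  scaled <- t(scale(t(m)))               # row scaling
  d <- as.dist(1 - cor(scaled))          # correlation distance between columns
  cutree(hclust(d, method = "ward.D2"), k = k)
}

# Toy data: 90 genes x 20 patients; the first 30 genes differ between
# patients 1-10 and patients 11-20.
set.seed(42)
mat <- matrix(rnorm(90 * 20), nrow = 90,
              dimnames = list(paste0("gene", 1:90), paste0("pt", 1:20)))
mat[1:30, 11:20] <- mat[1:30, 11:20] + 3

selected <- rownames(mat)[1:30]          # stand-in for the reduced gene set

full <- col_groups(mat)                  # grouping from all 90 genes
sub  <- col_groups(mat[selected, ])      # grouping from the reduced set

# Cross-tabulate the two groupings; a diagonal table means identical splits.
print(table(full = full, reduced = sub))

# Fraction of patients assigned consistently (max guards against label swap).
agree <- max(mean(full == sub), mean(full != sub))
agree
```

Repeating this check for random gene subsets of the same size would give a rough permutation-style baseline for how often an arbitrary subset reproduces the original split.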

What do you think of this approach? It is not what I originally intended, but I do not think it is wrong either. I would appreciate feedback if someone can suggest a better method or approach, or thinks this one is not correct. Thanks.
