How to color a dendrogram's labels according to defined groups? (in R)


Problem description



I have a numeric matrix in R with 24 rows and 10,000 columns. The row names of this matrix are the names of the files from which I read the data for each of the 24 rows. I also have a separate factor with 24 entries, specifying the group to which each of the 24 files belongs. There are 3 groups: Alcohol, Hydrocarbon and Ester. The names and the corresponding groups look like this:

> MS.mz
[1] "int-354.19" "int-361.35" "int-368.35" "int-396.38" "int-408.41" "int-410.43" "int-422.43"
[8] "int-424.42" "int-436.44" "int-438.46" "int-452.00" "int-480.48" "int-648.64" "int-312.14"
[15] "int-676.68" "int-690.62" "int-704.75" "int-312.29" "int-326.09" "int-326.18" "int-326.31"
[22] "int-340.21" "int-340.32" "int-352.35"

> MS.groups
[1] Alcohol     Alcohol     Alcohol     Alcohol     Hydrocarbon Alcohol     Hydrocarbon Alcohol    
[9] Hydrocarbon Alcohol     Alcohol     Alcohol     Ester       Alcohol     Ester       Ester      
[17] Ester       Alcohol     Alcohol     Alcohol     Alcohol     Alcohol     Alcohol     Hydrocarbon
Levels: Alcohol Ester Hydrocarbon

I wanted to generate a dendrogram to see how the data in the matrix cluster, so I used the following commands:

require(vegan)
dist.mat <- vegdist(MS.data.scaled.transposed, method = "euclidean")
clust.res <- hclust(dist.mat)
plot(clust.res)

and I got a dendrogram. Now I want to color the file names in the dendrogram according to the group they belong to, i.e. Alcohol, Hydrocarbon or Ester. I looked at other examples posted on the forum, such as "Label and color leaf dendrogram in r", "Label and color leaf dendrogram in R using ape package" and "Clustering with bootstrapping", but could not implement them for my data. I am not sure how to match row.names with MS.groups to get colored names in the dendrogram.

I also generated a tree using dendextend (as explained in https://nycdatascience.com/wp-content/uploads/2013/09/dendextend-tutorial.pdf). Here is the code used to generate it:

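require(colorspace)  # provides rainbow_hcl()
require(dendextend)  # provides color_branches(), labels_colors(), hang.dendrogram(), etc.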
d_SIMS <- dist(firstpointsample5[,-1])
hc_SIMS <- hclust(d_SIMS)
labels(hc_SIMS)
dend_SIMS <- as.dendrogram(hc_SIMS)
SIMS_groups <- rev(levels(firstpointsample5[, 1]))
dend_SIMS <- color_branches(dend_SIMS, k = 3, groupLabels = SIMS_groups)
is.character(labels(dend_SIMS)) 
plot(dend_SIMS)
labels_colors(dend_SIMS) <- rainbow_hcl(3)[sort_levels_values(as.numeric(firstpointsample5[,1])[order.dendrogram(dend_SIMS)])]
labels(dend_SIMS) <- paste(as.character(firstpointsample5[, 1])[order.dendrogram(dend_SIMS)],"(", labels(dend_SIMS), ")", sep = "")
dend_SIMS <- hang.dendrogram(dend_SIMS, hang_height = 0.1)
dend_SIMS <- assign_values_to_leaves_nodePar(dend_SIMS, 0.5,"lab.cex")
par(mar = c(3, 3, 3, 7))
plot(dend_SIMS, main = "Clustered SIMS dataset\n (the labels give the true m/z groups)",horiz = TRUE, nodePar = list(cex = 0.007))
legend("topleft", legend = SIMS_groups, fill = rainbow_hcl(3))

Solution

I suspect the function you are looking for is either color_labels or get_leaves_branches_col. The first colors your labels based on cutree (as color_branches does). The second lets you get the color of the branch leading to each leaf and then use it to color the labels of the tree, which is useful if you colored the branches with an unusual method (as happens when using branches_attr_by_labels). For example:

# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:5,]), "ave")
dend <- as.dendrogram(hc)

library(dendextend)
par(mfrow = c(1,2), mar = c(5,2,1,0))
dend <- dend %>%
         color_branches(k = 3) %>%
         set("branches_lwd", c(2,1,2)) %>%
         set("branches_lty", c(1,2,1))

plot(dend)

dend <- color_labels(dend, k = 3)
# The same as:
# labels_colors(dend)  <- get_leaves_branches_col(dend)
plot(dend)
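
The example above colors labels by cutree clusters. To color them by groups known in advance, as in the question, one option is to map each leaf back to its group with order.dendrogram and assign the result through labels_colors. The following is only a sketch under assumptions: it uses 15 rows of the built-in iris data as a stand-in for the MS matrix, iris$Species as a stand-in for MS.groups, and arbitrary color names.

```r
library(dendextend)

# Stand-in data (not the asker's): 5 flowers from each iris species.
idx <- c(1:5, 51:55, 101:105)
x <- iris[idx, 1:4]
rownames(x) <- paste0("flower", idx)      # plays the role of the file names
groups <- droplevels(iris$Species[idx])   # plays the role of MS.groups

dend <- as.dendrogram(hclust(dist(x)))

# One (arbitrary) color per group level.
group_cols <- c(setosa = "firebrick",
                versicolor = "darkgreen",
                virginica = "steelblue")

# order.dendrogram() returns, for each leaf from left to right,
# the row index of that leaf in the original data.
leaf_rows <- order.dendrogram(dend)
leaf_cols <- unname(group_cols[as.character(groups[leaf_rows])])

# Labels must be colored in leaf order, not in data order.
labels_colors(dend) <- leaf_cols

plot(dend)
legend("topright", legend = names(group_cols), fill = group_cols)
```

With the data from the question, x would be the matrix whose row names are the file names and groups would be MS.groups, provided both are in the same row order.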

Either way, you should always have a look at the set function for ideas on what can be done to your dendrogram (this saves the hassle of remembering all the different function names).
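
As a small illustration of that point, here is a sketch that sets label colors, label size and branch width through set alone. The color vector is arbitrary and chosen only for the example; it is given in leaf order (left to right), one value per leaf.

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests[1:5, ]), "ave"))

# Arbitrary example colors, one per leaf, in left-to-right leaf order.
label_cols <- c("red", "red", "blue", "blue", "darkgreen")

dend <- dend %>%
  set("labels_col", label_cols) %>%   # label colors
  set("labels_cex", 0.8) %>%          # label size
  set("branches_lwd", 2)              # branch line width

plot(dend)
```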
