连接树状图和热图 [英] Joining a dendrogram and a heatmap
问题描述
我有一个heatmap
(一组样本的基因表达):
I have a heatmap
(gene expression from a set of samples):
set.seed(10)
mat <- matrix(rnorm(24*10,mean=1,sd=2),nrow=24,ncol=10,dimnames=list(paste("g",1:24,sep=""),paste("sample",1:10,sep="")))
dend <- as.dendrogram(hclust(dist(mat)))
row.ord <- order.dendrogram(dend)
mat <- matrix(mat[row.ord,],nrow=24,ncol=10,dimnames=list(rownames(mat)[row.ord],colnames(mat)))
mat.df <- reshape2::melt(mat,value.name="expr",varnames=c("gene","sample"))
require(ggplot2)
map1.plot <- ggplot(mat.df,aes(x=sample,y=gene))+geom_tile(aes(fill=expr))+scale_fill_gradient2("expr",high="darkred",low="darkblue")+scale_y_discrete(position="right")+
theme_bw()+theme(plot.margin=unit(c(1,1,1,-1),"cm"),legend.key=element_blank(),legend.position="right",axis.text.y=element_blank(),axis.ticks.y=element_blank(),panel.border=element_blank(),strip.background=element_blank(),axis.text.x=element_text(angle=45,hjust=1,vjust=1),legend.text=element_text(size=5),legend.title=element_text(size=8),legend.key.size=unit(0.4,"cm"))
(由于我正在使用plot.margin
自变量,所以左侧被切断,但下面显示的内容我需要使用它.)
(The left side gets cut off because of the plot.margin
arguments I'm using but I need this for what's shown below).
然后我根据深度截止值prune
行dendrogram
以获得更少的聚类(即,仅深裂),并对所得的dendrogram
进行一些编辑,使其按照我想要的方式绘制:
Then I prune
the row dendrogram
according to a depth cutoff value to get fewer clusters (i.e., only deep splits), and do some editing on the resulting dendrogram
to have it plotted they way I want it:
depth.cutoff <- 11
dend <- cut(dend,h=depth.cutoff)$upper
require(dendextend)
gg.dend <- as.ggdend(dend)
leaf.heights <- dplyr::filter(gg.dend$nodes,!is.na(leaf))$height
leaf.seqments.idx <- which(gg.dend$segments$yend %in% leaf.heights)
gg.dend$segments$yend[leaf.seqments.idx] <- max(gg.dend$segments$yend[leaf.seqments.idx])
gg.dend$segments$col[leaf.seqments.idx] <- "black"
gg.dend$labels$label <- 1:nrow(gg.dend$labels)
gg.dend$labels$y <- max(gg.dend$segments$yend[leaf.seqments.idx])
gg.dend$labels$x <- gg.dend$segments$x[leaf.seqments.idx]
gg.dend$labels$col <- "black"
dend1.plot <- ggplot(gg.dend,labels=F)+scale_y_reverse()+coord_flip()+theme(plot.margin=unit(c(1,-3,1,1),"cm"))+annotate("text",size=5,hjust=0,x=gg.dend$label$x,y=gg.dend$label$y,label=gg.dend$label$label,colour=gg.dend$label$col)
然后使用cowplot
的plot_grid
将它们绘制在一起:
And I plot them together using cowplot
's plot_grid
:
require(cowplot)
plot_grid(dend1.plot,map1.plot,align='h',rel_widths=c(0.5,1))
尽管align='h'
可以正常工作,但这并不完美.
Although the align='h'
is working it is not perfect.
使用plot_grid
用map1.plot
绘制未切割的dendrogram
可以说明这一点:
Plotting the un-cut dendrogram
with map1.plot
using plot_grid
illustrates this:
dend0.plot <- ggplot(as.ggdend(dend))+scale_y_reverse()+coord_flip()+theme(plot.margin=unit(c(1,-1,1,1),"cm"))
plot_grid(dend0.plot,map1.plot,align='h',rel_widths=c(1,1))
dendrogram
顶部和底部的分支似乎朝中心挤压.使用scale
似乎是一种调整方法,但是比例值似乎是特定于图形的,因此我想知道是否有任何方法可以更原则地进行此操作.
The branches at the top and bottom of the dendrogram
seem to be squished towards the center. Playing around with the scale
seems to be a way of adjusting it, but the scale values seem to be figure-specific so I'm wondering if there's any way to do this in a more principled way.
接下来,我对heatmap
的每个群集进行一些术语丰富化分析.假设此分析为我提供了data.frame
:
Next, I do some term enrichment analysis on each cluster of my heatmap
. Suppose this analysis gave me this data.frame
:
enrichment.df <- data.frame(term=rep(paste("t",1:10,sep=""),nrow(gg.dend$labels)),
cluster=c(sapply(1:nrow(gg.dend$labels),function(i) rep(i,5))),
score=rgamma(10*nrow(gg.dend$labels),0.2,0.7),
stringsAsFactors = F)
我想做的是将此data.frame
绘制为heatmap
,并将剪切dendrogram
放置在其下方(类似于将其放置在表达式heatmap
左侧的方式).
What I'd like to do is plot this data.frame
as a heatmap
and place the cut dendrogram
below it (similar to how it's placed to the left of the expression heatmap
).
所以我再次尝试align='v'
在这里可以工作:
So I tried plot_grid
again thinking that align='v'
would work here:
首先重新生成面朝上的树状图:
First regenerate the dendrogram plot having it facing up:
dend2.plot <- ggplot(gg.dend,labels=F)+scale_y_reverse()+theme(plot.margin=unit(c(-3,1,1,1),"cm"))
现在尝试将它们绘制在一起:
Now trying to plot them together:
plot_grid(map2.plot,dend2.plot,align='v')
plot_grid
似乎无法将它们对齐,如图所示及其引发的警告消息:
plot_grid
doesn't seem to be able to align them as the figure shows and the warning message it throws:
In align_plots(plotlist = plots, align = align) :
Graphs cannot be vertically aligned. Placing graphs unaligned.
这似乎很接近:
plot_grid(map2.plot,dend2.plot,rel_heights=c(1,0.5),nrow=2,ncol=1,scale=c(1,0.675))
这是在玩弄了scale
参数之后实现的,尽管绘图太宽了.再次,我想知道是否有解决方法,或者以某种方式预先确定了dendrogram
和heatmap
的任何给定列表正确的scale
是什么,也许是通过它们的尺寸来确定的.
This is achieved after playing around with the scale
parameter, although the plot comes out too wide. So again, I'm wondering if there's a way around it or somehow predetermine what is the correct scale
for any given list of a dendrogram
and heatmap
, perhaps by their dimensions.
推荐答案
前段时间,我遇到了几乎相同的问题.根据树状图的结果,我使用的基本技巧是直接指定基因的位置.为了简单起见,首先是绘制完整的树状图的情况:
I faced pretty much the same issue some time ago. The basic trick I used was to specify directly the positions of the genes, given the results of the dendrogram. For the sake of simplicity, here is first the the case of plotting the full dendrogram:
# For the full dendrogram
library(plyr)
library(reshape2)
library(dplyr)
library(ggplot2)
library(ggdendro)
library(gridExtra)
library(dendextend)
set.seed(10)
# The source data
mat <- matrix(rnorm(24 * 10, mean = 1, sd = 2),
nrow = 24, ncol = 10,
dimnames = list(paste("g", 1:24, sep = ""),
paste("sample", 1:10, sep = "")))
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
# Heatmap plot
plt_hmap <- ggplot(heatmap_data,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
scale_fill_gradient2("expr", high = "darkred", low = "darkblue") +
scale_x_continuous(breaks = sample_pos_table$x_center,
labels = sample_pos_table$sample,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "Sample", y = "") +
theme_bw() +
theme(axis.text.x = element_text(size = rel(1), hjust = 1, angle = 45),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.2, 0.2, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "Distance", y = "", colour = "", size = "") +
theme_bw() +
theme(panel.grid.minor = element_blank())
library(cowplot)
plot_grid(plt_dendr, plt_hmap, align = 'h', rel_widths = c(1, 1))
请注意,我将y轴刻度线保留在热图图中的左侧,只是为了显示树状图和刻度线完全匹配.
Note that I kept the y axis ticks in the left in the heatmap plot, just to show that the dendrogram and ticks match exactly.
现在,对于切割后的树状图,应该记住,树状图的叶子将不再终止于与给定簇中某个基因相对应的确切位置.为了获得基因和簇的位置,需要从两个完整的树状图中提取数据,这是通过切割完整的一个树状图得到的.总体而言,为了阐明聚类中的基因,我添加了界定聚类的矩形.
Now, for the case of the cut dendrogram, one should keep in mind that the leafs of the dendrogram will no longer end in the exact position corresponding to a gene in a given cluster. To obtain the positions of the genes and the clusters, one needs to extract the data out of the two dendrograms that result from cutting the full one. Overall, to clarify the genes in the clusters, I added rectangles that delimit the clusters.
# For the cut dendrogram
library(plyr)
library(reshape2)
library(dplyr)
library(ggplot2)
library(ggdendro)
library(gridExtra)
library(dendextend)
set.seed(10)
# The source data
mat <- matrix(rnorm(24 * 10, mean = 1, sd = 2),
nrow = 24, ncol = 10,
dimnames = list(paste("g", 1:24, sep = ""),
paste("sample", 1:10, sep = "")))
sample_names <- colnames(mat)
# Obtain the dendrogram
full_dend <- as.dendrogram(hclust(dist(mat)))
# Cut the dendrogram
depth_cutoff <- 11
h_c_cut <- cut(full_dend, h = depth_cutoff)
dend_cut <- as.dendrogram(h_c_cut$upper)
dend_cut <- hang.dendrogram(dend_cut)
# Format to extend the branches (optional)
dend_cut <- hang.dendrogram(dend_cut, hang = -1)
dend_data_cut <- dendro_data(dend_cut)
# Extract the names assigned to the clusters (e.g., "Branch 1", "Branch 2", ...)
cluster_names <- as.character(dend_data_cut$labels$label)
# Extract the names of the haplotypes that belong to each group (using
# the 'labels' function)
lst_genes_in_clusters <- h_c_cut$lower %>%
lapply(labels) %>%
setNames(cluster_names)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data_cut),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Extract the positions of the clusters (by getting the positions of the
# leafs); data is already in the same order as in the cluster name
cluster_positions <- segment_data[segment_data$xend == 0, "y"]
cluster_pos_table <- data.frame(y_position = cluster_positions,
cluster = cluster_names)
# Specify the positions for the genes, accounting for the clusters
gene_pos_table <- lst_genes_in_clusters %>%
ldply(function(ss) data.frame(gene = ss), .id = "cluster") %>%
mutate(y_center = 1:nrow(.),
height = 1)
# > head(gene_pos_table, 3)
# cluster gene y_center height
# 1 Branch 1 g11 1 1
# 2 Branch 1 g20 2 1
# 3 Branch 1 g12 3 1
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
mutate(x_center = 1:nrow(.),
width = 1)
# Coordinates for plotting rectangles delimiting the clusters: aggregate
# over the positions of the genes in each cluster
cluster_delim_table <- gene_pos_table %>%
group_by(cluster) %>%
summarize(y_min = min(y_center - 0.5 * height),
y_max = max(y_center + 0.5 * height)) %>%
as.data.frame() %>%
mutate(x_min = with(sample_pos_table, min(x_center - 0.5 * width)),
x_max = with(sample_pos_table, max(x_center + 0.5 * width)))
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# Limits for the vertical axes (genes / clusters)
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
# Heatmap plot
plt_hmap <- ggplot(heatmap_data,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
geom_rect(data = cluster_delim_table,
aes(xmin = x_min, xmax = x_max, ymin = y_min, ymax = y_max),
fill = NA, colour = "black", inherit.aes = FALSE) +
scale_fill_gradient2("expr", high = "darkred", low = "darkblue") +
scale_x_continuous(breaks = sample_pos_table$x_center,
labels = sample_pos_table$sample,
expand = c(0.01, 0.01)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0),
position = "right") +
labs(x = "Sample", y = "") +
theme_bw() +
theme(axis.text.x = element_text(size = rel(1), hjust = 1, angle = 45),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.2, 0.2, -0.1), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = cluster_pos_table$y_position,
labels = cluster_pos_table$cluster,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "Distance", y = "", colour = "", size = "") +
theme_bw() +
theme(panel.grid.minor = element_blank())
library(cowplot)
plot_grid(plt_dendr, plt_hmap, align = 'h', rel_widths = c(1, 1.8))
这篇关于连接树状图和热图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!