Using pheatmap to sort data by row annotations?
Problem description
I'm trying to create a heatmap with columns of test data and rows of individual study participants. The participants can be classified into three distinct groups. I'd like to annotate the plot with the three groups and then cluster the data within each group to understand the differences between them.
I'm new to creating heatmaps, and I can't get the row annotations to work. I'm also not sure how to cluster only within each group once I do get the annotations working. I was thinking that the package "pheatmap.type" would work, but unfortunately, it's not available for R version 4.0.2.
I can't post the exact data (it's confidential), but I've attached an example file, and I'll describe what I've done so far and post the code. I have a data frame whose first column holds labels that combine the participant ID and the group (set as row names using row.names=1), followed by 12 columns of numeric data (no NAs). I then ordered the data by the row names and used the scale function to scale the data and generate a matrix. Next, I tried to create an annotation row by adding the group info to a data frame in several different ways. What I've tried so far is below:
#dataframe with Group and ID as row names and 12 numerical columns
df_1_HM <- data.frame(df_1$Group_ID, df_1$Test1, df_1$Test2, df_1$Test3, df_1$Test4, df_1$Test5, df_1$Test6, df_1$Test7, df_1$Test8, df_1$Test9, df_1$Test10, df_1$Test11, df_1$Test12, row.names=1)
#ordering the dataframe so that the groups are in order
df_1_HM_ordered <- df_1_HM[ order(row.names(df_1_HM)), ]
#Z-scoring (scaling) data
df_HM_matrix_1 <- scale(df_1_HM_ordered)
#creating a color palette
my_palette <- colorRampPalette(c("white", "grey", "black"))(n = 100)
#Plotting heatmap
install.packages("pheatmap")
library(pheatmap)
#trying to plot the heatmap with annotation_row data
#The method below does not work for me. The plot will run with no errors but does not actually plot - it ends up becoming a list of 4 with no data.
annotation_row = data.frame(
df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)))
)
rownames(annotation_row) = rownames(df_HM_matrix_1) # name matching
pheatmap(df_HM_matrix_1,
scale="none",
color=my_palette,
fontsize=14,
annotation_row=annotation_row)
#I also tried to use a dataframe with just the groups as column 1 to get row annotation
df_Group <- data.frame(df_1$Group, df_1$ID)
pheatmap(df_HM_matrix_1,
scale="none",
color=my_palette,
fontsize=14,
annotation_row=df_Group)
#Also tried using the select function to create a dataframe for the row annotation
library(dplyr)
df_Group_1 <- select(df_1, Group)
#When I use either of the data frame methods above I get the following error: Error in cut.default(a, breaks = 100) : 'x' must be numeric
Any help with this at all would be awesome!!
Here is the sample data:
structure(list(Group_ID = structure(1:28, .Label = c("Group1_10",
"Group1_13", "Group1_15", "Group1_2", "Group1_20", "Group1_26",
"Group1_27", "Group1_3", "Group1_6", "Group1_8", "Group2_1",
"Group2_12", "Group2_14", "Group2_16", "Group2_21", "Group2_23",
"Group2_25", "Group2_28", "Group2_7", "Group2_9", "Group3_11",
"Group3_17", "Group3_18", "Group3_19", "Group3_24", "Group3_4",
"Group3_5", "Group3_6"), class = "factor"), Test1 = c(1.44, 4.36,
0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13,
0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49,
1.43, 2.58, 2.49, 2.64), Test2 = c(1.44, 4.36, 0.75, 0.59, 1.67,
0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26,
1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49,
2.64), Test3 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92,
2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 2.64), Test4 = c(1.44,
4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56,
2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93,
1.49, 1.43, 2.58, 2.49, 0.31), Test5 = c(1.44, 4.36, 0.75, 0.59,
1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12,
0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58,
2.49, 0.31), Test6 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42,
0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 0.31),
Test7 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
), Test8 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
), Test9 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
), Test10 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
), Test11 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
), Test12 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
)), class = "data.frame", row.names = c(NA, -28L))
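The preparation steps described in the question can be sketched end to end as below. This is a minimal sketch under stated assumptions: the values are simulated with `runif()` in place of the confidential data, and the IDs are hypothetical, following only the `GroupN_ID` pattern of the sample data. Instead of hard-coding group sizes, it derives each row's group from the row-name prefix, so the annotation's row names match the matrix's row names exactly. This matters because in the posted sample data the groups have 10, 10 and 8 members, so hard-coded counts like `c(11, 10, 7)` would mislabel rows.

```r
# Simulated stand-in for the confidential data (hypothetical IDs, random values):
# 28 participants labelled "GroupN_ID" like the sample data, 12 test columns
set.seed(1)
ids <- paste0(rep(c("Group1", "Group2", "Group3"), c(10, 10, 8)), "_", 1:28)
df_1 <- data.frame(Group_ID = ids,
                   matrix(round(runif(28 * 12, 0, 5), 2), nrow = 28,
                          dimnames = list(NULL, paste0("Test", 1:12))))

# Use Group_ID as row names and keep only the numeric test columns
df_1_HM <- df_1[, -1]
rownames(df_1_HM) <- df_1$Group_ID

# Sort rows by name so each group is contiguous, then z-score every column
df_1_HM_ordered <- df_1_HM[order(rownames(df_1_HM)), ]
df_HM_matrix_1 <- scale(df_1_HM_ordered)

# Build the annotation from the row-name prefix ("Group1_10" -> "Group1"),
# reusing the matrix's row names so pheatmap can match rows by name
annotation_row <- data.frame(
  Group = factor(sub("_.*$", "", rownames(df_HM_matrix_1))),
  row.names = rownames(df_HM_matrix_1)
)
table(annotation_row$Group)  # counts per group: 10, 10, 8
```

The resulting `annotation_row` can then be passed to `pheatmap(df_HM_matrix_1, annotation_row = annotation_row)`.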
Recommended answer
For annotations to work with pheatmap, factors must be ordered. To do this, add ordered = TRUE to factor():
annotation_row = data.frame(df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)), ordered = TRUE))
You could also use as.ordered() to accomplish the same thing.
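A quick check that the two forms produce the same ordered factor, using the group labels and counts from the answer's `annotation_row`:

```r
groups <- rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7))

# Two equivalent ways to build an ordered factor
f1 <- factor(groups, ordered = TRUE)
f2 <- as.ordered(groups)

identical(f1, f2)  # TRUE
levels(f1)         # "Group 1" "Group 2" "Group 3"
```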
To sort your heatmap rows by annotation group, just add the argument cluster_rows = F to pheatmap():
pheatmap(df_HM_matrix_1,
scale="none",
color=my_palette,
fontsize=14,
annotation_row=annotation_row,
cluster_rows = F)
Here's what it looks like now: (image of the resulting heatmap not included)
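With cluster_rows = F the rows stay in the matrix's order, so the groups remain together but nothing is clustered inside them. For the question's other goal, clustering within each group, one possible approach (a sketch, not a built-in pheatmap feature) is to run `hclust()` separately on each group's rows, concatenate the resulting orders, and plot the reordered matrix with `cluster_rows = FALSE`. pheatmap's `gaps_row` argument then draws a separator after the given row indices. Assumptions: the data are simulated, `within_group_order` is a hypothetical helper, and Euclidean distance with complete linkage is used to mirror pheatmap's default row clustering.

```r
# Simulated, already-scaled data with hypothetical "GroupN_ID" row names
set.seed(1)
ids <- paste0(rep(c("Group1", "Group2", "Group3"), c(10, 10, 8)), "_", 1:28)
mat <- scale(matrix(runif(28 * 12, 0, 5), nrow = 28,
                    dimnames = list(ids, paste0("Test", 1:12))))
groups <- factor(sub("_.*$", "", rownames(mat)))

# Hypothetical helper: cluster each group's rows separately and return row
# names ordered by group, then by each group's dendrogram
within_group_order <- function(mat, groups) {
  unlist(lapply(levels(groups), function(g) {
    rows <- rownames(mat)[groups == g]
    if (length(rows) < 2) return(rows)  # hclust needs at least 2 rows
    hc <- hclust(dist(mat[rows, , drop = FALSE]), method = "complete")
    rows[hc$order]
  }), use.names = FALSE)
}

row_order <- within_group_order(mat, groups)
mat_ordered <- mat[row_order, ]
annotation_row <- data.frame(Group = groups, row.names = rownames(mat))

# Row indices after which a gap separates consecutive groups (here 10 and 20)
gaps <- as.vector(cumsum(table(groups)))[-nlevels(groups)]

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat_ordered,
                     scale = "none",
                     color = colorRampPalette(c("white", "grey", "black"))(100),
                     annotation_row = annotation_row,
                     cluster_rows = FALSE,  # keep the within-group order
                     gaps_row = gaps)
}
```

Because the row clustering happens outside pheatmap, no row dendrogram is drawn. The trade-off is that each group's internal structure is visible while the groups themselves are never mixed.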