Using pheatmap to sort data by row annotations?

Question

I'm trying to create a heatmap with columns of test data and rows of individual study participants. The participants can be classified into three distinct groups. I'd like to annotate the plot with the three groups and then cluster the data within each group to understand the differences between them.

I'm new to creating heatmaps, and I can't get the row annotations to work. I'm also not sure how to cluster only within each group once I do get the annotations working. I was thinking that the package "pheatmap.type" would work, but unfortunately, it's not available for R version 4.0.2.

I can't post the exact data (it's confidential), but I've attached an example file, and I'll describe what I've done so far and post the code. I have a data frame whose first column holds labels that combine the participant ID and the group (I turned these into row names using row.names=1), followed by 12 columns of numeric data (no NAs). I then ordered the data by the row names and used the scale function to scale the data and generate a matrix. I then tried to create an annotation row by adding the group info to a data frame in several different ways. What I've tried so far is below:

#dataframe with Group and ID as row names and 12 numerical columns  

df_1_HM <- data.frame(df_1$Group_ID, df_1$Test1, df_1$Test2, df_1$Test3, df_1$Test4, df_1$Test5, df_1$Test6, df_1$Test7, df_1$Test8, df_1$Test9, df_1$Test10, df_1$Test11, df_1$Test12, row.names=1)

#ordering the dataframe so that the groups are in order 
df_1_HM_ordered <- df_1_HM[ order(row.names(df_1_HM)), ]

#Z-scoring (scaling) data 
df_HM_matrix_1 <- scale(df_1_HM)

#creating a color palette 
my_palette <- colorRampPalette(c("white", "grey", "black"))(n = 100)


#Plotting heatmap 
install.packages("gplots")
library(gplots)

#trying to plot the heatmap with annotation_row data 
#The method below does not work for me. The plot will run with no errors but does not actually plot - it ends up becoming a list of 4 with no data.

pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=annotation_row)

annotation_row = data.frame(
  df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)))
)

rownames(annotation_row) = paste("df_Group", 1:28, sep = "")

rownames(annotation_row) = rownames(df_HM_matrix_1) # name matching

#I also tried to use a dataframe with just the groups as column 1 to get row annotation 
pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=df_Group)

df_Group <- data.frame(df_1$Group, df_1$ID)

#Also tried using the select function to create a dataframe for the row annotation 
df_Group_1 <- select(df_1, Group) 

#When I use either of the data frame methods above I get the following error: Error in cut.default(a, breaks = 100) : 'x' must be numeric

Any help with this at all would be awesome!!

Here is the sample data:

structure(list(Group_ID = structure(1:28, .Label = c("Group1_10", 
"Group1_13", "Group1_15", "Group1_2", "Group1_20", "Group1_26", 
"Group1_27", "Group1_3", "Group1_6", "Group1_8", "Group2_1", 
"Group2_12", "Group2_14", "Group2_16", "Group2_21", "Group2_23", 
"Group2_25", "Group2_28", "Group2_7", "Group2_9", "Group3_11", 
"Group3_17", "Group3_18", "Group3_19", "Group3_24", "Group3_4", 
"Group3_5", "Group3_6"), class = "factor"), Test1 = c(1.44, 4.36, 
0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 
0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 
1.43, 2.58, 2.49, 2.64), Test2 = c(1.44, 4.36, 0.75, 0.59, 1.67, 
0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 
1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 
2.64), Test3 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 
2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 2.64), Test4 = c(1.44, 
4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 
2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 
1.49, 1.43, 2.58, 2.49, 0.31), Test5 = c(1.44, 4.36, 0.75, 0.59, 
1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 
0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 
2.49, 0.31), Test6 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 
0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 0.31), 
    Test7 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test8 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test9 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test10 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    ), Test11 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    ), Test12 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    )), class = "data.frame", row.names = c(NA, -28L))

Recommended answer

For annotations to work with pheatmap, factors must be ordered. To do this, add ordered = TRUE to factor():

annotation_row = data.frame(df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)), ordered = TRUE))

You could also use as.ordered() to accomplish the same thing.
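
As a rough illustration of the two options, the sketch below builds the same ordered factor both ways and attaches it to an annotation data frame. The `ids` vector is a hypothetical stand-in for the matrix row names, and deriving the group from the "Group1_10" style label is an assumption based on the sample data, not something the answer itself does. Building the annotation from the row names also keeps its row names identical to those of the matrix, which pheatmap uses to match annotations to rows.

```r
# Hypothetical row names, following the "GroupN_ID" pattern of the sample data.
ids <- c("Group1_10", "Group1_13", "Group2_1", "Group2_12", "Group3_11")

# Assumption: the group is everything before the underscore.
group <- sub("_.*$", "", ids)

# Two ways to build the ordered factor described above.
f1 <- factor(group, ordered = TRUE)
f2 <- as.ordered(group)
stopifnot(identical(f1, f2))

# The annotation's row names must be the same as the matrix row names.
annotation_row <- data.frame(df_Group = f1, row.names = ids)
```

Deriving the labels this way avoids hard-coding the group sizes in rep(), which would have to be kept in sync with the data by hand.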

To keep the heatmap rows sorted by annotation group, turn off row clustering with the cluster_rows argument of pheatmap():

pheatmap(df_HM_matrix_1,
         scale = "none",
         color = my_palette,
         fontsize = 14,
         annotation_row = annotation_row,
         cluster_rows = FALSE)

This is what it looks like now: [image not preserved]
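
The question also asks how to cluster within each group, which turning off row clustering does not do by itself. One possible approach, sketched here under stated assumptions and not taken from the answer, is to run hclust() on each group separately, reorder the rows yourself, and then draw the heatmap with row clustering disabled. The matrix `mat` is made-up toy data standing in for the scaled matrix; `row_order` and `gaps` are illustrative names. The gaps_row argument of pheatmap only takes effect when cluster_rows is FALSE.

```r
# Toy stand-in for the scaled matrix; the values are random, for illustration only.
set.seed(1)
ids <- c(paste0("Group1_", 1:4), paste0("Group2_", 1:4), paste0("Group3_", 1:3))
mat <- matrix(rnorm(length(ids) * 5), nrow = length(ids),
              dimnames = list(ids, paste0("Test", 1:5)))

# Assumption: the group is the part of the row name before the underscore.
group <- factor(sub("_.*$", "", rownames(mat)), ordered = TRUE)
annotation_row <- data.frame(df_Group = group, row.names = rownames(mat))

# Cluster the rows of each group separately and collect the resulting row order.
row_order <- unlist(lapply(split(seq_len(nrow(mat)), group), function(idx) {
  if (length(idx) < 2) return(idx)  # a single row cannot be clustered
  idx[hclust(dist(mat[idx, , drop = FALSE]))$order]
}), use.names = FALSE)

mat_ordered <- mat[row_order, , drop = FALSE]

# Row positions after which a gap is drawn (the group boundaries).
gaps <- as.integer(unname(head(cumsum(table(group)), -1)))

# Draw only if pheatmap is installed; the order computed above is kept as is.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat_ordered,
                     scale = "none",
                     annotation_row = annotation_row,
                     cluster_rows = FALSE,
                     gaps_row = gaps)
}
```

This ordering has no dendrogram attached to it, so the plot shows the within-group arrangement but not the trees themselves.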
