R: How to colour samples on a boxplot by their group?

Problem description

I currently have gene expression data in a matrix, arranged by samples in columns, and genes in rows. I have about 300 samples against 30,000 genes.

The first three rows of the data look like this:

         Sample1  Sample2  Sample3   Sample4   Sample5
Gene1    6.53845  6.38723  6.41613   6.07901   6.45148
Gene2    6.34303  6.52751  6.48025   6.79185   6.94955
Gene3    6.17286  6.31772  6.44266   6.61777   7.05509
...      ...    

And so on for up to 30,000 rows, and 300 samples.

I have been able to plot a boxplot of the data using R, but I am now looking to colour the boxplot based on the batches/groups of the sample.

I have a table of the batch information, like this:

Sample   Batch
Sample1  A
Sample2  A
Sample3  B
Sample4  A
Sample5  C
...      ...

And so on for 8 batches. Using R, how should I go about colouring the boxplot based on which batch the sample belongs to? Thanks!

Solution

One approach could be:

library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

df %>%
  rownames_to_column("Genes") %>%                          #add rownames as column
  gather(Sample, Sample_value, -Genes) %>%                 #convert data to long format from wide format for plotting
  left_join(batch_lookup, by = "Sample") %>%               #join it with lookup table to add 'Batch' column
  ggplot(aes(x=Sample, y=Sample_value, color=Batch)) +     #plot data
    geom_boxplot()

which plots one box per sample, with the box outlines coloured by batch.
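
Note that `color = Batch` colours only the box outlines. The following is a minimal variant, offered as a sketch rather than as part of the original answer. It maps `fill` instead so that the whole box is shaded. It also converts the matrix with `as.data.frame()` first, because the question describes a matrix while `rownames_to_column()` expects a data frame. Finally, it uses `pivot_longer()`, which supersedes `gather()` in tidyr 1.0.0 and later. The object names `expr` and `long_df` and the rotated axis labels are illustrative choices, not something from the question.

```r
library(dplyr)
library(tidyr)    # pivot_longer() needs tidyr >= 1.0.0
library(tibble)
library(ggplot2)

# Small stand-in for the real 30,000 x 300 matrix (values from the question).
expr <- matrix(
  c(6.53845, 6.34303, 6.17286,
    6.38723, 6.52751, 6.31772,
    6.41613, 6.48025, 6.44266,
    6.07901, 6.79185, 6.61777,
    6.45148, 6.94955, 7.05509),
  nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:5))
)
batch_lookup <- data.frame(
  Sample = paste0("Sample", 1:5),
  Batch  = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

long_df <- as.data.frame(expr) %>%          # matrix -> data frame
  rownames_to_column("Genes") %>%           # keep gene names as a column
  pivot_longer(-Genes,                      # wide -> long, one row per gene/sample
               names_to = "Sample",
               values_to = "Sample_value") %>%
  left_join(batch_lookup, by = "Sample")    # attach the batch of each sample

p <- ggplot(long_df, aes(x = Sample, y = Sample_value, fill = Batch)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # readable sample labels
print(p)
```

With eight batches, ggplot2 assigns one default colour per batch and adds the legend automatically.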

Sample data:

df <- structure(list(Sample1 = c(6.53845, 6.34303, 6.17286), Sample2 = c(6.38723, 
6.52751, 6.31772), Sample3 = c(6.41613, 6.48025, 6.44266), Sample4 = c(6.07901, 
6.79185, 6.61777), Sample5 = c(6.45148, 6.94955, 7.05509)), .Names = c("Sample1", 
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame", row.names = c("Gene1", 
"Gene2", "Gene3"))

batch_lookup <- structure(list(Sample = c("Sample1", "Sample2", "Sample3", "Sample4", 
"Sample5"), Batch = c("A", "A", "B", "A", "C")), .Names = c("Sample", 
"Batch"), class = "data.frame", row.names = c(NA, -5L))
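
Since the question mentions that a boxplot was already drawn in R, a base-graphics route may also be useful. The sketch below assumes the plot came from `boxplot()` on the matrix; it is an alternative, not the answer's method. It builds one colour per column by matching the column names against the lookup table and passes that vector to `col`. The names `expr`, `batch_cols` and `box_cols`, and the `rainbow()` palette, are arbitrary illustrative choices.

```r
# Same sample values as above, held as a matrix (genes in rows, samples in columns).
expr <- matrix(
  c(6.53845, 6.34303, 6.17286,
    6.38723, 6.52751, 6.31772,
    6.41613, 6.48025, 6.44266,
    6.07901, 6.79185, 6.61777,
    6.45148, 6.94955, 7.05509),
  nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:5))
)
batch_lookup <- data.frame(
  Sample = paste0("Sample", 1:5),
  Batch  = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Look up each column's batch in column order, so colours line up with the
# boxes even if the lookup table is sorted differently from the matrix.
batch <- factor(batch_lookup$Batch[match(colnames(expr), batch_lookup$Sample)])

# One colour per batch level, then one colour per sample.
batch_cols <- setNames(rainbow(nlevels(batch)), levels(batch))
box_cols   <- unname(batch_cols[as.character(batch)])

# boxplot() on a matrix draws one box per column.
boxplot(expr, col = box_cols, las = 2, ylab = "Expression")
legend("topleft", legend = names(batch_cols), fill = batch_cols, title = "Batch")
```

Using `match()` rather than relying on row order means a sample missing from the lookup table shows up as an `NA` batch instead of silently shifting the colours.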
