R: How to colour samples on a boxplot by their group?
I currently have gene expression data in a matrix, arranged by samples in columns, and genes in rows. I have about 300 samples against 30,000 genes.
The first three lines of the data look like this:
Sample1 Sample2 Sample3 Sample4 Sample5
Gene1 6.53845 6.38723 6.41613 6.07901 6.45148
Gene2 6.34303 6.52751 6.48025 6.79185 6.94955
Gene3 6.17286 6.31772 6.44266 6.61777 7.05509
... ...
And so on for up to 30,000 rows, and 300 samples.
I have been able to plot a boxplot of the data using R, but I am now looking to colour the boxplot based on the batches/groups of the sample.
I have a table of batch information like this:
Sample Batch
Sample1 A
Sample2 A
Sample3 B
Sample4 A
Sample5 C
... ...
And so on for 8 batches. Using R, how should I go about colouring the boxplot based on which batch the sample belongs to? Thanks!
One approach could be:
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

df %>%
  rownames_to_column("Genes") %>%                 # add rownames as a column
  gather(Sample, Sample_value, -Genes) %>%        # convert from wide to long format for plotting
  left_join(batch_lookup, by = "Sample") %>%      # join with the lookup table to add the 'Batch' column
  ggplot(aes(x = Sample, y = Sample_value, color = Batch)) +  # plot the data
  geom_boxplot()
which draws one boxplot per sample, with the box outlines coloured by batch.
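A possible variation on the same pipeline, offered as a sketch rather than the answerer's own code: in tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`, and mapping `fill` instead of `color` fills each box rather than colouring only its outline. The data below is a small stand-in built from the example values in the question; the name `long_df` and the rotated axis labels are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)    # assumes tidyr >= 1.0.0 for pivot_longer()
library(tibble)
library(ggplot2)

# Stand-in for the expression data: genes in rows, samples in columns
df <- data.frame(
  Sample1 = c(6.53845, 6.34303, 6.17286),
  Sample2 = c(6.38723, 6.52751, 6.31772),
  Sample3 = c(6.41613, 6.48025, 6.44266),
  Sample4 = c(6.07901, 6.79185, 6.61777),
  Sample5 = c(6.45148, 6.94955, 7.05509),
  row.names = c("Gene1", "Gene2", "Gene3")
)

# Stand-in for the batch table
batch_lookup <- data.frame(
  Sample = paste0("Sample", 1:5),
  Batch  = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Wide -> long, then attach the batch of each sample
long_df <- df %>%
  rownames_to_column("Genes") %>%
  pivot_longer(-Genes, names_to = "Sample", values_to = "Sample_value") %>%
  left_join(batch_lookup, by = "Sample")

# fill = Batch colours the inside of each box; rotated labels help with many samples
p <- ggplot(long_df, aes(x = Sample, y = Sample_value, fill = Batch)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

With around 300 samples the x-axis labels will still be crowded, so it may be worth ordering the samples by batch or hiding the labels altogether.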
Sample data:
df <- structure(list(Sample1 = c(6.53845, 6.34303, 6.17286), Sample2 = c(6.38723,
6.52751, 6.31772), Sample3 = c(6.41613, 6.48025, 6.44266), Sample4 = c(6.07901,
6.79185, 6.61777), Sample5 = c(6.45148, 6.94955, 7.05509)), .Names = c("Sample1",
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame", row.names = c("Gene1",
"Gene2", "Gene3"))
batch_lookup <- structure(list(Sample = c("Sample1", "Sample2", "Sample3", "Sample4",
"Sample5"), Batch = c("A", "A", "B", "A", "C")), .Names = c("Sample",
"Batch"), class = "data.frame", row.names = c(NA, -5L))
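Since the question mentions that a boxplot has already been drawn in R, a base R alternative may also be useful: `boxplot()` accepts a matrix (one box per column) and a `col` vector with one colour per box, so the data does not have to be reshaped to long format. The sketch below assumes the batch table has one row per sample; the names `expr`, `palette_by_batch` and `box_cols`, and the use of `rainbow()` for the palette, are illustrative choices.

```r
# Stand-in expression matrix: genes in rows, samples in columns
# (matrix() fills column by column, so each triple below is one sample)
expr <- matrix(
  c(6.53845, 6.34303, 6.17286,
    6.38723, 6.52751, 6.31772,
    6.41613, 6.48025, 6.44266,
    6.07901, 6.79185, 6.61777,
    6.45148, 6.94955, 7.05509),
  nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:5))
)

# Stand-in batch table
batch_lookup <- data.frame(
  Sample = paste0("Sample", 1:5),
  Batch  = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Look up each column's batch by sample name, so row order in the table does not matter
batch <- factor(batch_lookup$Batch[match(colnames(expr), batch_lookup$Sample)])

# One colour per batch level, then expand to one colour per sample
palette_by_batch <- setNames(rainbow(nlevels(batch)), levels(batch))
box_cols <- palette_by_batch[as.character(batch)]

# One box per column of the matrix, filled with that sample's batch colour
boxplot(expr, col = box_cols, las = 2, ylab = "Expression")
legend("topleft", legend = names(palette_by_batch),
       fill = palette_by_batch, title = "Batch")
```

Using `match()` on the column names keeps the colours correct even if the batch table lists the samples in a different order from the matrix.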