Using apply functions with ggplot to plot a subset of dataframe columns
Problem description
I have a data frame df with many columns. I'd like to plot a subset of those columns, where c is a character vector of the column names I'd like to plot.
I'm currently doing the following:
df <-structure(list(Image.Name = structure(1:5, .Label = c("D1C1", "D2C2", "D4C1", "D5C3", "D6C2"), class = "factor"), Experiment = structure(1:5, .Label = c("020718 perfusion EPC_BC_HCT115_Day 5", "020718 perfusion EPC_BC_HCT115_Day 6", "020718 perfusion EPC_BC_HCT115_Day 7", "020718 perfusion EPC_BC_HCT115_Day 8", "020718 perfusion EPC_BC_HCT115_Day 9"), class = "factor"), Type = structure(c(2L, 1L, 1L, 2L, 1L), .Label = c("VMO", "VMT"), class = "factor"), Date = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "18-Apr-18", class = "factor"), Time = structure(1:5, .Label = c("12:42:02 PM", "12:42:29 PM", "12:42:53 PM", "12:43:44 PM", "12:44:23 PM"), class = "factor"), Low.Threshold = c(10L, 10L, 10L, 10L, 10L), High.Threshold = c(255L, 255L, 255L, 255L, 255L), Vessel.Thickness = c(7L, 7L, 7L, 7L, 7L), Small.Particles = c(0L, 0L, 0L, 0L, 0L), Fill.Holes = c(0L, 0L, 0L, 0L, 0L), Scaling.factor = c(0.001333333, 0.001333333, 0.001333333, 0.001333333, 0.001333333), X = c(NA, NA, NA, NA, NA), Explant.area = c(1.465629333, 1.093447111, 1.014612444, 1.166950222, 1.262710222), Vessels.area = c(0.255562667, 0.185208889, 0.195792, 0.153907556, 0.227996444), Vessels.percentage.area = c(17.43706003, 16.93807474, 19.29722044, 13.18887067, 18.05611774), Total.Number.of.Junctions = c(56L, 32L, 39L, 18L, 46L), Junctions.density = c(38.20884225, 29.26524719, 38.43832215, 15.42482246, 36.42957758), Total.Vessels.Length = c(12.19494843, 9.545333135, 10.2007416, 7.686755647, 11.94211976), Average.Vessels.Length = c(0.182014156, 0.153956986, 0.188902622, 0.08938088, 0.183724919), Total.Number.of.End.Points = c(187L, 153L, 145L, 188L, 167L), Average.Lacunarity = c(0.722820111, 0.919723402, 0.86403871, 1.115896082, 0.821753818)), .Names = c("Image.Name", "Experiment", "Type", "Date", "Time", "Low.Threshold", "High.Threshold", "Vessel.Thickness", "Small.Particles", "Fill.Holes", "Scaling.factor", "X", "Explant.area", "Vessels.area", "Vessels.percentage.area", "Total.Number.of.Junctions", 
"Junctions.density", "Total.Vessels.Length", "Average.Vessels.Length", "Total.Number.of.End.Points", "Average.Lacunarity"), row.names = c(NA, -5L), class = "data.frame")
doBarPlot <- function(x) {
  p <- ggplot(x, aes_string(x="Type", y=colnames(x), fill="Type") ) +
    stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
    stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width=0.5, na.rm = TRUE) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5) )
  print(p)
  ggsave(sprintf("plots/%s_bars.pdf", colnames(x) ) )
  return(p)
}
c = c('Total.Vessels.Length', 'Total.Number.of.Junctions', 'Total.Number.of.End.Points', 'Average.Lacunarity')
p[c] <- lapply(df[c], doBarPlot)
However, this yields the following error:
Error: ggplot2 doesn't know how to deal with data of class numeric
Debugging shows that x inside doBarPlot is of type numeric rather than data.frame, so ggplot errors. However, test <- df2[c] yields a variable of type data.frame.
Why is x a numeric? What's the best way to apply doBarPlot without resorting to a loop?
Recommended answer
As others have noted, the problem with your initial approach is that when you use lapply on a data frame, the elements you iterate over are the column vectors, not 1-column data frames. (A data frame is a list of columns, and lapply iterates over list elements.) However, even if you did iterate over 1-column data frames, your function would fail: the data frame supplied to the ggplot call wouldn't contain the Type column that you use in the plot.
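The behaviour described above can be checked on a small stand-in data frame. This is only an illustrative sketch: toy and its values are made up and are not the df from the question.

```r
# Toy data frame standing in for the df in the question (values are made up).
toy <- data.frame(
  Type = c("VMO", "VMT", "VMO"),
  Total.Vessels.Length = c(9.5, 12.2, 10.2),
  Average.Lacunarity = c(0.92, 0.72, 0.86)
)
cols <- c("Total.Vessels.Length", "Average.Lacunarity")

# Subsetting with a character vector keeps the data.frame class ...
subset_class <- class(toy[cols])

# ... but iterating over that subset visits the bare column vectors.
element_classes <- vapply(toy[cols], class, character(1))

# Inside the applied function there is no column name to recover either:
# colnames() of a plain numeric vector is NULL.
inner_colnames <- lapply(toy[cols], colnames)

print(subset_class)
print(element_classes)
```

So the function receives neither the Type column nor the name of the column it was handed, which is why the original doBarPlot cannot work under lapply(df[c], ...).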
Instead, you could modify the function to take two arguments: the full data frame, and the name of the column that you want to use on the y-axis.
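# Plot the per-Type mean of one column, named by the string `y`.
# `data` is the full data frame, so the Type column is available.
doBarPlot <- function(data, y) {
  p <- ggplot(data, aes_string(x = "Type", y = y, fill = "Type")) +
    # ggplot2 >= 3.3.0 calls this argument `fun`; older versions use `fun.y`
    stat_summary(fun = "mean", geom = "bar", na.rm = TRUE) +
    # mean_cl_normal requires the Hmisc package to be installed
    stat_summary(
      fun.data = "mean_cl_normal",
      geom = "errorbar",
      width = 0.5,
      na.rm = TRUE
    ) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5))
  print(p)
  # Pass the plot explicitly rather than relying on the last plot displayed
  ggsave(sprintf("plots/%s_bars.pdf", y), plot = p)
  return(p)
}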
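A practical note on the function above: ggsave writes into a plots/ directory, and the call typically fails if that directory does not exist. A minimal sketch, assuming the relative path plots used in the function and a writable working directory:

```r
# Create the output directory used by doBarPlot if it is not there yet.
out_dir <- "plots"
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}
```

Running this once before calling the plotting function should be enough.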
Then you can use lapply to iterate over the character vector of column names you want to plot, passing the data frame through lapply's ... argument so that it is forwarded unchanged to every call of your plotting function:
library(ggplot2)
cols <- c('Total.Vessels.Length', 'Total.Number.of.Junctions',
'Total.Number.of.End.Points', 'Average.Lacunarity')
p <- lapply(cols, doBarPlot, data = df)
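To see how the fixed argument is forwarded without needing ggplot2, here is a small sketch with a stand-in function. describe_column and the toy data are hypothetical and only illustrate the calling pattern; naming the result by column is an optional extra, not part of the answer above.

```r
# Made-up data; not the df from the question.
toy <- data.frame(
  Type = c("VMO", "VMT", "VMO"),
  Total.Vessels.Length = c(9.5, 12.2, 10.2),
  Average.Lacunarity = c(0.92, 0.72, 0.86)
)
cols <- c("Total.Vessels.Length", "Average.Lacunarity")

# Hypothetical stand-in for doBarPlot with the same (data, y) signature.
describe_column <- function(data, y) {
  sprintf("%s: mean = %.2f", y, mean(data[[y]]))
}

# Each element of cols fills the first unmatched argument (y);
# data = toy is passed along unchanged on every call.
res <- lapply(cols, describe_column, data = toy)

# Name the results so they can be looked up by column name.
names(res) <- cols
print(res[["Average.Lacunarity"]])
```

The same naming step applied to the list of plots would allow lookups such as p[["Average.Lacunarity"]].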
Further, if you don't mind having all of the plots in one file, you could use tidyr::gather to reshape your data into long form and use facet_wrap in your plot (as suggested by @RichardTelford in his comment), avoiding the iteration and the need for a function altogether:
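library(tidyverse)
df %>%
  # all_of() selects the columns named in the character vector `cols`
  gather(variable, value, all_of(cols)) %>%
  ggplot(aes(x = Type, y = value, fill = Type)) +
  facet_wrap(~ variable, scales = "free_y") +
  # ggplot2 >= 3.3.0 calls this argument `fun`; older versions use `fun.y`
  stat_summary(fun = "mean", geom = "bar", na.rm = TRUE) +
  # mean_cl_normal requires the Hmisc package to be installed
  stat_summary(
    fun.data = "mean_cl_normal",
    geom = "errorbar",
    width = 0.5,
    na.rm = TRUE
  ) +
  ggtitle("VMO vs. VMT") +
  theme(plot.title = element_text(hjust = 0.5))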
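For reference, the reshaping step can also be written with tidyr::pivot_longer, which newer tidyr releases recommend over gather. This swaps in a different function from the one used in the answer; the sketch below uses made-up data and assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Made-up data; not the df from the question.
toy <- data.frame(
  Type = c("VMO", "VMT", "VMO"),
  Total.Vessels.Length = c(9.5, 12.2, 10.2),
  Average.Lacunarity = c(0.92, 0.72, 0.86)
)
cols <- c("Total.Vessels.Length", "Average.Lacunarity")

# One row per (original row, measured column) pair.
long <- pivot_longer(
  toy,
  cols = all_of(cols),
  names_to = "variable",
  values_to = "value"
)
print(long)
```

The resulting variable and value columns play the same role as in the gather call, so the facet_wrap plot code would not need to change.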