How to use lapply with ggplot2 while indexing variables
Question
I would like to generate several hundred boxplots of continuous data from a large data frame, stratified by the factor "year". I started by creating a list from the original data frame in which each element contains one dependent variable together with the year.
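The list-building step described above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the wide data frame called wide and its columns are hypothetical stand-ins for the real data, and naming each list element after its variable is a design choice that keeps the name handy for labeling later.

```r
# Hypothetical wide data frame standing in for the real one:
# a "year" factor plus several continuous variables
wide <- data.frame(
  year    = factor(rep(c("2010", "2011", "2012"), each = 10)),
  M2_loss = runif(30),
  SSC     = rnorm(30, mean = 50)
)

# Every column except "year" is treated as a dependent variable
vars <- setdiff(names(wide), "year")

# One two-column data frame (year + variable) per dependent variable
l <- lapply(vars, function(v) wide[, c("year", v)])
names(l) <- vars  # keep each variable's name on its list element
```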
Here is an example data set that looks like mine:
l <- list(
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var3 = sample(25:50, 30, replace = TRUE))
)
The next step was to apply a ggplot2 function over the list. Neither of these calls produces any plots:
lapply(l, function (j) ggplot(j, aes(x=year, y=j[,2], fill=year)) +
geom_boxplot() + ylab(names(j[2])) )
lapply(l, function (j) ggplot(j, aes(x=year, y=j[[1]][2], fill=year)) +
geom_boxplot() + ylab(names(j[2])) )
Both calls produce the same error message:
Error: No layers in plot
In reality, my data frame is much larger: 2,800 observations and more than 250 variables with unique descriptive names (e.g. "M2_loss", "SSC"). Each variable is on a different scale, so facets are not a good solution. What sets this question apart from other Stack Overflow examples is that I am trying to index the variables rather than name them explicitly. Importantly, I need to capture each variable's name and use it to label the y-axis.
Any ideas on how to proceed?
Recommended Answer
If I understand what you want, I think you can make things much simpler by using aes_string instead of aes. This lets you specify the variables of interest as strings rather than as names. Here is a simple example using the well-worn iris data set:
lapply(
names(iris)[1:4],
function(n)
ggplot(data = iris, aes_string(y = n, x = "Species")) +
geom_boxplot()
)
This generates side-by-side boxplots (by species) for each of the four quantitative variables in the iris data set and should be easy to adjust for your data frame.