How to use lapply with ggplot2 while indexing variables

Question

I would like to generate several hundred boxplots of continuous data from a large data frame, stratified by the factor "year". I started by creating a list from the original data frame that contains each dependent variable and the year.

Here is an example data set that looks like mine:

l <- list(
  data.frame(year = rep(c("2010", "2011", "2012"), each = 10),
             var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = rep(c("2010", "2011", "2012"), each = 10),
             var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = rep(c("2010", "2011", "2012"), each = 10),
             var3 = sample(25:50, 30, replace = TRUE))
)
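If the original data sit in one wide data frame, a list shaped like l can be built by indexing columns instead of naming them. A minimal sketch; the wide data frame "wide", the random values in it, and the result "l2" are hypothetical, used only for illustration:

```r
# Hypothetical wide data frame: a "year" column plus one column per variable
set.seed(1)
wide <- data.frame(
  year = rep(c("2010", "2011", "2012"), each = 10),
  var1 = sample(1:100, 30, replace = TRUE),
  var2 = sample(100:200, 30, replace = TRUE),
  var3 = sample(25:50, 30, replace = TRUE)
)

# Every column except "year" is treated as a dependent variable
vars <- setdiff(names(wide), "year")

# One two-column data frame (year + that variable) per dependent variable
l2 <- lapply(vars, function(v) wide[, c("year", v)])
names(l2) <- vars
```

Because vars comes from names(wide), the same code works unchanged for 250 columns with descriptive names.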

The next step was to apply a ggplot2 function over the list. Neither of these calls produces a plot:

lapply(l, function(j) ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
  geom_boxplot() + ylab(names(j[2])))

lapply(l, function(j) ggplot(j, aes(x = year, y = j[[1]][2], fill = year)) +
  geom_boxplot() + ylab(names(j[2])))

The same error message is generated from those scripts:

Error: No layers in plot

In reality, my data frame is much larger: 2800 observations and over 250 different variables with unique descriptive names (e.g. "M2_loss", "SSC"). Each variable is on a different scale, so facets are not a good solution. What makes this question different from other examples on Stack Overflow is that I am trying to index the data rather than name it explicitly. It is important that I capture the unique name of each variable and use it to label the y-axis.
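For a list shaped like l, indexing and name capture can be combined by reading the second column's name and passing it to aes() through the .data pronoun. A minimal sketch, assuming ggplot2 3.0.0 or later (for .data); it rebuilds a small example list so it runs on its own, and it is a separate technique from the answer's aes_string() approach below:

```r
library(ggplot2)

# Example list shaped like `l` in the question: "year" plus one variable each
years <- rep(c("2010", "2011", "2012"), each = 10)
l <- list(
  data.frame(year = years, var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = years, var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = years, var3 = sample(25:50, 30, replace = TRUE))
)

plots <- lapply(l, function(j) {
  v <- names(j)[2]  # index the second column and capture its name
  ggplot(j, aes(x = year, y = .data[[v]], fill = year)) +
    geom_boxplot() +
    ylab(v)         # use that name as the y-axis label
})
```

Each element of plots is a ggplot object; print(plots[[1]]) draws the first one.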

Any ideas on how to proceed?

Recommended answer

If I understand what you want, I think you can make things much simpler by using aes_string instead of aes. This lets you specify the variables of interest as strings rather than as names. Here is a simple example using the well-worn iris data set:

library(ggplot2)

lapply(names(iris)[1:4], function(n)
  ggplot(data = iris, aes_string(y = n, x = "Species")) + geom_boxplot())

This generates side-by-side boxplots (by species) for each of the four quantitative variables in the iris data set, and it should be easy to adapt to your data frame.
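Recent ggplot2 releases deprecate aes_string() in favour of the .data pronoun inside aes(). A minimal sketch of the same iris example written that way, assuming ggplot2 3.0.0 or later; the y-axis label is set explicitly with ylab() so that it shows the variable's name:

```r
library(ggplot2)

# Quantitative columns of iris, selected by position rather than by name
vars <- names(iris)[1:4]

plots <- lapply(vars, function(n)
  ggplot(data = iris, aes(x = Species, y = .data[[n]])) +
    geom_boxplot() +
    ylab(n)  # label the y-axis with the column name held in n
)
names(plots) <- vars
```

Naming the list by variable makes individual plots easy to retrieve later, e.g. plots[["Petal.Width"]].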
