How to subset a data frame by a factor and repeat a plot for each subset?

Question

I am new to R. Forgive me if this question has an obvious answer, but I've not been able to find a solution. I have experience with SAS and may just be thinking about this problem in the wrong way.

I have a dataset with repeated measures from hundreds of subjects, with each subject having multiple measurements across different ages. Each subject is identified by an ID variable. I'd like to plot each measurement (let's say body WEIGHT) by AGE for each individual subject (ID).

I've used ggplot2 to do something like this:

ggplot(data = dataset, aes(x = AGE, y = WEIGHT )) + geom_line() + facet_wrap(~ID)

This works well for a small number of subjects but won't work for the entire dataset, because hundreds of facets do not fit in one figure.

I've also tried something like this:

ggplot(data = dataset, aes(x = AGE, y = WEIGHT, group = ID, colour = ID)) + geom_line()

This also works for a small number of subjects but is unreadable with hundreds of subjects.

I've tried to subset using code like this:

temp <- split(dataset,dataset$ID)

but I'm not sure how to work with the resulting list of data frames. Or perhaps there is a way to simply adjust the facet_wrap so that individual plots are created?

Thanks!

Recommended answer

Because you want to split up the dataset and make a plot for each level of a factor, I would approach this with one of the split-apply-return tools from the plyr package.

Here is a toy example using the mtcars dataset. I first create the plot and name it p, then use dlply to split the dataset by a factor and return a plot for each level. I'm taking advantage of %+% from ggplot2 to replace the data.frame in a plot.

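library(ggplot2)
library(plyr)

# Base plot built on the full data; its data is swapped out per subset below
p = ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
    geom_line()

# Split mtcars by cyl and return one plot per level as a named list
dlply(mtcars, .(cyl), function(x) p %+% x)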

This returns all the plots, one after another. If you assign the resulting list to a name, you can also call one plot at a time.

plots = dlply(mtcars, .(cyl), function(x) p %+% x)
plots[[1]]  # or plots[["4"]] to select a plot by factor level
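
The question's own starting point, split(), can be taken in the same direction without plyr. The following is a minimal sketch rather than part of the original answer: it assumes only ggplot2 and base R, and the names pieces and plots_base are made up for illustration.

```r
library(ggplot2)

# Split the data frame into a named list of data frames, one per level of cyl.
pieces <- split(mtcars, mtcars$cyl)

# Build one plot per piece; lapply keeps the list names ("4", "6", "8").
plots_base <- lapply(pieces, function(d) {
  ggplot(d, aes(x = wt, y = mpg)) + geom_line()
})

names(plots_base)    # "4" "6" "8"
plots_base[["6"]]    # draws only the plot for cyl == 6
```

Because the list is named by factor level, a single subject's plot can be pulled out by its ID rather than by position.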

Edit

I started thinking about putting a title on each plot based on the factor, which seems like it would be useful. Faceting on the same factor does this, because each subset then has a single facet whose strip label shows the level.

dlply(mtcars, .(cyl), function(x) p %+% x + facet_wrap(~cyl))
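
If a regular plot title is preferred over a facet strip, ggtitle() is one possible alternative. This is a sketch under the assumption that the subsets come from base split(); the names pieces and titled are illustrative and do not appear in the original answer.

```r
library(ggplot2)

pieces <- split(mtcars, mtcars$cyl)

# Iterate over the level names so each plot can use its own level in the title.
titled <- lapply(names(pieces), function(level) {
  ggplot(pieces[[level]], aes(x = wt, y = mpg)) +
    geom_line() +
    ggtitle(paste("cyl =", level))
})

# lapply over a character vector returns an unnamed list, so restore the names.
names(titled) <- names(pieces)
```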

Edit 2

Here is one way to save these in a single document, one plot per page. This works with the list of plots named plots. I didn't change any of the defaults in pdf(), so the output goes to the default file (Rplots.pdf in the working directory), but you can certainly explore the options it offers.

pdf()
plots
dev.off()
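
Typing plots on its own line relies on autoprinting at the console. Inside a loop or a function nothing is drawn unless print() is called, so a script-friendly variant might look like the sketch below. The file name plots_by_cyl.pdf, the page size, and the use of tempdir() are assumptions chosen for illustration.

```r
library(ggplot2)

plot_list <- lapply(split(mtcars, mtcars$cyl), function(d) {
  ggplot(d, aes(x = wt, y = mpg)) + geom_line()
})

# Hypothetical output location; replace with a path of your choice.
out_file <- file.path(tempdir(), "plots_by_cyl.pdf")

pdf(out_file, width = 7, height = 5)   # page size in inches
for (p in plot_list) {
  print(p)   # explicit print: each call starts a new page in the PDF
}
invisible(dev.off())
```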

Updated to use package dplyr instead of plyr. This is done in do(), and the output will have a named column (here plots) that contains all the plots as a list.

library(dplyr)
plots = mtcars %>%
    group_by(cyl) %>%
    do(plots = p %+% . + facet_wrap(~cyl))


Source: local data frame [3 x 2]
Groups: <by row>

  cyl           plots
1   4 <S3:gg, ggplot>
2   6 <S3:gg, ggplot>
3   8 <S3:gg, ggplot>

To see the plots in R, just ask for the column that contains the plots.

plots$plots

And to save them to a PDF:

pdf()
plots$plots
dev.off()
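
Newer dplyr releases document do() as superseded. One possible replacement, not part of the original answer, is group_map(), which returns a plain list with one element per group. The sketch below assumes a dplyr version that provides group_map() and group_keys(); the names grouped and plot_list are illustrative.

```r
library(dplyr)
library(ggplot2)

grouped <- mtcars %>% group_by(cyl)

# .x is the subset for one group (without the grouping column),
# .y is a one-row tibble holding that group's key.
plot_list <- grouped %>%
  group_map(~ ggplot(.x, aes(x = wt, y = mpg)) +
              geom_line() +
              ggtitle(paste("cyl =", .y$cyl)))

# group_map() returns an unnamed list; label it with the group keys.
names(plot_list) <- group_keys(grouped)$cyl
```

The grouping column is dropped from .x by default, which is why the title is taken from .y instead of faceting on cyl.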
