Subset inside a function by the variables specified in ddply
Question
I often need to subset a data.frame inside a function by the same variables I use to split another data.frame with ddply. To do that, I explicitly write the variables again inside the function, and I wonder whether there is a more elegant way. Below is a trivial example that shows my current approach.
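library(plyr)  # provides ddply() and .()

d1 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 1:3)
d2 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 4:6)
results <- ddply(d1, .(x, y), function(d) {
  # repeat the grouping variables by hand to pick the matching rows of d2
  d2Sub <- subset(d2, x == unique(d$x) & y == unique(d$y))
  out <- d$z + d2Sub$z
  data.frame(out)
})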
Recommended answer
The plyr package offers functions that make the whole split/apply/combine construct easy. To my knowledge, however, you can only split one thing at a time: a list, a data.frame, or an array.
In your case, what you are trying to do is split two objects, then mapply (or Map) over them, then recombine. Since plyr does not have a ready-made solution for this more complicated construct, you could do it in base R. That's how I assume people were doing things before plyr came out:
# split
d1.split <- split(d1, list(d1$x, d1$y))
d2.split <- split(d2, list(d2$x, d2$y))
# apply
res.split <- Map(function(df1, df2) data.frame(x = df1$x, y = df1$y,
                                               out = df1$z + df2$z),
                 d1.split, d2.split, USE.NAMES = FALSE)
# combine
res <- do.call(rbind, res.split)
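One thing worth checking with this approach: Map pairs the elements of the two lists by position, not by name, so both splits need to produce the same groups in the same order. A minimal sanity check, assuming the same d1 and d2 as in the question, could look like this:

```r
d1 <- expand.grid(x = c("a", "b"), y = c("c", "d"), z = 1:3)
d2 <- expand.grid(x = c("a", "b"), y = c("c", "d"), z = 4:6)

d1.split <- split(d1, list(d1$x, d1$y))
d2.split <- split(d2, list(d2$x, d2$y))

# Map() matches list elements positionally, so the group labels
# of the two splits must line up before the lists are combined.
stopifnot(identical(names(d1.split), names(d2.split)))
```

If the two data.frames did not share the same combinations of x and y, this check would fail and the groups would have to be aligned first.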
It is up to you to decide whether this is more elegant than your current approach. The intermediate assignments are there to aid comprehension, but you can write the whole thing as a single res <- do.call(rbind, Map(FUN, split(d1, ...), split(d2, ...), ...)) statement if you prefer.
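As a sketch of what that single statement might look like once it is filled in for the example data: the helper name add_z is hypothetical (it is not part of the original answer) and simply stands in for FUN.

```r
d1 <- expand.grid(x = c("a", "b"), y = c("c", "d"), z = 1:3)
d2 <- expand.grid(x = c("a", "b"), y = c("c", "d"), z = 4:6)

# Hypothetical helper standing in for FUN: combines one group of d1
# with the matching group of d2.
add_z <- function(df1, df2) {
  data.frame(x = df1$x, y = df1$y, out = df1$z + df2$z)
}

# split both data.frames by (x, y), apply add_z pairwise, then recombine
res <- do.call(rbind, Map(add_z,
                          split(d1, list(d1$x, d1$y)),
                          split(d2, list(d2$x, d2$y)),
                          USE.NAMES = FALSE))
```

Naming the function separately keeps the one-liner readable; an anonymous function would work just as well.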