Loops to create new variables in ddply
Problem description
I am using ddply to aggregate and summarize data frame variables, and I would like to loop over a list of the data frame's columns to create the new variables:
new.data <- ddply(old.data,
                  c("factor", "factor2"),
                  function(df)
                    c(a11_a10 = CustomFunction(df$a11_a10),
                      a12_a11 = CustomFunction(df$a12_a11),
                      a13_a12 = CustomFunction(df$a13_a12),
                      ...
                      ...
                      ...))
Is there a way for me to insert a loop in ddply so that I can avoid writing each new summary variable out, e.g.
for (i in 11:n) {
  paste("a", i, "_a", i - 1) = CustomFunction(..... )
}
I know that this is not how it would actually be done, but I just wanted to show how I'd conceptualize it. Is there a way to do this in the function I call in ddply, or via a list?
UPDATE: Because I'm a new user, I can't post an answer to my own question:
My answer involves ideas from Nick's answer and Ista's comment:
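func <- function(old.data, min, max, gap) {
  # Column names of the form a<i>_a<i - gap>, e.g. a11_a10 for gap = 1
  varrange <- min:max
  usenames <- paste("a", varrange, "_a", varrange - gap, sep = "")
  # Apply CustomFunction to each named column within every factor/factor2 group
  ddply(old.data,
        .(factor, factor2),
        colwise(CustomFunction, usenames))
}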
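To check the colwise approach end to end, here is a minimal sketch on made-up data. The toy data frame, its three summary columns, and the stand-in CustomFunction (a mean that ignores NAs) are assumptions for illustration only; they are not part of the original problem.

```r
library(plyr)

# Hypothetical stand-in for CustomFunction: the mean, ignoring NAs.
CustomFunction <- function(x) mean(x, na.rm = TRUE)

# Toy data (made up) that follows the a<i>_a<i-1> naming scheme.
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "q", "q"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8),
  a13_a12 = c(10, 20, 30, 40)
)

# Build the column names once instead of writing each summary by hand.
usenames <- paste("a", 11:13, "_a", 11:13 - 1, sep = "")

# colwise() wraps CustomFunction so it runs on every column in usenames.
new.data <- ddply(old.data, c("factor", "factor2"),
                  colwise(CustomFunction, usenames))
print(new.data)
```

The result has one row per factor/factor2 combination and one summary column per generated name.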
Solution
Building on the excellent answer by @Nick, here is one approach to the problem:
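foo <- function(df) {
  # n (the highest column index) must be defined before foo is called
  cols <- paste("a", 11:n, "_a", 10:(n - 1), sep = "")
  # Return the per-column summaries as a named vector
  sapply(df[, cols, drop = FALSE], CustomFunction)
}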
new.data = ldply(dlply(old.data, c("factor", "factor2")), foo)
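The split-then-combine pattern above can be tried on a small made-up data frame. In this sketch the data, the value of n, and the stand-in CustomFunction (a mean that ignores NAs) are assumptions chosen only to make the example runnable.

```r
library(plyr)

# Hypothetical stand-in for CustomFunction.
CustomFunction <- function(x) mean(x, na.rm = TRUE)

# Assumed highest column index for this toy example.
n <- 13

# Toy data (made up) with columns a11_a10 ... a13_a12.
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "q", "q"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8),
  a13_a12 = c(10, 20, 30, 40)
)

foo <- function(df) {
  # Generate the column names, then summarise each of those columns.
  cols <- paste("a", 11:n, "_a", 10:(n - 1), sep = "")
  sapply(df[, cols, drop = FALSE], CustomFunction)
}

# dlply() splits old.data by group; ldply() applies foo to each piece
# and binds the named vectors back into one data frame.
new.data <- ldply(dlply(old.data, c("factor", "factor2")), foo)
print(new.data)
```

Because foo returns a named vector, each generated name becomes a column of the combined result.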
Here is an example application using the tips dataset in ggplot2 (in current package versions, tips is provided by reshape2). Suppose we want to calculate the average of tip and total_bill for each combination of sex and smoker. Here is how the code would work:
foo = function(df){names = c("tip", "total_bill"); sapply(df[,names], mean)}
new = ldply(dlply(tips, c("sex", "smoker")), foo)
It produces the output shown below:
         .id      tip total_bill
1  Female.No 2.773519   18.10519
2 Female.Yes 2.931515   17.97788
3    Male.No 3.113402   19.79124
4   Male.Yes 3.051167   22.28450
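As a possible variation, the same summary can be computed in a single ddply call, which keeps the grouping variables as separate columns. This is only a sketch: the small bills data frame below is made up to mimic the layout of tips, so its numbers are not those of the real dataset.

```r
library(plyr)

# Made-up data with the same column names as the tips dataset.
bills <- data.frame(
  sex        = c("Female", "Female", "Male", "Male", "Male"),
  smoker     = c("No", "No", "No", "Yes", "Yes"),
  tip        = c(2, 3, 4, 1, 3),
  total_bill = c(10, 20, 30, 15, 25)
)

cols <- c("tip", "total_bill")

# The anonymous function returns a named vector of column means, so
# ddply() produces one column per name next to sex and smoker.
avg <- ddply(bills, c("sex", "smoker"),
             function(df) sapply(df[, cols], mean))
print(avg)
```

Only the group combinations that actually occur in the data appear in the result.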
Is this what you were looking for?