Quick/elegant way to construct a mean/variance summary table

Question

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it, and I have not figured it out so far ...

For a specified set of categorical factors, I want to construct a table of means and variances by group.

Generate data:

set.seed(1001)
d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],
                 f3=factor(as.character(as.roman(1:3))),rep=1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

Desired output:

  f1 f2  f3    y.mean      y.var
1  A  a   I 0.6502307 0.09537958
2  A  a  II 0.4876630 0.11079670
3  A  a III 0.3102926 0.20280568
4  A  b   I 0.3914084 0.05869310
5  A  b  II 0.5257355 0.21863126
6  A  b III 0.3356860 0.07943314
... etc. ...

Using aggregate/merge:

library(reshape)
m1 <- aggregate(y~f1*f2*f3,data=d,FUN=mean)
m2 <- aggregate(y~f1*f2*f3,data=d,FUN=var)
mvtab <- merge(rename(m1,c(y="y.mean")),
      rename(m2,c(y="y.var")))
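
The two aggregate calls and the merge can probably be collapsed into one pass in base R. The following is a minimal sketch, not part of the original question: it assumes the same simulated data as above (without z, which is not needed here), and the names agg and mvtab_base are made up for illustration. It relies on aggregate() accepting a FUN that returns a named vector, which it stores as a matrix column.

```r
# Recreate the example data (y only; z is not used in the summary)
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))

# FUN returns a named vector, so aggregate() stores the result
# as a single matrix column called "y" with columns "mean" and "var"
agg <- aggregate(y ~ f1 + f2 + f3, data = d,
                 FUN = function(x) c(mean = mean(x), var = var(x)))

# Flatten the matrix column into ordinary columns y.mean and y.var
mvtab_base <- do.call(data.frame, agg)

# Sort so that f3 varies fastest, matching the layout of the desired output
mvtab_base <- mvtab_base[order(mvtab_base$f1, mvtab_base$f2, mvtab_base$f3), ]
```

This avoids the reshape dependency for rename, at the cost of the slightly obscure do.call(data.frame, ...) step.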

Using ddply/summarise (possibly best, but I haven't been able to make it work):

mvtab2 <- ddply(subset(d,select=-c(z,rep)),
                .(f1,f2,f3),
                summarise,numcolwise(mean),numcolwise(var))

which produces:

Error in output[[var]][rng] <- df[[var]] : 
  incompatible types (from closure to logical) in subassignment type fix

Using melt/cast (maybe best?):

mvtab3 <- cast(melt(subset(d,select=-c(z,rep)),
          id.vars=1:3),
     ...~.,fun.aggregate=c(mean,var))
## now have to drop "variable"
mvtab3 <- subset(mvtab3,select=-variable)
## also should rename response variables

Won't (?) work in reshape2. Explaining ...~. to someone could be tricky!

Recommended answer

I'm a bit puzzled. Does this not work:

mvtab2 <- ddply(d,.(f1,f2,f3),
            summarise,y.mean = mean(y),y.var = var(y))

This gives me something like this:

   f1 f2  f3    y.mean       y.var
1   A  a   I 0.6502307 0.095379578
2   A  a  II 0.4876630 0.110796695
3   A  a III 0.3102926 0.202805677
4   A  b   I 0.3914084 0.058693103
5   A  b  II 0.5257355 0.218631264

Which is in the right form, but it looks like the values are different than what you specified.

Edit

Here's how to make your version with numcolwise work:

mvtab2 <- ddply(subset(d,select=-c(z,rep)),.(f1,f2,f3),summarise,
                y.mean = numcolwise(mean)(piece),
                y.var = numcolwise(var)(piece)) 

You forgot to pass the actual data to numcolwise: numcolwise(mean) only builds a function, which then has to be called on a data frame. On top of that, there's the little ddply trick that each piece is called piece internally. (Hadley points out in the comments that this shouldn't be relied upon, as it may change in future versions of plyr.)
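
One way to get the same per-group table without depending on the internal name piece is to hand ddply an explicit function that receives each piece as its argument. This is a sketch rather than part of the original answer; it assumes plyr is installed, and the names df and mvtab_fn are chosen here for illustration.

```r
library(plyr)

# Recreate the example data (y only; z is not used in the summary)
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))

# ddply calls the function once per f1/f2/f3 combination;
# `df` is that group's subset of d, passed in explicitly
mvtab_fn <- ddply(d, .(f1, f2, f3), function(df) {
  data.frame(y.mean = mean(df$y), y.var = var(df$y))
})
```

Because the piece arrives as a normal function argument, this should keep working even if plyr renames its internal variable.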
