How to get summary statistics for multiple variables by multiple groups?

Views: 209  Published: 2020/5/28 20:25:30  Tags: r aggregate plyr


Question

I know this forum has many answers on how to get summary statistics (e.g. mean, se, N) for multiple groups using options like aggregate, ddply or data.table. What I'm not sure about is how to apply these functions over multiple columns at once.

More specifically, I would like to know how to extend the following ddply command over multiple columns (dv1, dv2, dv3) without retyping the code with a different variable name each time.

library(reshape2)  # melt(), used further down
library(plyr)      # ddply()

# Example data: three grouping variables and three outcome variables
group1 <- rep(LETTERS[1:4], c(4, 6, 6, 8))
group2 <- rep(LETTERS[5:8], c(6, 4, 8, 6))
group3 <- rep(LETTERS[9:10], c(12, 12))
my.dat <- data.frame(group1, group2, group3,
                     dv1 = rnorm(24), dv2 = rnorm(24), dv3 = rnorm(24))
my.dat

# Summary of dv1 only, for every observed combination of the three groups
data1 <- ddply(my.dat, c("group1", "group2", "group3"), summarise,
               N    = length(dv1),
               mean = mean(dv1, na.rm = TRUE),
               sd   = sd(dv1, na.rm = TRUE),
               se   = sd / sqrt(N)
)
data1

How can I apply this ddply call over multiple columns so that the outcome is data1, data2, data3, ... with one result per outcome variable? I thought this could be the solution:

dfm <- melt(my.dat, id.vars = c("group1", "group2","group3"))
lapply(list(.(group1, variable), .(group2, variable),.(group3, variable)), 
   ddply, .data = dfm, .fun = summarize, 
   mean = mean(value), 
   sd = sd(value),
   N=length(value),
   se=sd/sqrt(N))

This looks like it is heading in the right direction, but it is not exactly what I need: it computes the statistics for each grouping variable separately. What I need is an outcome like data1, aggregated over combinations of the groups (e.g. the first aggregated group is the people who are in A, E and I; the second is those in B, E and I, and so on).

Recommended answer

Here's an illustration of reshaping your data first. I've written a custom function to improve readability:

# Returns a named list; data.table turns each element into a column
mysummary <- function(x, na.rm = FALSE) {
  res <- list(mean = mean(x, na.rm = na.rm),
              sd   = sd(x, na.rm = na.rm),
              N    = length(x))
  res$se <- res$sd / sqrt(res$N)
  res
}
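To see what the helper returns before it is used inside a grouped call, here is a minimal sketch on a single vector. The function body mirrors the one above so the snippet runs on its own; the vector `x` is a hypothetical example chosen for easy arithmetic.

```r
# Same helper as above, repeated so this snippet is self-contained.
mysummary <- function(x, na.rm = FALSE) {
  res <- list(mean = mean(x, na.rm = na.rm),
              sd   = sd(x, na.rm = na.rm),
              N    = length(x))
  res$se <- res$sd / sqrt(res$N)
  res
}

x <- c(2, 4, 6, 8)   # hypothetical values
s <- mysummary(x)
str(s)               # a list with elements mean, sd, N and se
```

Because the result is a named list, each element can become one output column when the function is called per group. Note that N is length(x), so it still counts missing values when na.rm = TRUE; sum(!is.na(x)) would be one way to count only observed values.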

library(data.table)

res <- melt(setDT(my.dat),id.vars=c("group1","group2","group3"))[,mysummary(value),
    by=.(group1,group2,group3,variable)]

> head(res)
   group1 group2 group3 variable  mean        sd N       se
1:      A      E      I      dv1  9.75  6.994045 4 3.497023
2:      B      E      I      dv1  9.50  7.778175 2 5.500000
3:      B      F      I      dv1 16.00  4.082483 4 2.041241
4:      C      G      I      dv1 14.50 10.606602 2 7.500000
5:      C      G      J      dv1 10.75 10.372239 4 5.186119
6:      D      G      J      dv1 13.00  4.242641 2 3.000000
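The reshaping step is what lets a single grouped call cover every outcome variable: once the data are in long format, the variable name is just one more grouping column. The sketch below shows the idea with base R only, using stack() and aggregate() in place of melt() and data.table, on a small hypothetical data frame (`wide`, `long`, `means` and `per_var` are illustrative names, not part of the answer). It also splits the result into one table per variable, which is the data1, data2, ... layout the question asks for.

```r
# Hypothetical wide data: one grouping column, two outcome columns
wide <- data.frame(
  group1 = c("A", "A", "B", "B"),
  dv1    = c(1, 3, 5, 7),
  dv2    = c(10, 30, 50, 70)
)

# stack() concatenates the columns and returns `values` and `ind`
stacked <- stack(wide[c("dv1", "dv2")])
long <- data.frame(
  group1   = rep(wide$group1, times = 2),  # repeat ids once per stacked column
  variable = stacked$ind,
  value    = stacked$values
)

# One mean per (group, variable) pair
means <- aggregate(value ~ group1 + variable, data = long, FUN = mean)

# One table per outcome variable
per_var <- split(means, means$variable)
per_var$dv1
```

Here per_var$dv1 and per_var$dv2 play the role of data1 and data2.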

Or without the custom function, thanks to @Jaap:

melt(setDT(my.dat),
     id.vars = c("group1", "group2", "group3"))[, .(mean = mean(value),
                                                    sd   = sd(value),
                                                    n    = .N,
                                                    se   = sd(value) / sqrt(.N)),
                                                by = .(group1, group2, group3, variable)]
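For comparison, a similar table can be produced without reshaping and without extra packages. This is a sketch using base R's aggregate() with a cbind() left-hand side, not part of the answer above; the seed, the `dat` data frame and the `stats` helper are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
group1 <- rep(LETTERS[1:4], c(4, 6, 6, 8))
group2 <- rep(LETTERS[5:8], c(6, 4, 8, 6))
group3 <- rep(LETTERS[9:10], c(12, 12))
dat <- data.frame(group1, group2, group3,
                  dv1 = rnorm(24), dv2 = rnorm(24), dv3 = rnorm(24))

# Hypothetical helper returning a named numeric vector
stats <- function(x) {
  c(N = length(x), mean = mean(x), sd = sd(x),
    se = sd(x) / sqrt(length(x)))
}

# cbind() on the left-hand side applies `stats` to each outcome column
agg <- aggregate(cbind(dv1, dv2, dv3) ~ group1 + group2 + group3,
                 data = dat, FUN = stats)

# Each dv column of `agg` is a matrix; flatten into dv1.N, dv1.mean, ...
flat <- do.call(data.frame, agg)
names(flat)
```

The result is wide (one row per group combination, four columns per variable) rather than long, so which form is more convenient depends on what happens to the table next.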
