Summary statistics by two or more factor variables?

Question

This is best illustrated with an example:

str(mtcars)
mtcars$gear <- factor(mtcars$gear, labels=c("three","four","five"))
mtcars$cyl <- factor(mtcars$cyl, labels=c("four","six","eight"))
# In mtcars, am is coded 0 = automatic, 1 = manual, so the labels go in that order
mtcars$am <- factor(mtcars$am, labels=c("auto","manual"))
str(mtcars)
tapply(mtcars$mpg, mtcars$gear, sum)

That gives me the summed mpg per gear. But say I wanted a 3x3 table with gear across the top and cyl down the side, and 9 cells holding the bivariate sums: how would I get that 'smartly'?

I could go:

tapply(mtcars$mpg[mtcars$cyl=="four"], mtcars$gear[mtcars$cyl=="four"], sum)
tapply(mtcars$mpg[mtcars$cyl=="six"], mtcars$gear[mtcars$cyl=="six"], sum)
tapply(mtcars$mpg[mtcars$cyl=="eight"], mtcars$gear[mtcars$cyl=="eight"], sum)

But that seems cumbersome.

Then how would I bring a 3rd variable into the mix?

This is somewhat in the space I'm thinking about: "Summary statistics using ddply".

Update: This gets me there, but it's not pretty.

aggregate(mpg ~ am+cyl+gear, mtcars,sum)

Cheers

Recommended answer

How about this, still using tapply()? It's more versatile than you knew! Pass a list of factors as the second argument and it cross-classifies by all of them:

with(mtcars, tapply(mpg, list(cyl, gear), sum))
#       three  four five
# four   21.5 215.4 56.4
# six    39.5  79.0 19.7
# eight 180.6    NA 30.8

Or, if you'd like the printed output to be a bit more interpretable, name the elements of the list:

with(mtcars, tapply(mpg, list("Cylinder#"=cyl, "Gear#"=gear), sum))
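For comparison, here is a minimal sketch of the same cross-tabulation using xtabs() from base R, which sums the left-hand side of the formula within each combination of the factors on the right. This is an alternative offered for illustration, not part of the original answer. The copy named `mt` is an assumption made so the snippet runs on its own. One difference worth noting: xtabs() reports empty cells as 0, whereas tapply() reports them as NA.

```r
# Self-contained: rebuild the question's factors on a copy of mtcars
# (the name `mt` is only for this illustration)
mt <- mtcars
mt$gear <- factor(mt$gear, labels = c("three", "four", "five"))
mt$cyl  <- factor(mt$cyl,  labels = c("four", "six", "eight"))

# tapply() with a named list of factors: empty cells come back as NA
by_tapply <- with(mt, tapply(mpg, list(Cylinder = cyl, Gear = gear), sum))

# xtabs() with a formula: sums mpg within each cyl x gear cell,
# and fills empty cells with 0
by_xtabs <- xtabs(mpg ~ cyl + gear, data = mt)

by_tapply["eight", "four"]  # NA
by_xtabs["eight", "four"]   # 0
```

Which form is preferable likely depends on whether a missing combination should read as "no data" (NA) or as a zero total.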

If you want to use more than two cross-classifying variables, the idea's exactly the same. The results will then be returned in a 3-or-more-dimensional array:

A <- with(mtcars, tapply(mpg, list(cyl, gear, carb), sum))

dim(A)
# [1] 3 3 6
lapply(1:6, function(i) A[,,i]) # To convert results to a list of matrices

# But eventually, the curse of dimensionality will begin to kick in...
table(is.na(A))
# FALSE  TRUE 
#    12    42 
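When most cells are empty like this, a long table with one row per observed combination can be easier to read than a sparse array. The following is a rough sketch of one way to get there in base R, assuming the same factor labels as in the question. The names `mt`, `long` and `mpg_sum` are made up for this example.

```r
# Self-contained: rebuild the question's factors on a copy of mtcars
mt <- mtcars
mt$gear <- factor(mt$gear, labels = c("three", "four", "five"))
mt$cyl  <- factor(mt$cyl,  labels = c("four", "six", "eight"))

# Same 3-way array as above, with named dimensions
A <- with(mt, tapply(mpg, list(cyl = cyl, gear = gear, carb = carb), sum))

# as.table() lets as.data.frame() unroll the array into one row per cell;
# na.omit() then drops the combinations that never occur in the data
long <- na.omit(as.data.frame(as.table(A), responseName = "mpg_sum"))

nrow(long)  # 12
head(long)
```

The result has one column per classifying variable plus the summed mpg, which is essentially the shape that aggregate(mpg ~ cyl + gear + carb, mt, sum) would return.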
