How to run tapply() on multiple columns of a data frame using R?


Question

I have a data frame like the following:

a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2

The call tapply(df[,2], INDEX = df$a, sum) works fine and produces a table that sums everything in df[,2] by df$a. But when I try tapply(df[,2:10], INDEX = df$a, sum) to get a similar table with a sum for each column (2, 3, 4, ..., 10), I get this error message:


Error in tapply(df[, 2:10], INDEX = df$a, sum) : arguments must have same length

Additionally, I would like the row names of the table to be the column names of df[,2:10], so that row 1 is b1, row 2 is b2, and so on through row 9, which is b9.

Recommended answer

That's because tapply() works on vectors and treats df[,2:10] as a single vector rather than going column by column. Besides that, sum() would give you one overall total per group, not a sum per column. Use aggregate() instead, e.g.:

aggregate(df[,2:10],by=list(df$a), sum)
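
To make the aggregate() call concrete, here is a minimal sketch that rebuilds the example data from the question and then transposes the result so that b1 to b9 become the row names, as the question asks. The transposition step and the sapply()/tapply() cross-check are not part of the original answer, and the names sums, sums_by_col and per_col are chosen only for this illustration.

```r
# Rebuild the example data frame from the question.
df <- read.table(header = TRUE, text = "
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
")

# One row per level of a, one column of sums per b variable.
sums <- aggregate(df[, 2:10], by = list(a = df$a), FUN = sum)

# Transpose (dropping the grouping column) so that b1..b9 become row names,
# then label the columns with the group names.
sums_by_col <- t(sums[, -1])
colnames(sums_by_col) <- as.character(sums$a)
print(sums_by_col)

# Alternative, not from the original answer: call tapply() once per column.
per_col <- t(sapply(df[, 2:10], function(col) tapply(col, df$a, sum)))
print(per_col)
```

Both results should hold the same per-group sums, with one row per b column and one column per level of a.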

If you want a list returned, you could use by() for that. Make sure to specify colSums instead of sum, because by() works on a split data frame:

by(df[,2:10],df$a,FUN=colSums)
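
As a rough illustration of what by() returns here, the sketch below reuses the example data from the question, pulls out the sums for one group, and then binds the pieces into a matrix. The do.call(rbind, ...) step and the names res and mat are additions for this sketch, not part of the original answer.

```r
# Rebuild the example data frame from the question.
df <- read.table(header = TRUE, text = "
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
")

# by() splits the data frame by a and applies colSums() to each piece,
# so each element is a named vector of column sums for one group.
res <- by(df[, 2:10], df$a, FUN = colSums)

# Column sums for group D only.
print(res[["D"]])

# Optionally stack the per-group vectors and transpose,
# giving one row per b column and one column per group.
mat <- t(do.call(rbind, res))
print(mat)
```

With sum instead of colSums, each piece of the split data frame would collapse to a single number per group, which is why colSums is the right function to pass here.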
