How to run tapply() on multiple columns of a data frame using R?
Question
I have a data frame like the following:
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
The call tapply(df[,2], INDEX = df$a, sum)
works fine and produces a table that sums everything in df[,2] by df$a. But when I try tapply(df[,2:10], INDEX = df$a, sum)
to get a similar table with one sum for each column (2, 3, 4, ..., 10), I get an error message reading:
Error in tapply(df[, 2:10], INDEX = df$a, sum) : arguments must have same length
Additionally, I would like the row names of the table to be the column names of df[,2:10], such that row 1 is b1, row 2 is b2, and so on up to row 9, which is b9.
Recommended answer
That's because tapply() works on vectors: it expects its first argument to be a vector of the same length as INDEX. A data frame such as df[,2:10] is a list of columns, so its length is the number of columns rather than the number of rows. Besides that, sum applied to a data frame gives you the grand total, not the sum per column. Use aggregate(), e.g.:
aggregate(df[,2:10],by=list(df$a), sum)
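The aggregate() call returns one row per group and one column per variable, which is the transpose of the layout the question asks for. Below is a minimal sketch, assuming the data frame is rebuilt from the question's sample with read.table(); the names sums and sums_t are only illustrative. It runs the same aggregate() call and then transposes the result so that the rows are b1 to b9.

```r
# Rebuild the sample data from the question (read.table with a text literal
# is used here only to make the example self-contained).
df <- read.table(header = TRUE, text = "
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
")

# One row per level of a, one column of sums per b-column.
sums <- aggregate(df[, 2:10], by = list(a = df$a), FUN = sum)

# Drop the grouping column and transpose, so rows become b1..b9
# and columns become the groups.
sums_t <- t(sums[, -1])
colnames(sums_t) <- as.character(sums$a)

print(sums_t)
```

With the sample data, sums_t["b1", "D"] is 28, the sum of b1 over the five D rows.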
If you want a list returned, you can use by() instead. Make sure to specify colSums rather than sum, because by() works on a split data frame:
by(df[,2:10],df$a,FUN=colSums)
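The by() result can be bound into a matrix if a table is preferred over a list. The sketch below again assumes the question's sample data rebuilt with read.table(), and the names res, by_mat and tap_mat are hypothetical. It also shows one way to keep tapply() itself, by applying it to each column in turn with sapply(), which is an alternative to the calls above rather than part of the original answer.

```r
# Rebuild the sample data from the question so the example is self-contained.
df <- read.table(header = TRUE, text = "
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
")

# by() splits the rows by df$a and applies colSums to each piece,
# giving one named vector of column sums per group.
res <- by(df[, 2:10], df$a, FUN = colSums)

# Stack the per-group vectors into a matrix (groups in rows), then
# transpose so that rows are b1..b9.
by_mat <- t(do.call(rbind, res))

# Alternative: run tapply() on one column at a time. Each column is a
# plain vector with one element per row, so its length matches df$a.
tap_mat <- t(sapply(df[, 2:10], function(col) tapply(col, df$a, sum)))

print(by_mat)
print(tap_mat)
```

Both matrices have b1 to b9 as row names and the groups D, F and R as column names.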