Faster ways to calculate frequencies and cast from long to wide
Problem description
I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.
Example of what I've tried so far (I also tried a bunch of other things, including adding a dummy variable equal to 1 and then using fun.aggregate = sum over that):
library(plyr)
ddply(data, .(id), dcast, id ~ week, value_var = "id",
fun.aggregate = length, fill = 0, .parallel = TRUE)
However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?
Input:
id week
1 1
1 2
1 3
1 1
2 3
Output:
   1  2  3
1  2  1  1
2  0  0  1
Recommended answer
You don't need ddply for this. The dcast from reshape2 is sufficient:
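# Example data: one row per (id, week) observation
dat <- data.frame(
  id = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)
library(reshape2)
# Count rows for each id/week pair; naming value.var avoids dcast guessing the value column
dcast(dat, id ~ week, value.var = "week", fun.aggregate = length)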
  id 1 2 3
1  1 2 1 1
2  2 0 0 1
Edit: For a base R solution (other than table, as posted by Joshua Uhlrich), try xtabs:
xtabs(~id+week, data=dat)
   week
id  1 2 3
  1 2 1 1
  2 0 0 1
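If a plain wide data frame is wanted rather than a contingency table, the table route mentioned above can be sketched roughly as follows. This is a minimal base R sketch, assuming the same dat as in the answer; the names counts and wide are illustrative and not from the original answer.

```r
# Same example data as in the answer (assumed here so the sketch is self-contained)
dat <- data.frame(
  id   = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)

# table() cross-tabulates the two columns; combinations that never occur get 0
counts <- table(id = dat$id, week = dat$week)

# Turn the contingency table into a wide data frame:
# one row per id (as row names) and one column per week
wide <- as.data.frame.matrix(counts)
wide
```

Here wide holds the same counts as the xtabs result, with the ids as row names and the weeks as column names.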