dplyr: put count of occurrences into a new variable
Problem description
I would like to get a handle on dplyr code, but cannot figure this out. I have seen a similar issue described for many variables (summarizing counts of a factor with dplyr, and Putting rowwise counts of value occurences into new variables, how to do that in R with dplyr?); however, my task is somewhat smaller.
Given a data frame, how do I count the frequency of a variable and place that count in a new variable?
set.seed(9)
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = round(runif(10, 1, 3), 0))
Then we have:
> df
   group var1
1      1    1
2      1    1
3      1    1
4      1    1
5      1    2
6      2    1
7      2    2
8      2    2
9      2    2
10     2    3
I would like a third column indicating, per group (group), how many times each value of var1 occurs; in this example it would be count = c(4, 4, 4, 4, 1, 1, 3, 3, 3, 1).
I tried, without success, things like:
df %>% group_by(group) %>% rowwise() %>% do(count = nrow(.$var1))
Many thanks for any explanation!
Recommended answer
All you need to do is group your data by both columns, "group" and "var1":
library(dplyr)
df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
#   group var1 count
#1      1    1     4
#2      1    1     4
#3      1    1     4
#4      1    1     4
#5      1    2     1
#6      2    1     1
#7      2    2     3
#8      2    2     3
#9      2    2     3
#10     2    3     1
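Why grouping by both columns matters can be sketched as follows, assuming dplyr is installed and typing in the question's printed data literally (so the result does not depend on set.seed()). Grouping by group alone makes n() return each group's size, while adding var1 makes n() count each value within its group. The trailing ungroup() is an addition of this sketch, not part of the original answer, so that later steps are not silently grouped.

```r
library(dplyr)

# Literal copy of the example data printed in the question,
# so the result does not depend on the random-number generator.
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# Grouping by `group` only: n() is the number of rows in each group.
by_group <- df %>%
  group_by(group) %>%
  mutate(count = n()) %>%
  ungroup()

# Grouping by `group` and `var1`: n() counts how often each var1 value
# occurs within its group, which is what the question asks for.
by_group_and_value <- df %>%
  group_by(group, var1) %>%
  mutate(count = n()) %>%
  ungroup()

by_group$count            # [1] 5 5 5 5 5 5 5 5 5 5
by_group_and_value$count  # [1] 4 4 4 4 1 1 3 3 3 1
```

mutate() keeps the original row order, so the new column lines up with the input rows.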
Edit after comment
Here's an example of how you SHOULD NOT DO IT:
df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))
The dplyr implementation with n() is certainly much faster, cleaner, and shorter, and should always be preferred over implementations like the one above.
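Recent dplyr releases also provide add_count(), which wraps the same n()-based group_by() + mutate() pattern in one call (whether your installed version has it, and its name argument, is an assumption worth checking). A minimal sketch on the question's data, typed in literally:

```r
library(dplyr)

df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# add_count() counts rows per (group, var1) combination, stores the
# count in a new column named "count", and keeps the input's grouping
# (here the input is ungrouped, so the result is ungrouped too).
res <- df %>% add_count(group, var1, name = "count")

res$count           # [1] 4 4 4 4 1 1 3 3 3 1
is_grouped_df(res)  # [1] FALSE
```

Because the result is not left grouped, there is no need for an extra ungroup() before further processing.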