dplyr: put count occurrences into new variable
I would like to get a handle on dplyr code, but cannot figure this out. I have seen a similar issue described for many variables ("summarizing counts of a factor with dplyr" and "Putting rowwise counts of value occurences into new variables, how to do that in R with dplyr?"), but my task is somewhat smaller.
Given a data frame, how do I count the frequency of a variable and place that in a new variable?
set.seed(9)
df <- data.frame(
group=c(rep(1,5), rep(2,5)),
var1=round(runif(10,1,3),0))
Then we have:
> df
   group var1
1      1    1
2      1    1
3      1    1
4      1    1
5      1    2
6      2    1
7      2    2
8      2    2
9      2    2
10     2    3
I would like a third column indicating, per group (group), how many times each value of var1 occurs. In this example that would be: count = (4,4,4,4,1,1,3,3,3,1).
I tried - without success - things like:
df %>% group_by(group) %>% rowwise() %>% do(count = nrow(.$var1))
Explanations are much appreciated!
All you need to do is group your data by both columns, "group" and "var1":
df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
#   group var1 count
#1      1    1     4
#2      1    1     4
#3      1    1     4
#4      1    1     4
#5      1    2     1
#6      2    1     1
#7      2    2     3
#8      2    2     3
#9      2    2     3
#10     2    3     1
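Note that the result above is still grouped by group and var1. As a minimal sketch (not part of the original answer), the same counts can be obtained with the grouping dropped afterwards via ungroup(), or in a single call with add_count(), assuming a reasonably recent dplyr in which add_count() accepts a name argument. The data frame here is typed out by hand from the printed output in the question instead of relying on the random seed.

```r
library(dplyr)

# Data typed out by hand from the printed data frame in the question
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# Grouped mutate, then drop the grouping so later verbs see the whole table
res_mutate <- df %>%
  group_by(group, var1) %>%
  mutate(count = n()) %>%
  ungroup()

# add_count() groups by the given columns, counts, and restores the
# original (here: absent) grouping in one step
res_add <- df %>% add_count(group, var1, name = "count")

res_mutate$count  # 4 4 4 4 1 1 3 3 3 1
res_add$count     # 4 4 4 4 1 1 3 3 3 1
```

Both variants keep all ten rows; use count() or summarise() instead if one row per (group, var1) combination is wanted.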
Edit after comment
Here's an example of how you SHOULD NOT DO IT:
df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))
The dplyr implementation with n() is much faster, cleaner and shorter, and should be preferred over implementations like the one above.
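As for why the attempt in the question did not work: one contributing factor is that nrow() is meant for objects with dimensions, so calling it on a single column such as .$var1 returns NULL instead of a count. The short check below illustrates this with a hypothetical vector var1 holding the values of the first group; it is only an illustration, not a reconstruction of what do() did internally.

```r
# Values of var1 for group 1, copied from the question's printed data
var1 <- c(1, 1, 1, 1, 2)

n_rows   <- nrow(var1)    # NULL: a plain vector has no dim attribute
n_length <- length(var1)  # 5: number of elements in the vector
n_NROW   <- NROW(var1)    # 5: NROW() treats a vector as a one-column matrix

is.null(n_rows)  # TRUE
```

Inside a grouped mutate(), n() avoids the issue altogether because it reports the size of the current group directly.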