dplyr: put count occurrences into new variable


Problem description

I would like some help with dplyr code but cannot figure this out. I have seen similar issues described here for many variables ("summarizing counts of a factor with dplyr" and "Putting rowwise counts of value occurences into new variables, how to do that in R with dplyr?"), but my task is somewhat smaller.
Given a data frame, how do I count the frequency of a variable and place that in a new variable?

set.seed(9)
df <- data.frame(
    group=c(rep(1,5), rep(2,5)),
    var1=round(runif(10,1,3),0))

Then we have:

>df
   group var1
1      1    1
2      1    1
3      1    1
4      1    1
5      1    2
6      2    1
7      2    2
8      2    2
9      2    2
10     2    3

I would like a third column indicating, per group (group), how many times each value of var1 occurs; in this example this would be: count=(4,4,4,4,1,1,3,3,3,1). I tried, without success, things like:

df %>%  group_by(group) %>% rowwise() %>% do(count = nrow(.$var1))

Explanations are much appreciated!

Solution

All you need to do is group your data by both columns, "group" and "var1":

df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
#   group var1 count
#1      1    1     4
#2      1    1     4
#3      1    1     4
#4      1    1     4
#5      1    2     1
#6      2    1     1
#7      2    2     3
#8      2    2     3
#9      2    2     3
#10     2    3     1
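
As a self-contained sketch of the approach above: the data frame below is typed out by hand to match the printed df in the question (an assumption made so the example does not depend on the random number generator), and the trailing ungroup() is an optional addition, not part of the original answer. It also illustrates one likely reason the attempt in the question fails: nrow() returns NULL when given a plain vector such as a single column.

```r
library(dplyr)

# Values copied from the printed data frame in the question.
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# nrow() is meant for data frames and matrices; on a vector it gives NULL,
# whereas length() gives the number of elements.
nrow_on_vector   <- nrow(df$var1)
length_on_vector <- length(df$var1)

# Group by both columns, then n() counts the rows in each group/var1 combination.
counted <- df %>%
  group_by(group, var1) %>%
  mutate(count = n()) %>%
  ungroup()  # optional: drop the grouping so later steps work on the whole table

counted$count
```

Dropping the grouping at the end is a matter of taste; it avoids surprises if further mutate() or summarise() calls follow.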

Edit after comment

Here's an example of how you SHOULD NOT DO IT:

df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))

The dplyr implementation with n() is much faster, cleaner, and shorter, and should always be preferred over implementations like the one above.
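
For comparison, here are two other ways to get the same column, offered as a sketch and not as part of the original answer. The first uses dplyr's add_count() shorthand, assuming a dplyr version in which add_count() accepts the name argument; the second uses base R's ave(). The data frame is again typed out by hand to match the question, and the column name count_base is made up for illustration.

```r
library(dplyr)

# Values copied from the printed data frame in the question.
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# add_count() counts rows per group/var1 combination and adds the result
# as a new column, without leaving the data grouped.
with_add_count <- df %>%
  add_count(group, var1, name = "count")

# Base R: ave() applies length() within each group/var1 combination
# and returns a vector aligned with the original rows.
df$count_base <- ave(df$var1, df$group, df$var1, FUN = length)
```

Both variants keep the original row order, so the new column lines up with the rows of df.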
