dplyr: put count occurrences into new variable

Question

I would like to get a handle on dplyr but cannot figure this out. I have seen similar questions covering many variables ("summarizing counts of a factor with dplyr" and "Putting rowwise counts of value occurrences into new variables, how to do that in R with dplyr?"), but my task is somewhat smaller.
Given a data frame, how do I count the frequency of a variable and place that count in a new variable?

set.seed(9)
df <- data.frame(
    group=c(rep(1,5), rep(2,5)),
    var1=round(runif(10,1,3),0))

Then we have:

>df
   group var1
1      1    1
2      1    1
3      1    1
4      1    1
5      1    2
6      2    1
7      2    2
8      2    2
9      2    2
10     2    3

I would like a third column indicating, per group (group), how many times each value of var1 occurs. In this example that would be count=(4,4,4,4,1,1,3,3,3,1). I tried, without success, things like:

df %>%  group_by(group) %>% rowwise() %>% do(count = nrow(.$var1))

An explanation would be much appreciated!

Recommended answer

All you need to do is group your data by both columns, "group" and "var1":

df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
#   group var1 count
#1      1    1     4
#2      1    1     4
#3      1    1     4
#4      1    1     4
#5      1    2     1
#6      2    1     1
#7      2    2     3
#8      2    2     3
#9      2    2     3
#10     2    3     1
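
As a rough illustration of why the attempt in the question returns nothing useful while the grouped mutate() works, here is a minimal sketch. It types the values of df out by hand (copied from the printed data frame above) so it does not depend on the random seed, and it adds an ungroup() step that is not part of the original answer; treat that step as an optional assumption about how the result will be used afterwards.

```r
library(dplyr)

# Same values as the data frame printed in the question, typed out by hand
# so the sketch does not depend on the random seed.
df <- data.frame(
    group = c(rep(1, 5), rep(2, 5)),
    var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3))

# nrow() is meant for data frames and matrices; on a plain vector it is NULL,
# which is one reason the attempt in the question cannot yield a count.
is.null(nrow(df$var1))

# Group by both columns, attach the size of each group to every row in it,
# then drop the grouping so later verbs operate on the whole table again.
counted <- df %>%
    group_by(group, var1) %>%
    mutate(count = n()) %>%
    ungroup()

counted$count
```

Inside mutate(), n() is evaluated once per (group, var1) combination, so every row receives the number of rows that share its combination. That is the count vector asked for in the question.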

Edit after comment

Here's an example of how you SHOULD NOT DO IT:

df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))

The dplyr implementation with n() is much faster, cleaner and shorter, and should be preferred over implementations such as the one above.
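
For comparison, two other ways to get the same column are sketched below. Neither comes from the original answer: add_count() is a dplyr shorthand (the name argument assumes a reasonably recent dplyr release), and ave() is a base R alternative. The data frame is again typed out by hand from the values printed in the question.

```r
library(dplyr)

# Same values as the data frame printed in the question.
df <- data.frame(
    group = c(rep(1, 5), rep(2, 5)),
    var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3))

# add_count() appends the number of rows per (group, var1) combination
# without a separate group_by() step; `name` sets the new column's name.
with_add_count <- df %>% add_count(group, var1, name = "count")

# Base R: ave() applies length() within each (group, var1) combination
# and returns a vector aligned with the original rows.
with_ave <- df
with_ave$count <- ave(df$var1, df$group, df$var1, FUN = length)

with_add_count$count
with_ave$count
```

Both should give the same counts as the group_by() plus mutate(count = n()) version; which one to use is mostly a matter of taste and of whether dplyr is already a dependency.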
