Calculate relative frequency for a certain group


Question

I have a data.frame of categorical variables that I have divided into groups and I got the counts for each group.

My original data nyD looks like:

Source: local data frame [7 x 3]
Groups: v1, v2, v3

  v1    v2   v3
1  a  plus  yes
2  a  plus  yes
3  a minus   no
4  b minus  yes
5  b     x  yes
6  c     x notk
7  c     x notk

I performed the following operations using dplyr:

ny1 <- nyD %>% group_by(v1,v2,v3)%>%
           summarise(count=n()) %>%
           mutate(prop = count/sum(count))


My data "ny1" looks like:

Source: local data frame [5 x 5]
Groups: v1, v2

  v1    v2   v3 count prop
1  a minus   no     1    1
2  a  plus  yes     2    1
3  b minus  yes     1    1
4  b     x  yes     1    1
5  c     x notk     2    1

I want to calculate the relative frequency within each v1 group and store it in the prop variable. prop should be the corresponding count divided by the sum of counts for that v1 group. The v1 groups contain 3 "a", 2 "b" and 2 "c" rows, so, for example, ny1$prop[1] should be 1/3 and ny1$prop[2] should be 2/3. The mutate step with count/sum(count) does not give this result. I need to specify that the sum should be taken only within each v1 group. Is there a way to do this with dplyr?

Recommended answer

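You can do the whole thing in one step, starting from your original data nyD and without creating ny1. In your attempt the data was grouped by v1, v2 and v3, so after summarise it was still grouped by v1 and v2; each of those groups held a single row, which is why prop was always 1. Group by v1 and v2 only instead: summarise drops the last grouping level (v2) by default (certainly my favorite feature in dplyr), so its result is still grouped by v1, and the mutate that follows computes sum(count) within each v1 group: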

nyD %>% 
   group_by(v1, v2) %>%
   summarise(count = n()) %>%
   mutate(prop = count/sum(count))

# Source: local data frame [5 x 4]
# Groups: v1
# 
#   v1    v2 count      prop
# 1  a minus     1 0.3333333
# 2  a  plus     2 0.6666667
# 3  b minus     1 0.5000000
# 4  b     x     1 0.5000000
# 5  c     x     2 1.0000000
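
Because the result depends on which grouping level summarise() leaves behind, it can help to state that level explicitly. The sketch below is self-contained: it rebuilds nyD from the rows shown in the question (the construction is an assumption, since the question only shows the printed data) and uses the .groups argument, available in dplyr 1.0.0 and later, to keep the result grouped by v1 only.

```r
library(dplyr)

# Rebuild the example data from the question (assumed construction).
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

res <- nyD %>%
  group_by(v1, v2) %>%
  # Drop only the last grouping level (v2); the result stays grouped by v1.
  summarise(count = n(), .groups = "drop_last") %>%
  # sum(count) is therefore computed within each v1 group.
  mutate(prop = count / sum(count)) %>%
  ungroup()

print(res)
```

With .groups = "drop_last" the behaviour matches the default described above, but the intent is visible in the code and summarise() no longer prints a message about the grouping of its output.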

Or a shorter version using count (Thanks to @beginneR)

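nyD %>%
  count(v1, v2) %>%
  # Recent dplyr versions return count() output ungrouped, so group by v1 explicitly.
  group_by(v1) %>%
  mutate(prop = n / sum(n))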
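
As a cross-check that does not rely on dplyr's grouping rules, the same proportions can be computed in base R with table() and prop.table(). This is a different technique from the answer above, shown only as a sketch; the construction of nyD is assumed from the rows printed in the question.

```r
# Rebuild the example data from the question (assumed construction).
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  stringsAsFactors = FALSE
)

# Contingency table of v1 (rows) by v2 (columns).
counts <- table(nyD$v1, nyD$v2)

# margin = 1 divides each cell by its row total, i.e. by the size of the v1 group.
props <- prop.table(counts, margin = 1)

print(round(props, 3))
```

Combinations that never occur (for example v1 = "a" with v2 = "x") appear as 0 in this table, whereas the dplyr result simply omits those rows.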
