Relative frequencies / proportions with dplyr


Question

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of the number of gears within each level of am (automatic/manual) in one go with dplyr?

library(dplyr)
data(mtcars)
mtcars <- as_tibble(mtcars)  # tbl_df() is deprecated; as_tibble() replaces it

# count frequency
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# am gear  n
#  0    3 15 
#  0    4  4 
#  1    4  8  
#  1    5  5 

What I would like to achieve:

am gear  n  rel.freq
 0    3 15 0.7894737
 0    4  4 0.2105263
 1    4  8 0.6153846
 1    5  5 0.3846154

Recommended answer

Try this:

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154

From the dplyr vignette:

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

Thus, after the summarise() call, the last grouping variable specified in group_by(), 'gear', is peeled off. In the mutate() step, the data is grouped by the remaining grouping variable(s), here 'am', so sum(n) is the total count within each level of am. You can check the grouping at each step with groups().
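The peeling described above can be checked directly. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and using `group_vars()`, which returns the grouping variable names as a character vector. The object names `grouped`, `counts`, and `result` are illustrative only.

```r
library(dplyr)

# Group by both variables: two grouping levels.
grouped <- mtcars %>% group_by(am, gear)
group_vars(grouped)   # "am" "gear"

# summarise() drops the last grouping variable; .groups = "drop_last"
# spells out that default explicitly.
counts <- grouped %>% summarise(n = n(), .groups = "drop_last")
group_vars(counts)    # "am"

# mutate() keeps the grouping, so sum(n) is computed within each am level.
result <- counts %>% mutate(freq = n / sum(n))
group_vars(result)    # "am"
```

Stating `.groups = "drop_last"` does not change the result here; it only documents the behaviour the answer relies on.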

The outcome of the peeling of course depends on the order of the grouping variables in the group_by() call. You may wish to add a subsequent group_by(am) to make your code more explicit.
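One way to make the denominator explicit, sketched here as an alternative rather than as the answer's own code, is to count first and then regroup by the variable whose totals should sum to 1. `count()` returns an ungrouped result when its input is ungrouped, so the subsequent `group_by()` fully determines the denominator. The names `by_am` and `by_gear` are illustrative.

```r
library(dplyr)

# Proportion of each gear count within each transmission type (am).
by_am <- mtcars %>%
  count(am, gear) %>%            # ungrouped counts per (am, gear) pair
  group_by(am) %>%               # explicit denominator: totals per am
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Changing the grouping changes the question being answered:
# proportion of each transmission type within each gear count.
by_gear <- mtcars %>%
  count(am, gear) %>%
  group_by(gear) %>%             # denominator: totals per gear
  mutate(freq = n / sum(n)) %>%
  ungroup()
```

In `by_am` the frequencies sum to 1 within each level of am; in `by_gear` they sum to 1 within each level of gear, which is the result you would get from the original pipeline with the order group_by(gear, am).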

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
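The referenced answer is not reproduced here. As a rough sketch of what rounding and formatting could look like using only base R's `round()` and `sprintf()` (not necessarily the approach that answer takes), with the column name `pct` and the object name `pretty` chosen for illustration:

```r
library(dplyr)

pretty <- mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%
  mutate(
    freq = round(n / sum(n), 3),                 # numeric, 3 decimal places
    pct  = sprintf("%.1f%%", 100 * n / sum(n))   # character, e.g. "78.9%"
  ) %>%
  ungroup()

pretty
```

Note that `pct` is a character column intended for display only; keep the numeric `freq` column for any further computation.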
