Compute relative frequencies with group totals using dplyr

Question

I have the following toy data:

data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A", 
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA, 
-16L))

Using these commands:

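# Cross-tabulate class (rows) by value (columns); note this overwrites `data`
data <- table(data$class, data$value)
data <- as.data.frame(data)  # long format: Var1 = class, Var2 = value, Freq = count
# aggregate() returns the two class totals (A, B); they are recycled over the
# rows, which works only because Var1 alternates A, B, A, B, ... here
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq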

I get the appropriate relative frequency of each value within each class:

> data
  Var1 Var2 Freq  rel_freq
1    A    1    3 0.2727273
2    B    1    3 0.6000000
3    A    2    4 0.3636364
4    B    2    2 0.4000000
5    A    3    4 0.3636364
6    B    3    0 0.0000000
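As an aside, here is a base R sketch that avoids relying on recycling: `prop.table()` with `margin = 1` divides each row of the class-by-value table by its row total, i.e. by the class total. The names `counts` and `out` are illustrative, and the toy data is rebuilt with `data.frame()` using the same values as the `dput()` output above.

```r
# Toy data with the same values as the dput() output in the question
data <- data.frame(
  value = rep(c(1L, 2L, 3L), times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# Counts with class as rows and value as columns
counts <- table(data$class, data$value)

# Long format: Var1 = class, Var2 = value, Freq = count
out <- as.data.frame(counts)

# prop.table(margin = 1) divides each row (class) by its total; as.vector()
# flattens column-major, which matches the row order of as.data.frame(counts)
out$rel_freq <- as.vector(prop.table(counts, margin = 1))
out
```

This reproduces the `Freq` and `rel_freq` columns shown above without depending on the A/B alternation of the rows.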

I wonder how to construct an equivalent dplyr pipeline. Below is my attempt:

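library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr
# `data` here is the original data frame built by structure() above
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%
  complete(class, fill = list(n = 0)) %>%
  mutate(freq = n / sum(n))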

This computes relative frequencies for each value but, unfortunately, normalizes separately within each value (across its pair of classes) instead of by the class totals:

Source: local data frame [6 x 4]
Groups: value [3]

  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.5000000
2     1      B     3 0.5000000
3     2      A     4 0.6666667
4     2      B     2 0.3333333
5     3      A     4 1.0000000
6     3      B     0 0.0000000
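The cause can be checked with `group_vars()`: `summarise()` drops only the last grouping variable (`class`), so the result stays grouped by `value` and the final `mutate()` divides by each value's total. A minimal sketch, assuming dplyr 1.0.0 or later for the `.groups` argument; `per_pair` and `wrong` are illustrative names, and `complete()` is omitted, so the (3, B) row is simply absent.

```r
library(dplyr)

# Toy data with the same values as the dput() output in the question
data <- data.frame(
  value = rep(c(1L, 2L, 3L), times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# summarise() peels off the last grouping variable (class);
# .groups = "drop_last" makes that default explicit and silences the message
per_pair <- data %>%
  group_by(value, class) %>%
  summarise(n = n(), .groups = "drop_last")

group_vars(per_pair)  # returns "value"

# So a following mutate() divides by the total of each value, not of each class
wrong <- per_pair %>% mutate(freq = n / sum(n))
wrong
```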

Recommended answer

You only need to group by class when computing the frequencies, so drop the value grouping before the final mutate():

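library(dplyr)
library(tidyr)  # for complete()

data %>%
    group_by(value, class) %>%
    summarise(n = n()) %>%                    # result is still grouped by value
    complete(class, fill = list(n = 0)) %>%   # add the missing (3, B) row with n = 0
    group_by(class) %>%                       # replaces the value grouping
    mutate(freq = n / sum(n))                 # divide by each class total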
# A tibble: 6 x 4
  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.2727273
2     1      B     3 0.6000000
3     2      A     4 0.3636364
4     2      B     2 0.4000000
5     3      A     4 0.3636364
6     3      B     0 0.0000000
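A variant of the same idea, sketched under the assumption that dplyr and tidyr are installed: `count()` replaces the `group_by()`/`summarise()` pair, `complete()` fills in every class/value combination, and `ungroup()` returns a plain tibble at the end. The name `rel` is illustrative, and the toy data is rebuilt with the same values as in the question.

```r
library(dplyr)
library(tidyr)

# Toy data with the same values as the dput() output in the question
data <- data.frame(
  value = rep(c(1L, 2L, 3L), times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

rel <- data %>%
  count(class, value) %>%                          # one row per observed (class, value) pair
  complete(class, value, fill = list(n = 0L)) %>%  # add the missing (B, 3) pair with n = 0
  group_by(class) %>%
  mutate(freq = n / sum(n)) %>%                    # divide by the class total
  ungroup()

rel

# Sanity check: the relative frequencies sum to 1 within each class
tapply(rel$freq, rel$class, sum)
```

With dplyr 1.1.0 or later, `mutate(freq = n / sum(n), .by = class)` is a per-call alternative to the `group_by()`/`ungroup()` pair.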
