Compute relative frequencies with group totals using dplyr
Question
I have the following toy data:
data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A",
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA,
-16L))
Using the commands:
data <- table(data$class, data$value)
data <- as.data.frame(data)
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq
I calculate the appropriate relative frequencies for each value within each class (each count divided by its class total):
> data
  Var1 Var2 Freq  rel_freq
1    A    1    3 0.2727273
2    B    1    3 0.6000000
3    A    2    4 0.3636364
4    B    2    2 0.4000000
5    A    3    4 0.3636364
6    B    3    0 0.0000000
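As a cross-check on the numbers above, base R's prop.table() with margin = 1 divides each cell of a contingency table by its row total. This is only a minimal sketch: the name `toy` is hypothetical and simply rebuilds the same 16 rows, since `data` has been overwritten by the table at this point.

```r
# Hypothetical name `toy`: the same 16 rows as the toy data in the question.
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# Rows are classes, columns are values.
tab <- table(toy$class, toy$value)

# margin = 1 divides every cell by its row (class) total: 11 for A, 5 for B.
rel <- prop.table(tab, margin = 1)
print(round(rel, 7))
```

Each row of `rel` sums to 1, which is the property the dplyr version has to reproduce.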
I wonder how to construct an equivalent dplyr pipeline. Pasted below is my attempt:
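library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr

# `data` here is the original toy data frame, not the table built above
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%
  complete(class, fill = list(n = 0)) %>%
  mutate(freq = n / sum(n))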
This does compute relative frequencies, but unfortunately within each value (across the two classes) instead of against the class totals:
Source: local data frame [6 x 4]
Groups: value [3]

  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.5000000
2     1      B     3 0.5000000
3     2      A     4 0.6666667
4     2      B     2 0.3333333
5     3      A     4 1.0000000
6     3      B     0 0.0000000
Recommended answer
You only need to group by class when computing the frequencies. After summarise() the result is still grouped by value, so regroup by class before the mutate() step:
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%
  complete(class, fill = list(n = 0)) %>%
  group_by(class) %>%
  mutate(freq = n / sum(n))
# A tibble: 6 x 4
  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.2727273
2     1      B     3 0.6000000
3     2      A     4 0.3636364
4     2      B     2 0.4000000
5     3      A     4 0.3636364
6     3      B     0 0.0000000
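To see why the regrouping matters, the grouping left over after summarise() can be inspected with group_vars(). The sketch below is self-contained and is not taken from the answer: the names `toy`, `counts` and `result` are hypothetical, and it swaps the group_by() + summarise() pair for count(), which should give the same counts here.

```r
library(dplyr)
library(tidyr)

# Hypothetical name `toy`: the same 16 rows as the toy data in the question.
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# summarise() peels off only the last grouping variable, so the result
# is still grouped by `value` (recent dplyr versions print a message about it).
counts <- toy %>%
  group_by(value, class) %>%
  summarise(n = n())
group_vars(counts)  # "value"

# Variant with count(): tally the pairs, add the missing (3, B) pair
# with a zero count, then divide by the per-class total.
result <- toy %>%
  count(value, class) %>%
  complete(value, class, fill = list(n = 0L)) %>%
  group_by(class) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

print(result)
```

Because `counts` is grouped by value, a bare mutate(freq = n / sum(n)) divides by the per-value totals (6, 6 and 4), which is what produced the 0.5 / 0.5 split in the question. Grouping by class first makes sum(n) the class total (11 for A, 5 for B).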