Proper idiom for adding zero count rows in tidyr/dplyr
Question
Suppose I have some count data that looks like this:
library(tidyr)
library(dplyr)
X.raw <- data.frame(
x = as.factor(c("A", "A", "A", "B", "B", "B")),
y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
z = 1:6)
X.raw
# x y z
# 1 A i 1
# 2 A ii 2
# 3 A ii 3
# 4 B i 4
# 5 B i 5
# 6 B i 6
I'd like to tidy and summarise like this:
X.tidy <- X.raw %>% group_by(x,y) %>% summarise(count=sum(z))
X.tidy
# Source: local data frame [3 x 3]
# Groups: x
#
# x y count
# 1 A i 1
# 2 A ii 5
# 3 B i 15
I know that for x=="B" and y=="ii" we have an observed count of zero, rather than a missing value; that is, the field worker was actually there, but because there wasn't a positive count, no row was entered into the raw data. I can add the zero count explicitly by doing this:
X.fill <- X.tidy %>% spread(y, count, fill=0) %>% gather(y, count, -x)
X.fill
# Source: local data frame [4 x 3]
#
# x y count
# 1 A i 1
# 2 B i 15
# 3 A ii 5
# 4 B ii 0
But that seems a rather roundabout way of doing things. Is there a cleaner idiom for this?
Just to clarify: my code already does what I need it to do, using spread then gather, so what I'm interested in is finding a more direct route within tidyr and dplyr.
Recommended answer
Since dplyr 0.8 you can do it by setting the parameter .drop = FALSE in group_by, which keeps the groups for factor level combinations that never occur in the data:
X.tidy <- X.raw %>% group_by(x, y, .drop = FALSE) %>% summarise(count=sum(z))
X.tidy
# # A tibble: 4 x 3
# # Groups: x [2]
# x y count
# <fct> <fct> <int>
# 1 A i 1
# 2 A ii 5
# 3 B i 15
# 4 B ii 0
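
The .drop = FALSE route relies on x and y being factors. As a possible alternative that is not part of the answer above, tidyr's complete() can fill in the missing combinations after summarising. The following is a minimal sketch under that assumption, reusing the X.raw data from the question; the name X.complete is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
X.raw <- data.frame(
  x = as.factor(c("A", "A", "A", "B", "B", "B")),
  y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
  z = 1:6)

# X.complete is a hypothetical name used only for this sketch
X.complete <- X.raw %>%
  group_by(x, y) %>%
  summarise(count = sum(z)) %>%
  ungroup() %>%                             # drop grouping so complete() sees all of x and y
  complete(x, y, fill = list(count = 0L))   # add missing x/y combinations with count 0

X.complete
```

This should give the same four rows as the .drop = FALSE version, with the B/ii row filled in as 0. One design difference: complete() is applied after the summary, so the fill value is chosen explicitly, whereas .drop = FALSE lets sum() return 0 for the empty group.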