dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output


Problem description


When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result?

Here's an example with fake data.

library(dplyr)

df = data.frame(a=rep(1:3,4), b=rep(1:2,6))

# Now add an extra level to df$b that has no corresponding value in df$a
df$b = factor(df$b, levels=1:3)

# Summarise with plyr, keeping categories with a count of zero
plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)

  b    count_a
1 1    6
2 2    6
3 3    0

# Now try it with dplyr (%>% replaces the older %.% chaining operator)
df %>%
  group_by(b) %>%
  summarise(count_a=length(a), .drop=FALSE)

  b     count_a .drop
1 1     6       FALSE
2 2     6       FALSE

Not exactly what I was hoping for. Is there a dplyr method for achieving the same result as .drop=FALSE in plyr?

Solution

The issue is still open, but in the meantime, especially since b is already a factor, you can use complete() from tidyr to get what you might be looking for:

library(tidyr)
df %>%
  group_by(b) %>%
  summarise(count_a=length(a)) %>%
  complete(b)
# Source: local data frame [3 x 2]
# 
#        b count_a
#   (fctr)   (int)
# 1      1       6
# 2      2       6
# 3      3      NA
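
The reason complete(b) recovers the missing row is that b is a factor, and the unused level survives the summarise step. A minimal sketch of that, rebuilding the question's df (the names counts and completed are only illustrative):

```r
library(dplyr)
library(tidyr)

# Same fake data as in the question
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

counts <- df %>%
  group_by(b) %>%
  summarise(count_a = length(a))

# The unused level "3" is still recorded on the factor after summarising
levels(counts$b)

# complete() expands b to every factor level, so the empty level gets a row
# whose count_a is NA because no fill value was given
completed <- complete(counts, b)
nrow(completed)
```

If b were a plain integer or character column instead of a factor, complete(b) would only see the values actually present, so the missing category would have to be supplied explicitly.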

If you want the replacement value to be zero, specify it with fill:

df %>%
  group_by(b) %>%
  summarise(count_a=length(a)) %>%
  complete(b, fill = list(count_a = 0))
# Source: local data frame [3 x 2]
# 
#        b count_a
#   (fctr)   (dbl)
# 1      1       6
# 2      2       6
# 3      3       0
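
As a possible alternative, and assuming a newer dplyr than the one this answer was written against (0.8.0 or later, where group_by() gained a .drop argument), the empty level can be kept at the grouping step so that no tidyr call is needed. This is a sketch under that version assumption, not part of the original answer; res is an illustrative name:

```r
library(dplyr)

# Same fake data as in the question
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

# .drop = FALSE asks group_by() to keep groups for unused factor levels
# (assumes dplyr >= 0.8.0)
res <- df %>%
  group_by(b, .drop = FALSE) %>%
  summarise(count_a = length(a))

res
```

Here the empty group is summarised like any other, so length(a) evaluates to 0 rather than NA and no fill is required.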
