group_by() into fill() not working as expected


Problem description

I'm trying to do a Last Observation Carried Forward (LOCF) operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expect.

library(dplyr)
library(tidyr)

df <- data.frame(id=c(1,1,2,2,3,3),
                 email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
df2 <- df %>% group_by(id) %>% fill(email)

This results in:

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3 joe@email.com
6     3 joe@email.com

I want it to be:

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA

I expect the latter because the documentation for group_by says, "The group_by function takes an existing tbl and converts it into a grouped tbl where operations are performed 'by group'." The groups here are determined by the id variable, and the operation that follows is fill(email). However, fill() is clearly not operating by group: the NA values for id 3 are filled with the last email from id 2.

And before anybody asks, it makes no difference if both fields are character instead of numeric or factor.

UPDATE: @aosmith pointed out this open issue on GitHub. I'm going to say that there won't be a proper solution to this problem until that issue is resolved. Everything else would just be a workaround. So, if somebody makes a successful PR addressing that issue and posts it here, I'd be happy to mark it as the solution.
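For anyone stuck on a tidyr version where fill() ignores the grouping, one possible workaround is to carry the last observation forward within each id using base R only. This is a minimal sketch, not something from the linked issue: locf is a hypothetical helper written here for illustration, and the email_filled column name and stringsAsFactors = FALSE are assumptions.

```r
# Same data as in the question, with email kept as character (assumption).
df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper (not part of dplyr or tidyr):
# last observation carried forward within a single vector.
locf <- function(x) {
  # For each element, count how many non-NA values have been seen so far
  # (0 means none yet).
  seen <- cumsum(!is.na(x))
  # Position of the most recent non-NA value, or NA if there is none yet.
  pos <- c(NA_integer_, which(!is.na(x)))[seen + 1]
  # Indexing by NA yields NA of the same type as x.
  x[pos]
}

# ave() applies locf separately to the emails of each id.
df$email_filled <- ave(df$email, df$id, FUN = locf)
print(df)
```

Because ave() splits the vector by id before calling the helper, nothing leaks from id 2 into id 3, which is the behavior the question expects from group_by() followed by fill().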

Recommended answer

It looks like this has been fixed in the development version of tidyr. Using fill from tidyr_0.3.1.9000, you now get the expected result per id.

df %>% group_by(id) %>% fill(email)

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA
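To check whether your own installation has the fix, something like the following could be used. It is only a sketch: it assumes dplyr and tidyr are installed, stores email as character rather than factor, and the respects_groups name is made up for this example.

```r
library(dplyr)
library(tidyr)

# Show the installed tidyr version for comparison with the one mentioned above.
print(packageVersion("tidyr"))

df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
                 stringsAsFactors = FALSE)

filled <- df %>%
  group_by(id) %>%
  fill(email) %>%
  ungroup()

# id 3 has no email at all, so it should stay NA if fill() works per group.
respects_groups <- all(is.na(filled$email[filled$id == 3]))
print(respects_groups)
```

If respects_groups comes back FALSE, the installed tidyr still fills across group boundaries and a workaround is needed.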
