group_by() into fill() not working as expected
Question
I'm trying to do a Last Observation Carried Forward operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expect.
library(dplyr)
library(tidyr)
df <- data.frame(id=c(1,1,2,2,3,3),
email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
df2 <- df %>% group_by(id) %>% fill(email)
This results in:
Source: local data frame [6 x 2]
Groups: id [3]
id email
(dbl) (fctr)
1 1 bob@email.com
2 1 bob@email.com
3 2 joe@email.com
4 2 joe@email.com
5 3 joe@email.com
6 3 joe@email.com
I'd expect it to be:
Source: local data frame [6 x 2]
Groups: id [3]
id email
(dbl) (fctr)
1 1 bob@email.com
2 1 bob@email.com
3 2 joe@email.com
4 2 joe@email.com
5 3 NA
6 3 NA
The reason I expect the latter is group_by's documentation, which says: "The group_by function takes an existing tbl and converts it into a grouped tbl where operations are performed 'by group'." The group in this case is determined by the id variable, and the following operation is fill(email). However, it's pretty clearly NOT doing that.
And before anybody asks, it makes no difference if the fields are both character instead of numeric or factor.
UPDATE: @aosmith pointed out this open issue on GitHub. I'm going to say that there won't be a proper solution to this problem until that issue is resolved. Everything else would just be a workaround. So, if somebody makes a successful PR addressing that issue and posts it here, I'd be happy to mark it as the solution.
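For anyone stuck on a tidyr version where fill() ignores the grouping, here is a minimal sketch of one such workaround. It relies on the fact that dplyr's mutate() is evaluated per group. The locf() helper is hypothetical (written for this example, not part of dplyr or tidyr), and stringsAsFactors = FALSE is added only to keep email as a character column.

```r
library(dplyr)

# Hypothetical helper (not from dplyr or tidyr): last observation carried forward.
locf <- function(x) {
  seen <- !is.na(x)
  # For each element, the position of the most recent non-NA value
  # (NA if no non-NA value has been seen yet).
  last_pos <- c(NA_integer_, which(seen))[cumsum(seen) + 1L]
  x[last_pos]
}

df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 email = c('bob@email.com', NA, 'joe@email.com', NA, NA, NA),
                 stringsAsFactors = FALSE)

# mutate() runs locf() once per id, so nothing carries over from id 2 into id 3.
df2 <- df %>%
  group_by(id) %>%
  mutate(email = locf(email)) %>%
  ungroup()
```

Because the carry-forward happens inside each group, an id with no observed email at all (id 3 here) stays NA.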
Recommended answer
Looks like this has been fixed in the development version of tidyr. You now get the expected result per id using fill from tidyr_0.3.1.9000.
df %>% group_by(id) %>% fill(email)
Source: local data frame [6 x 2]
Groups: id [3]
id email
(dbl) (fctr)
1 1 bob@email.com
2 1 bob@email.com
3 2 joe@email.com
4 2 joe@email.com
5 3 NA
6 3 NA