na.locf using group_by from dplyr
Question
I'm trying to use na.locf from package zoo with grouped data using dplyr. I'm using the first solution on this question: Using dplyr window-functions to make trailing values (fill in NA values)
library(dplyr);library(zoo)
df1 <- data.frame(id=rep(c("A","B"),each=3),problem=c(1,NA,2,NA,NA,NA),ok=c(NA,3,4,5,6,NA))
df1
id problem ok
1 A 1 NA
2 A NA 3
3 A 2 4
4 B NA 5
5 B NA 6
6 B NA NA
The problem happens when, within a group, all the data is NA. As you can see in the problem column, the na.locf data for id=B comes from another group: the last value of id=A.
df1 %>% group_by(id) %>% na.locf()
Source: local data frame [6 x 3]
Groups: id [2]
id problem ok
<chr> <chr> <chr>
1 A 1 <NA>
2 A 1 3
3 A 2 4
4 B 2 5 #problem col is wrong
5 B 2 6 #problem col is wrong
6 B 2 6 #problem col is wrong
This is my expected result. The data for id=B is independent of what is in id=A
id problem ok
<chr> <chr> <chr>
1 A 1 <NA>
2 A 1 3
3 A 2 4
4 B NA 5
5 B NA 6
6 B NA 6
Recommended answer
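We need to use na.locf within mutate_all, because na.locf can also be applied directly to a whole dataset. Even though the data is grouped by 'id', piping it straight into na.locf applies the function to the full dataset, so the fill does not follow any group-by behavior. Inside mutate_all, na.locf is instead called on each column within each group.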
df1 %>%
group_by(id) %>%
mutate_all(funs(na.locf(., na.rm = FALSE)))
# id problem ok
# <fctr> <dbl> <dbl>
#1 A 1 NA
#2 A 1 3
#3 A 2 4
#4 B NA 5
#5 B NA 6
#6 B NA 6
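For readers on a recent dplyr, where funs() is deprecated and mutate_all() is superseded, the same per-group fill can probably be written with across(). This is a minimal sketch, assuming dplyr 1.0 or later and the same df1 as in the question; the name filled is only illustrative and does not come from the original answer.

```r
library(dplyr)
library(zoo)

# Same example data as in the question
df1 <- data.frame(
  id = rep(c("A", "B"), each = 3),
  problem = c(1, NA, 2, NA, NA, NA),
  ok = c(NA, 3, 4, 5, 6, NA)
)

# `filled` is an illustrative name, not part of the original answer.
# across(everything()) skips the grouping column, so na.locf runs on
# each remaining column separately within each id group.
# na.rm = FALSE keeps leading NAs, so every group keeps its length.
filled <- df1 %>%
  group_by(id) %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

filled
```

The problem column for id = B should stay NA here, because na.locf only ever sees the values belonging to one group at a time. If adding tidyr is acceptable, tidyr::fill() on the grouped data frame is another option that respects groups.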