Tidyverse: Replacing NAs with latest non-NA values *using tidyverse tools*
Question
My question has been answered before using zoo:: and data.table::; I'm curious what the best solution with tidyverse/dplyr would be.
Previous answers (non-tidyverse): "Forward and backward fill data frame in R" and "Replacing NAs with latest non-NA value".
My data looks like this, where the earliest two years (2015, 2016) in each country (usa, aus) have missing data (code for data input at the bottom):
#> country year value
#> 1 usa 2015 NA
#> 2 usa 2016 NA
#> 3 usa 2017 100
#> 4 usa 2018 NA
#> 5 aus 2015 NA
#> 6 aus 2016 NA
#> 7 aus 2017 50
#> 8 aus 2018 60
I would like to fill the missing values, within each country, with the value available in 2017.
The fill should apply only to years before 2017, so an NA in 2018 should not be filled in by anything. It should remain NA.
So my desired output is:
#> country year value
#> 1 usa 2015 100
#> 2 usa 2016 100
#> 3 usa 2017 100
#> 4 usa 2018 NA
#> 5 aus 2015 50
#> 6 aus 2016 50
#> 7 aus 2017 50
#> 8 aus 2018 60
I tried group_by(country), and I suspect I'm then meant to use coalesce(), but I normally use coalesce across vectors, not along them.
library(tidyverse)
df %>% group_by(country)  # not sure which step should follow here
What's the easiest way to do this using tidyverse tools?
#install.packages("datapasta")
df <- data.frame(
  stringsAsFactors = FALSE,
  country = c("usa", "usa", "usa", "usa", "aus", "aus", "aus", "aus"),
  year = c(2015L, 2016L, 2017L, 2018L, 2015L, 2016L, 2017L, 2018L),
  value = c(NA, NA, 100L, NA, NA, NA, 50L, 60L)
)
df
Recommended answer
We can use replace() to substitute the NAs before 2017 with the value available in 2017 for each country.
library(dplyr)

df %>%
  group_by(country) %>%
  mutate(value = replace(value, is.na(value) & year < 2017, value[year == 2017]))

# Similarly with ifelse:
# mutate(value = ifelse(is.na(value) & year < 2017, value[year == 2017], value))
# country year value
# <chr> <int> <int>
#1 usa 2015 100
#2 usa 2016 100
#3 usa 2017 100
#4 usa 2018 NA
#5 aus 2015 50
#6 aus 2016 50
#7 aus 2017 50
#8 aus 2018 60
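Two other tidyverse routes may be worth considering; the following is a sketch under stated assumptions rather than part of the original answer. The first uses tidyr::fill() with .direction = "up", which copies the next non-NA value upward within each group. It matches the desired output here only because the rows are sorted by year within each country and the 2018 NA has no later value below it; with a 2019 row present, that NA would be filled too. The second keeps the explicit year < 2017 condition and uses coalesce() with a length-one fallback, which addresses the "across vectors, not along them" concern from the question. The names filled and coalesced are illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  stringsAsFactors = FALSE,
  country = c("usa", "usa", "usa", "usa", "aus", "aus", "aus", "aus"),
  year = c(2015L, 2016L, 2017L, 2018L, 2015L, 2016L, 2017L, 2018L),
  value = c(NA, NA, 100L, NA, NA, NA, 50L, 60L)
)

# Option 1: fill upward within each country.
# Assumes rows are ordered by year and that any NA that must stay NA
# has no later non-NA value in its group.
filled <- df %>%
  group_by(country) %>%
  fill(value, .direction = "up") %>%
  ungroup()

# Option 2: keep the explicit year condition and let coalesce() supply
# the 2017 value (a length-one vector, recycled) wherever value is NA.
coalesced <- df %>%
  group_by(country) %>%
  mutate(value = if_else(year < 2017L,
                         coalesce(value, value[year == 2017L]),
                         value)) %>%
  ungroup()
```

Option 2 is the safer choice when later years can contain values, because it never touches rows from 2017 onward.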