R remove groups with only NAs
Problem description
I have a dataframe similar to the one generated by the following structure:
library(dplyr)

# Observed years: every fifth year from 2000 to 2020
df1 <- expand.grid(region = c("USA", "EUR", "World"),
                   time = c(2000, 2005, 2010, 2015, 2020),
                   scenario = c("policy1", "policy2"),
                   variable = c("foo", "bar"))
# All remaining years, which carry no data
df2 <- expand.grid(region = c("USA", "EUR", "World"),
                   time = seq(2000, 2020, 1),
                   scenario = c("policy1", "policy2"),
                   variable = c("foo", "bar"))
df2 <- filter(df2, !(time %in% c(2000, 2005, 2010, 2015, 2020)))
df1$value <- rnorm(nrow(df1), 1.5, 1)
# Treat negative draws as missing (only the value column is compared)
df1$value[df1$value < 0] <- NA
df2$value <- NA
# World/foo has no data at all
df1[df1$region == "World" & df1$variable == "foo", "value"] <- NA
df <- rbind(df1, df2)
rm(df1, df2)
df <- arrange(df, region, scenario, variable, time)
df
df contains two "types" of NA. For one combination of region and variable (World/foo), there is no data at all. For all other combinations, we have NAs for all years except 2000, 2005, 2010, 2015 and 2020.
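One way to see the two kinds of NA is to count the non-missing values per group. The sketch below is only an illustration: it uses a small toy data frame with made-up values (not the df built above) and base R's tapply.

```r
# Toy data (hypothetical values): World/foo has no observations at all,
# the other groups only have a gap between observed years.
toy <- data.frame(
  region   = rep(c("USA", "World"), each = 6),
  variable = rep(rep(c("foo", "bar"), each = 3), times = 2),
  time     = rep(c(2000, 2001, 2002), times = 4),
  value    = c(1.0, NA, 2.0,   # USA / foo
               0.5, NA, 1.5,   # USA / bar
               NA,  NA, NA,    # World / foo: only NAs
               2.0, NA, 3.0)   # World / bar
)

# Count the non-missing values in every region/variable combination.
n_obs <- tapply(!is.na(toy$value), list(toy$region, toy$variable), sum)
n_obs
```

A count of zero marks a group with no data at all. Any positive count marks a group that only has gaps.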
I need a filter that removes the combinations of region and variable that contain only NAs, but keeps the combinations that contain just some NAs. The background is that I want to linearly interpolate the missing values in the latter by combining dplyr with functionality from the zoo package, using something like this:
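# Group by series only (not by time), so each group holds a full time series;
# na.rm = FALSE keeps the output the same length as the input.
df <- group_by(df, region, scenario, variable) %>%
  mutate(value = zoo::na.approx(value, na.rm = FALSE)) %>%
  ungroup()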
The group containing only NAs leads to na.approx returning an error, since it cannot interpolate when every value is NA.
Recommended answer
To keep only the combinations of region and variable that have at least one non-NA entry in value, you can use:
df %>% group_by(region, variable) %>% filter(any(!is.na(value)))
Or equivalently:
df %>% group_by(region, variable) %>% filter(!all(is.na(value)))
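The two filter conditions agree because "at least one value is not NA" is the logical negation of "all values are NA". A quick check in base R, using hypothetical helper names and made-up vectors, might look like this:

```r
# Hypothetical helper names, used only for this check.
keep_any     <- function(x) any(!is.na(x))
keep_not_all <- function(x) !all(is.na(x))

cases <- list(
  all_missing  = c(NA, NA, NA),
  some_missing = c(1.2, NA, 0.7),
  none_missing = c(1, 2, 3)
)

res_any     <- sapply(cases, keep_any)
res_not_all <- sapply(cases, keep_not_all)

# Both conditions give the same keep/drop decision for every case.
identical(res_any, res_not_all)
```

Only the all_missing case is flagged for removal. Groups with some or no NAs are kept by either form.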
And with data.table you could use:
library(data.table)
setDT(df)[, if(any(!is.na(value))) .SD, by = .(region, variable)]
An approach in base R could be:
df_split <- split(df, interaction(df$region, df$scenario, df$variable))
do.call(rbind.data.frame, df_split[sapply(df_split, function(x) any(!is.na(x$value)))])
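To connect the filter back to the interpolation goal from the question, here is a minimal base R sketch of "drop empty groups, then interpolate". It rests on several assumptions: the toy data are made up, fill_gaps is a hypothetical helper, and stats::approx stands in for zoo::na.approx, so this is not the asker's exact pipeline. The toy data also have no scenario column. With the real df, scenario would probably need to be part of the grouping as well.

```r
# Toy data (hypothetical values).
toy <- data.frame(
  region   = rep(c("USA", "World"), each = 3),
  variable = "foo",
  time     = rep(c(2000, 2001, 2002), times = 2),
  value    = c(1, NA, 3,    # USA / foo: one gap
               NA, NA, NA)  # World / foo: only NAs
)

# Flag rows whose region/variable group has at least one non-NA value.
has_data <- ave(!is.na(toy$value), toy$region, toy$variable, FUN = any)
kept <- toy[has_data, ]

# Hypothetical helper: linear interpolation over time within one group.
# stats::approx is used here as a stand-in for zoo::na.approx.
fill_gaps <- function(d) {
  ok <- !is.na(d$value)
  if (sum(ok) >= 2) {  # approx needs at least two known points
    d$value <- approx(d$time[ok], d$value[ok], xout = d$time)$y
  }
  d
}

groups <- split(kept, interaction(kept$region, kept$variable, drop = TRUE))
result <- do.call(rbind, lapply(groups, fill_gaps))
rownames(result) <- NULL
result
```

Because the empty group is removed before the interpolation step, the helper only sees groups that contain at least one observation. The extra length check guards the case of a group with a single observed value.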