R: remove groups that contain only NAs

Problem description

I have a dataframe similar to the one generated by the following structure:

library(dplyr)

# Rows for the five-year steps, which get random values
df1 <- expand.grid(region   = c("USA", "EUR", "World"),
                   time     = c(2000, 2005, 2010, 2015, 2020),
                   scenario = c("policy1", "policy2"),
                   variable = c("foo", "bar"))

# Rows for all other years between 2000 and 2020, which stay NA
df2 <- expand.grid(region   = c("USA", "EUR", "World"),
                   time     = seq(2000, 2020, 1),
                   scenario = c("policy1", "policy2"),
                   variable = c("foo", "bar"))

df2 <- filter(df2, !(time %in% c(2000, 2005, 2010, 2015, 2020)))

set.seed(1)                                # make the random values reproducible
df1$value <- rnorm(nrow(df1), 1.5, 1)
df1$value[df1$value < 0] <- NA             # negative draws become NA (only the value column is compared)
df2$value <- NA

# World/foo gets no data at all
df1[df1$region == "World" & df1$variable == "foo", "value"] <- NA

df <- rbind(df1, df2)

rm(df1, df2)

df <- arrange(df, region, scenario, variable, time)

df contains two "types" of NA. For one combination of region and variable (World/foo), there is no data at all. For all other combinations, we have NAs for all years except 2000, 2005, 2010, 2015, 2020.
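One way to see the two kinds of NA group is to count observed and missing values per region/variable combination. The sketch below uses a small hypothetical data frame `toy` (not the question's `df`) in which USA/foo has a gap and World/foo has no data at all; rows with `n_obs == 0` are the groups the filter has to drop.

```r
library(dplyr)

# Hypothetical toy data: USA/foo has one missing year, World/foo has no data
toy <- data.frame(
  region   = rep(c("USA", "World"), each = 3),
  variable = "foo",
  time     = rep(c(2000, 2005, 2010), times = 2),
  value    = c(1.2, NA, 2.0, NA, NA, NA)
)

# Count observed and missing values in each region/variable combination
na_summary <- toy %>%
  group_by(region, variable) %>%
  summarise(n_obs = sum(!is.na(value)),
            n_na  = sum(is.na(value)),
            .groups = "drop")

print(na_summary)
```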

I need a filter that removes the combinations of region and variable that contain only NAs, but keeps the combinations that contain only a few NAs. The background is that I want to compute the missing values of the latter by linear interpolation, combining dplyr with functionality from the zoo package, using something like this:

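# Group by series (not by year); na.rm = FALSE keeps leading/trailing NAs so group lengths are unchanged
df <- group_by(df, region, scenario, variable) %>%
      mutate(value = zoo::na.approx(value, x = time, na.rm = FALSE)) %>% ungroup()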

The group that contains only NAs makes na.approx fail with an error, since there are no observed values to interpolate from.
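A related pitfall: `zoo::na.approx()` drops leading and trailing NAs by default (`na.rm = TRUE`), so its output can be shorter than its input, and a grouped `mutate()` rejects a result whose length is neither 1 nor the group size. A small sketch with a hypothetical vector `y`:

```r
library(zoo)

# Hypothetical series with a leading, an interior and a trailing NA
y <- c(NA, 1, NA, 3, NA)

# Default na.rm = TRUE: the interior gap is filled, leading/trailing NAs are dropped
trimmed <- na.approx(y)
print(trimmed)   # 1 2 3

# na.rm = FALSE: same interpolation, but the result keeps the length of y
padded <- na.approx(y, na.rm = FALSE)
print(padded)    # NA 1 2 3 NA
```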

Recommended answer

To keep only the combinations of region and variable that have at least one non-NA entry in value, you can use:

df %>% group_by(region, variable) %>% filter(any(!is.na(value)))

Or, equivalently:

df %>% group_by(region, variable) %>% filter(!all(is.na(value)))
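Combining the filter with linear interpolation could look like the following minimal sketch, assuming dplyr and zoo are installed and using a hypothetical toy data frame instead of the question's `df`. The all-NA group is removed first, so `na.approx()` only sees groups that have data; `na.rm = FALSE` keeps each group's length unchanged.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: USA/foo has an interior gap, World/foo has no data
toy <- data.frame(
  region   = rep(c("USA", "World"), each = 3),
  variable = "foo",
  time     = rep(c(2000, 2005, 2010), times = 2),
  value    = c(1, NA, 3, NA, NA, NA)
)

result <- toy %>%
  group_by(region, variable) %>%
  filter(any(!is.na(value))) %>%        # drop groups that are entirely NA
  arrange(time, .by_group = TRUE) %>%   # interpolate along time within each group
  mutate(value = na.approx(value, x = time, na.rm = FALSE)) %>%
  ungroup()

print(result)
```

Filtering before interpolating means the interpolation step never has to handle the empty case itself.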

And with data.table you could use:

library(data.table)
# setDT() converts df to a data.table in place; groups where if() yields NULL are dropped
setDT(df)[, if(any(!is.na(value))) .SD, by = .(region, variable)]
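The data.table filter can be followed by a by-group interpolation with `:=`. This is a sketch on a hypothetical toy table, assuming data.table and zoo are installed; zoo is still needed because data.table's own filling functions do not do linear interpolation.

```r
library(data.table)

# Hypothetical toy data: USA/foo has an interior gap, World/foo has no data
dt <- data.table(
  region   = rep(c("USA", "World"), each = 3),
  variable = "foo",
  time     = rep(c(2000, 2005, 2010), times = 2),
  value    = c(1, NA, 3, NA, NA, NA)
)

# Keep only groups with at least one observed value
dt <- dt[, if (any(!is.na(value))) .SD, by = .(region, variable)]

# Sort by time within each group, then fill the gaps by linear interpolation
setorder(dt, region, variable, time)
dt[, value := zoo::na.approx(value, x = time, na.rm = FALSE), by = .(region, variable)]

print(dt)
```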

An approach in base R could be:

# One data frame per region/scenario/variable combination; keep those with any observed value
df_split <- split(df, interaction(df$region, df$scenario, df$variable))
do.call(rbind.data.frame, df_split[sapply(df_split, function(x) any(!is.na(x$value)))])
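Another base R option, not part of the original answer, is `ave()`, which returns a per-group result aligned with the original rows, so no split/recombine step is needed and the row order is preserved. A sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data: USA/foo has an interior gap, World/foo has no data
toy <- data.frame(
  region   = rep(c("USA", "World"), each = 3),
  variable = "foo",
  time     = rep(c(2000, 2005, 2010), times = 2),
  value    = c(1, NA, 3, NA, NA, NA)
)

# For each row, TRUE if its region/variable group has at least one non-NA value
keep <- ave(!is.na(toy$value), toy$region, toy$variable, FUN = any)

kept <- toy[keep, ]
print(kept)
```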
