R remove groups with only NAs

Published: 2020/10/16 21:36:14 | Tags: r, dataframe, dplyr, zoo


Question

I have a data frame similar to the one generated by the following code:

library(dplyr)

df1 <- expand.grid(region   = c("USA", "EUR", "World"),
                  time     = c(2000, 2005, 2010, 2015, 2020),
                  scenario = c("policy1", "policy2"),
                  variable = c("foo", "bar"))

df2 <- expand.grid(region   = c("USA", "EUR", "World"),
                  time     = seq(2000, 2020, 1),
                  scenario = c("policy1", "policy2"),
                  variable = c("foo", "bar"))

df2 <- filter(df2, !(time %in% c(2000, 2005, 2010, 2015, 2020)))

df1$value <- rnorm(nrow(df1), 1.5, 1)
df1$value[df1$value < 0] <- NA   # only the numeric column is compared with 0
df2$value <- NA

df1[df1$region == "World" & df1$variable == "foo", "value"] <- NA

df <- rbind(df1, df2)

rm(df1, df2)

df <- arrange(df, region, scenario, variable, time)

df contains two "types" of NA. For one combination of region and variable (World/foo) there is no data at all. For all other combinations, every year except 2000, 2005, 2010, 2015 and 2020 is NA.
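
To see the two kinds of NA side by side, here is a minimal sketch on a small hypothetical data frame (toy, built only to mimic the structure of df, not the question's data) that counts the non-NA values per region/variable combination with base R's tapply(). A count of 0 marks a combination that should be dropped.

```r
# Hypothetical toy data mimicking df: World/foo has no data at all,
# every other region/variable pair has values only in some years
toy <- expand.grid(region   = c("USA", "World"),
                   time     = 2000:2004,
                   variable = c("foo", "bar"))
toy$value <- ifelse(toy$time %in% c(2000, 2004), 1, NA)
toy$value[toy$region == "World" & toy$variable == "foo"] <- NA

# Number of non-NA values for each region/variable combination
non_na <- tapply(!is.na(toy$value), list(toy$region, toy$variable), sum)
non_na
```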

I need a filter that removes the region/variable combinations that contain only NAs, but keeps those that contain just a few NAs. The background is that I want to fill in the missing values of the latter by linear interpolation, combining dplyr with the zoo package (for the interpolation), using something like this:

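# one group per time series (grouping by time as well would give one-row groups);
# na.rm = FALSE keeps leading/trailing NAs so the result has the same length as value
df <- group_by(df, region, scenario, variable) %>%
      mutate(value = zoo::na.approx(value, x = time, na.rm = FALSE)) %>% ungroup()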


A group that contains only NAs makes na.approx throw an error, because linear interpolation needs at least two non-NA values to work from.
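
The failure can be reproduced with a minimal base R sketch, assuming that zoo::na.approx does its linear interpolation through stats::approx. try_interp() is a hypothetical helper, not part of either package; it returns the error message instead of stopping, so the three cases can be compared.

```r
# Hypothetical helper: linearly interpolate the NA positions of y,
# returning the error message instead of failing
try_interp <- function(y) {
  x <- seq_along(y)
  ok <- !is.na(y)
  tryCatch(approx(x[ok], y[ok], xout = x)$y,
           error = function(e) conditionMessage(e))
}

try_interp(c(1, NA, 3))       # gap is filled: 1 2 3
try_interp(rep(NA_real_, 3))  # all NA: an error message is returned
try_interp(c(NA, 2, NA))      # a single value is not enough either
```

The last case matters for the filter below: a group with exactly one non-NA value passes an "at least one non-NA" test but still cannot be interpolated.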

Recommended answer
To keep only the combinations of region and variable that have at least one non-NA entry in value, you can use:
df %>% group_by(region, variable) %>% filter(any(!is.na(value)))

Or, equivalently:
df %>% group_by(region, variable) %>% filter(!all(is.na(value)))
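
The same per-group test can be written in base R without dplyr. In this sketch on hypothetical toy data, ave() evaluates any() on the non-NA flags within each group and recycles the result to every row of that group, so it can be used directly as a row filter. For the question's data the analogous call would be df[ave(!is.na(df$value), df$region, df$variable, FUN = any), ].

```r
# Hypothetical toy data: group "b" contains only NAs
d <- data.frame(g     = c("a", "a", "a", "b", "b", "c", "c"),
                value = c(1, NA, 3, NA, NA, NA, 5))

# ave() applies any() to the non-NA flags of each group and recycles the
# result to every row of that group, giving one logical value per row
keep <- ave(!is.na(d$value), d$g, FUN = any)

d[keep, ]  # rows of groups "a" and "c"; group "b" is dropped
```

Unlike split() and rbind(), this keeps the original row order and row names.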

And with data.table you could use:
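library(data.table)
# setDT() converts df to a data.table by reference; .SD is returned only for groups with a non-NA value
setDT(df)[, if(any(!is.na(value))) .SD, by = .(region, variable)]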

An approach in base R could be:
# split into one piece per region/scenario/variable, keep the pieces with a non-NA value, re-bind
df_split <- split(df, interaction(df$region, df$scenario, df$variable))
do.call(rbind.data.frame, df_split[sapply(df_split, function(x) any(!is.na(x$value)))])
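
Putting the filter and the interpolation from the question together, here is a sketch under stated assumptions: the toy data and the interp() helper are hypothetical, and stats::approx stands in for zoo::na.approx. Because linear interpolation needs at least two observed points, this version keeps only series with two or more non-NA values rather than at least one.

```r
# Hypothetical toy data: series "keep" has observations at both ends,
# series "drop" is entirely NA
d <- data.frame(id    = rep(c("keep", "drop"), each = 5),
                time  = rep(2000:2004, times = 2),
                value = c(1, NA, NA, NA, 5, rep(NA, 5)),
                stringsAsFactors = FALSE)

# Step 1: keep only series with at least two non-NA values, the minimum
# that linear interpolation needs
enough <- ave(!is.na(d$value), d$id, FUN = function(v) sum(v) >= 2)
d <- d[enough, ]

# Step 2: interpolate within each remaining series; stats::approx stands in
# for zoo::na.approx, and rule = 1 leaves points outside the observed range NA
interp <- function(time, value) {
  ok <- !is.na(value)
  approx(time[ok], value[ok], xout = time, rule = 1)$y
}
d$value <- unsplit(lapply(split(d, d$id), function(s) interp(s$time, s$value)),
                   d$id)
d
```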

