Replace multiple values in a dataframe with NA based on conditions given in another dataframe in R


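Problem description

Using R, I would like to replace numeric values in a data frame with NA based on conditions contained in another data frame: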

# An example data frame with numeric values I want to  
# change to `NA` based on values given in another data frame.
df1 <- tibble::tribble(
  ~col_1, ~col_2, ~col_3, ~group, ~subgroup,
  1,      3,      5,     'A',    'p',
  6,      8,      5,     'A',    'q',
  5,      3,      3,     'B',    'p',
  1,      7,      7,     'B',    'q'
)

# A second data frame containing conditions  
# to be used for subsetting the first data frame.
df2 <- tibble::tribble(
  ~group, ~subgroup, ~cols,
  'A',    'q',       'col_1',
  'A',    'q',       'col_3',
  'B',    'p',       'col_2', 
  'B',    'p',       'col_3'
)

# My problematic approach to subsetting df1 and replacing 
# values with `NA` based on the conditions given in df2.
df1[df1$group %in% unique(df2$group) & 
    df1$subgroup %in% unique(df2$subgroup), 
    unique(df2$cols)] <- NA

# The incorrect result of my approach.
print(df1)
# A tibble: 4 × 5
  col_1 col_2 col_3 group subgroup
  <dbl> <dbl> <dbl> <chr> <chr>   
1    NA    NA    NA A     p       
2    NA    NA    NA A     q       
3    NA    NA    NA B     p       
4    NA    NA    NA B     q       

Created on 2021-09-20 by the reprex package (v2.0.1)

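My strategy was to subset df1 with a logical index marking the df1 observations that match df2, and to replace those observations with NA using `] <- NA`. However, my approach selects every observation instead of replacing only the ones indicated in df2.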
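To see why everything gets selected, it can help to inspect the row selector and the column selector separately. The following is a minimal sketch, assuming the first df1 and df2 from the question (rebuilt here with base `data.frame` so the snippet is self-contained); the names `row_mask` and `col_sel` are illustrative and not part of the original code.

```r
df1 <- data.frame(
  col_1 = c(1, 6, 5, 1), col_2 = c(3, 8, 3, 7), col_3 = c(5, 5, 3, 7),
  group = c("A", "A", "B", "B"), subgroup = c("p", "q", "p", "q")
)
df2 <- data.frame(
  group = c("A", "A", "B", "B"), subgroup = c("q", "q", "p", "p"),
  cols = c("col_1", "col_3", "col_2", "col_3")
)

# Each %in% test is evaluated on its own, so the pairing of group,
# subgroup, and column within a single df2 row is lost.
row_mask <- df1$group %in% unique(df2$group) &
  df1$subgroup %in% unique(df2$subgroup)
col_sel <- unique(df2$cols)

print(row_mask)  # every row is TRUE
print(col_sel)   # all three value columns are listed
```

Because every group and every subgroup appears somewhere in df2, the mask is TRUE for all four rows, and the assignment then covers all three value columns.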

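How can I do this functionally/programmatically, without replacing values by hand? This example dataset is small enough that I could use `] <-` for each value I want to replace, but I would like to do this on a larger, more complex dataset.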

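Solutions and caveats: the two solutions provided by @Ronak and @akrun both work on the example dataset in this question. However, after finding rare cases of duplicated subgroup and group values in my actual dataset, I found that only @akrun's solution worked. Below, I have added another example that recreates the rare case I observed in my actual data, along with a modification of @Ronak's solution that makes it work with these duplicates.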

# Unique numeric observations were added
# in rows 1 and 2 with group and subgroup
# values that are duplicated with existing
# group and subgroup values.
df1 <- tibble::tribble(
  ~col_1, ~col_2, ~col_3, ~group, ~subgroup,
  7, 4, 9, "A", "p",
  1, 3, 5, "A", "p",
  6, 8, 5, "A", "q",
  5, 3, 3, "B", "p",
  1, 7, 7, "B", "q"
)

# Conditions were added in rows 1 and 2
# to indicate which values to replace
# in df1 with NA.
df2 <- tibble::tribble(
  ~group, ~subgroup, ~cols,
  "A",    "p",       "col_1",
  "A",    "p",       "col_2",
  "A",    "q",       "col_1",
  "A",    "q",       "col_3",
  "B",    "p",       "col_2",
  "B",    "p",       "col_3"
)

# Modifications of @Ronak's solution
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)

key1 <- lapply(
  setNames(names(df1)[grep("col_\\d", x = names(df1))], 1:3),
  function(x) {
    paste(x, df1$group, df1$subgroup)
  }
)

key2 <- with(df2, paste(cols, group, subgroup))

indices <- lapply(
  key1,
  function(x) {
    which(x %in% key2)
  }
)

indices <- indices[sapply(indices, function(x) length(x) > 0)]

selection <- lapply(
  seq_along(indices),
  function(x) {
    cbind(indices[[x]], as.numeric(names(indices)[x]))
  }
)

selection <- do.call(rbind, selection)
df1[selection] <- NA
df1
#   col_1 col_2 col_3 group subgroup
# 1    NA    NA     9     A        p
# 2    NA    NA     5     A        p
# 3    NA     8    NA     A        q
# 4     5    NA    NA     B        p
# 5     1     7     7     B        q
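
The final assignment above relies on indexing a data frame with a two-column numeric matrix, where each matrix row names one (row, column) cell. Here is a minimal sketch of that mechanism on a toy data frame; `d` and `pos` are illustrative names, not part of the original code. This works on a base `data.frame`, which is presumably why the code above converts the tibbles with `as.data.frame()` first.

```r
# A small data frame used only for illustration.
d <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "b", "c"))

# Each row of the matrix is one (row, column) position to overwrite.
pos <- rbind(
  c(1, 2),  # row 1, column 2 (y)
  c(3, 1)   # row 3, column 1 (x)
)

d[pos] <- NA
print(d)
```

Only the two listed cells become NA; the other cells in the same rows and columns keep their values.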
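
Recommended answer

Here is one way to do the assignment: loop across the 'col' columns of the first dataset ('df1'), build a single string vector by pasting together 'group', 'subgroup', and the corresponding column name (cur_column()), check whether those elements are %in% the pasted rows of 'df2' to get a logical vector, and use that vector in replace to set the matching elements to NA.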

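library(dplyr)
library(stringr)

# Paste each df2 row (group, subgroup, cols) into a single key string.
# do.call() stands in for purrr::invoke(), which is deprecated in recent purrr.
keys <- do.call(str_c, c(df2, sep = ''))

df1 <- df1 %>% 
   mutate(across(starts_with('col'), 
   # Set a value to NA when its group/subgroup/column key appears in df2.
   ~ replace(., str_c(group, subgroup, cur_column()) %in% keys, NA)))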

-output

df1
# A tibble: 4 x 5
  col_1 col_2 col_3 group subgroup
  <dbl> <dbl> <dbl> <chr> <chr>   
1     1     3     5 A     p       
2    NA     8    NA A     q       
3     5    NA    NA B     p       
4     1     7     7 B     q       
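
The same key-matching idea can also be sketched in base R. The version below is not the answerer's code. It assumes the second example from the question (the one with duplicated group/subgroup rows), rebuilt with `data.frame`, and it pastes with a "|" separator on the assumption that this character never occurs in the group, subgroup, or column names. The names `drop_keys`, `value_cols`, and `row_keys` are illustrative.

```r
df1 <- data.frame(
  col_1 = c(7, 1, 6, 5, 1), col_2 = c(4, 3, 8, 3, 7), col_3 = c(9, 5, 5, 3, 7),
  group = c("A", "A", "A", "B", "B"), subgroup = c("p", "p", "q", "p", "q")
)
df2 <- data.frame(
  group = c("A", "A", "A", "A", "B", "B"),
  subgroup = c("p", "p", "q", "q", "p", "p"),
  cols = c("col_1", "col_2", "col_1", "col_3", "col_2", "col_3")
)

# One key per df2 row: group, subgroup, and target column.
drop_keys <- paste(df2$group, df2$subgroup, df2$cols, sep = "|")

# For each value column, build the matching key for every df1 row
# and blank out the rows whose key is listed in df2.
value_cols <- grep("^col_", names(df1), value = TRUE)
for (col in value_cols) {
  row_keys <- paste(df1$group, df1$subgroup, col, sep = "|")
  df1[[col]][row_keys %in% drop_keys] <- NA
}
print(df1)
```

A separator makes the keys unambiguous: with an empty separator, group "A" with subgroup "pq" and group "Ap" with subgroup "q" would produce the same string. Because `%in%` is evaluated per row, duplicated group/subgroup rows in df1 are all handled.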
