Replace multiple values in a dataframe with NA based on conditions given in another dataframe in R
Problem description
Using R, I want to replace numeric values in a data frame with NA based on conditions contained in another data frame:
# An example data frame with numeric values I want to
# change to `NA` based on values given in another data frame.
df1 <- tibble::tribble(
~col_1, ~col_2, ~col_3, ~group, ~subgroup,
1, 3, 5, 'A', 'p',
6, 8, 5, 'A', 'q',
5, 3, 3, 'B', 'p',
1, 7, 7, 'B', 'q'
)
# A second data frame containing conditions
# to be used for subsetting the first data frame.
df2 <- tibble::tribble(
~group, ~subgroup, ~cols,
'A', 'q', 'col_1',
'A', 'q', 'col_3',
'B', 'p', 'col_2',
'B', 'p', 'col_3'
)
# My problematic approach to subsetting df1 and replacing
# values with `NA` based on the conditions given in df2.
df1[df1$group %in% unique(df2$group) &
df1$subgroup %in% unique(df2$subgroup),
unique(df2$cols)] <- NA
# The incorrect result of my approach.
print(df1)
#> # A tibble: 4 × 5
#>   col_1 col_2 col_3 group subgroup
#>   <dbl> <dbl> <dbl> <chr> <chr>
#> 1    NA    NA    NA A     p
#> 2    NA    NA    NA A     q
#> 3    NA    NA    NA B     p
#> 4    NA    NA    NA B     q
Created on 2021-09-20 by the reprex package (v2.0.1)
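My strategy was to subset df1 using the logical indices of where df1 observations match df2, and to replace those observations with NA using `] <- NA`. However, my approach selected all observations, rather than replacing only the observations indicated in df2 as I expected.
How can I do this functionally/programmatically, without manual replacement? This example dataset is small enough that I could use the `] <-` approach for each value I want to replace, but I want to do this on larger, more complex datasets.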
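To illustrate why every cell was selected, here is a minimal sketch (not part of the original question) that recomputes the row mask from the first example. The names `row_mask` and `pair_mask` are hypothetical, introduced only for this illustration. Testing `group` and `subgroup` independently with `%in%` accepts any combination of listed values, and `unique(df2$cols)` then applies every listed column to every selected row.

```r
# Only the key columns of the first example are needed here.
df1 <- data.frame(
  group    = c("A", "A", "B", "B"),
  subgroup = c("p", "q", "p", "q")
)
df2 <- data.frame(
  group    = c("A", "A", "B", "B"),
  subgroup = c("q", "q", "p", "p"),
  cols     = c("col_1", "col_3", "col_2", "col_3")
)

# The mask from the question: each column is tested on its own,
# so every row of df1 passes.
row_mask <- df1$group %in% unique(df2$group) &
  df1$subgroup %in% unique(df2$subgroup)
row_mask
# [1] TRUE TRUE TRUE TRUE

# Testing the (group, subgroup) pair as a unit selects only the intended rows.
pair_mask <- paste(df1$group, df1$subgroup) %in% paste(df2$group, df2$subgroup)
pair_mask
# [1] FALSE  TRUE  TRUE FALSE
```

Even with the pair-wise mask, the column to blank out still differs per row, which is why a per-cell match is needed.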
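Solutions and caveats: Both solutions provided by @Ronak and @akrun work on the example dataset in this question. However, after discovering rare cases in my actual dataset where group and subgroup values are duplicated, I found that only @akrun's solution worked. Below, I have added another example that recreates the rare case I observed in my real data, along with a modification of @Ronak's solution that makes it work with these duplicates.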
# Unique numeric observations were added
# in rows 1 and 2 with group and subgroup
# values that are duplicated with existing
# group and subgroup values.
df1 <- tibble::tribble(
~col_1, ~col_2, ~col_3, ~group, ~subgroup,
7, 4, 9, "A", "p",
1, 3, 5, "A", "p",
6, 8, 5, "A", "q",
5, 3, 3, "B", "p",
1, 7, 7, "B", "q"
)
# Conditions were added in rows 1 and 2
# to indicate which values to replace
# in df1 with NA.
df2 <- tibble::tribble(
~group, ~subgroup, ~cols,
"A", "p", "col_1",
"A", "p", "col_2",
"A", "q", "col_1",
"A", "q", "col_3",
"B", "p", "col_2",
"B", "p", "col_3"
)
# Modifications of @Ronak's solution
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)
key1 <- lapply(
setNames(names(df1)[grep("col_\\d", x = names(df1))], 1:3),
function(x) {
paste(x, df1$group, df1$subgroup)
}
)
key2 <- with(df2, paste(cols, group, subgroup))
indices <- lapply(
key1,
function(x) {
which(x %in% key2)
}
)
indices <- indices[sapply(indices, function(x) length(x) > 0)]
selection <- lapply(
seq_along(indices),
function(x) {
cbind(indices[[x]], as.numeric(names(indices)[x]))
}
)
selection <- do.call(rbind, selection)
df1[selection] <- NA
df1
# col_1 col_2 col_3 group subgroup
# 1 NA NA 9 A p
# 2 NA NA 5 A p
# 3 NA 8 NA A q
# 4 5 NA NA B p
# 5 1 7 7 B q
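The modification above ultimately builds a two-column (row, column) matrix and assigns through it. As a hedged alternative sketch of the same idea (not from the original posters), the matrix can also be built with `merge()`, which naturally handles duplicated group/subgroup rows. The names `keys`, `row_id`, `hits`, and `idx` are hypothetical helpers introduced for this illustration.

```r
# Second example from the question, as plain data frames.
df1 <- data.frame(
  col_1 = c(7, 1, 6, 5, 1),
  col_2 = c(4, 3, 8, 3, 7),
  col_3 = c(9, 5, 5, 3, 7),
  group    = c("A", "A", "A", "B", "B"),
  subgroup = c("p", "p", "q", "p", "q")
)
df2 <- data.frame(
  group    = c("A", "A", "A", "A", "B", "B"),
  subgroup = c("p", "p", "q", "q", "p", "p"),
  cols     = c("col_1", "col_2", "col_1", "col_3", "col_2", "col_3")
)

# One row per df1 observation, tagged with its row number.
keys <- data.frame(
  row_id   = seq_len(nrow(df1)),
  group    = df1$group,
  subgroup = df1$subgroup
)

# Inner join: every (row, column name) pair that df2 asks to blank out.
hits <- merge(keys, df2, by = c("group", "subgroup"))

# Two-column (row, column position) matrix, then assign through it.
idx <- cbind(hits$row_id, match(hits$cols, names(df1)))
df1[idx] <- NA
df1
```

Matrix-index assignment of this kind works on a base `data.frame`, which is why the modification above first converts the tibbles with `as.data.frame()`.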
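Recommended answer
Here is one approach: loop `across` the 'col' columns of the first dataset ('df1'), build a single string vector by pasting together 'group', 'subgroup', and the corresponding column name (`cur_column()`), check whether those elements are `%in%` the pasted rows of 'df2' to create a logical vector, and use that vector in `replace` to set the matching elements to `NA`.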
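library(dplyr)
library(stringr)
library(purrr)
# For each column starting with 'col', build one key per row from
# group, subgroup, and the column name, then set the value to NA
# where that key matches a pasted row of df2.
# Note: purrr::invoke() is deprecated as of purrr 1.0.0;
# do.call(str_c, c(df2, sep = '')) is an equivalent base R call.
df1 <- df1 %>%
  mutate(across(starts_with('col'),
    ~ replace(., str_c(group, subgroup, cur_column()) %in%
      invoke(str_c, c(df2, sep = '')), NA) ))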
-output
df1
# A tibble: 4 x 5
  col_1 col_2 col_3 group subgroup
  <dbl> <dbl> <dbl> <chr> <chr>
1     1     3     5 A     p
2    NA     8    NA A     q
3     5    NA    NA B     p
4     1     7     7 B     q
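To make the key comparison inside the `across()` call easier to follow, here is a rough base R sketch of the same logic written as an explicit loop. It is an illustration, not the answerer's code. The `"|"` separator is an assumption added here so that adjacent values cannot run together into an ambiguous key, and `key1`/`key2` are hypothetical names.

```r
# First example from the question, as plain data frames.
df1 <- data.frame(
  col_1 = c(1, 6, 5, 1),
  col_2 = c(3, 8, 3, 7),
  col_3 = c(5, 5, 3, 7),
  group    = c("A", "A", "B", "B"),
  subgroup = c("p", "q", "p", "q")
)
df2 <- data.frame(
  group    = c("A", "A", "B", "B"),
  subgroup = c("q", "q", "p", "p"),
  cols     = c("col_1", "col_3", "col_2", "col_3")
)

# One key per condition row in df2.
key2 <- paste(df2$group, df2$subgroup, df2$cols, sep = "|")

for (col in grep("^col_", names(df1), value = TRUE)) {
  # One key per df1 row for the current column.
  key1 <- paste(df1$group, df1$subgroup, col, sep = "|")
  # Blank out the cells whose key appears among the conditions.
  df1[[col]][key1 %in% key2] <- NA
}
df1
```

Because each df1 row gets its own key, rows with duplicated group and subgroup values are all matched, which is consistent with the observation above that this style of solution handled the duplicates.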