Using pmap to apply different regular expressions to different variables in a tibble?
Problem description
I'm trying to apply different regular expressions to different variables in a tibble. For example, I've made a tibble listing 1) the variable name I want to modify, 2) the regex I want to match, and 3) the replacement string. I'd like to apply the regex/replacement to the variable in a different data frame.
So my "configuration" tibble looks like this:
test_config <- dplyr::tibble(
  string_col = c("col1", "col2", "col3", "col4"),
  pattern = c("^\\.$", "^NA$", "^NULL$", "^$"),
  replacement = c("", "", "", "")
)
I'd like to apply this to a target tibble:
test_target <- dplyr::tibble(
  col1 = c("Foo", "bar", ".", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "NA", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", "NULL"),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)
So the goal is to replace a different string with an empty string in each column/variable of the test_target.
The result should look like this:
result <- dplyr::tibble(
  col1 = c("Foo", "bar", "", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", ""),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)
I can do what I want with a for loop, like this:
for (i in seq(nrow(test_config))) {
  test_target <- dplyr::mutate_at(test_target,
    .vars = dplyr::vars(
      tidyselect::matches(test_config$string_col[[i]])),
    .funs = dplyr::funs(
      stringr::str_replace_all(
        ., test_config$pattern[[i]],
        test_config$replacement[[i]]))
  )
}
Instead, is there a tidier way to do what I want?
So far, thinking that purrr::pmap was the tool for the job, I've made a function that takes a data frame, variable name, regular expression, and replacement value and returns the data frame with a single variable modified. It behaves as expected:
testFun <- function(df, colName, regex, repVal){
  colName <- dplyr::enquo(colName)
  df <- dplyr::mutate_at(df,
    .vars = dplyr::vars(
      tidyselect::matches(!!colName)),
    .funs = dplyr::funs(
      stringr::str_replace_all(., regex, repVal))
  )
}
# try with example
out <- testFun(test_target,
               test_config$string_col[[1]],
               test_config$pattern[[1]],
               "")
However, when I try to use that function with pmap, I run into a couple problems: 1) is there a better way to build the list for the pmap call than this?
purrr::pmap(
  list(test_target,
       test_config$string_col,
       test_config$pattern,
       test_config$replacement),
  testFun
)
2) When I call pmap, I get an error:
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "character"
Called from: tbl_vars(tbl)
Can any of you suggest a way to use pmap to do what I want, or is there a different or better tidyverse approach to the problem?
Thanks!
Recommended answer
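You don't need to create a function (your function is actually the source of the problem): you can use str_replace_all directly. pmap walks the elements of each entry in its list in parallel, and the elements of a tibble are its columns, so testFun receives a single character column as df instead of the whole tibble. That is why tbl_vars fails on an object of class "character". Passing each column straight to str_replace_all, together with its pattern and replacement, avoids the problem: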
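library(purrr)    # pmap_dfr(); it also needs dplyr installed
library(stringr)  # str_replace_all()

# Columns, patterns and replacements are matched up by position
pmap_dfr(
  list(test_target,
       test_config$pattern,
       test_config$replacement),
  str_replace_all
)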
# A tibble: 5 x 4
  col1  col2  col3  col4
  <chr> <chr> <chr> <chr>
1 Foo   Foo   Foo   NULL
2 bar   bar   bar   NA
3 ""    .     .     Foo
4 NA    ""    NA    .
5 NULL  NULL  ""    bar
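Note that the call above pairs columns with patterns purely by position and never reads string_col, so it relies on test_config listing the columns in the same order as test_target. The following is a minimal sketch of a name-based variant, not part of the original answer: it assumes every entry of string_col is an exact column name of the target, and the variable names cols and result are chosen here only for illustration.

```r
library(tibble)
library(purrr)
library(stringr)

# Configuration and target copied from the question
test_config <- tibble(
  string_col = c("col1", "col2", "col3", "col4"),
  pattern = c("^\\.$", "^NA$", "^NULL$", "^$"),
  replacement = c("", "", "", "")
)
test_target <- tibble(
  col1 = c("Foo", "bar", ".", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "NA", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", "NULL"),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)

# Assumption: string_col holds exact column names of test_target
cols <- test_config$string_col

# Select the columns by name, then hand each one to str_replace_all()
# together with the pattern and replacement from the same config row
result <- test_target
result[cols] <- pmap(
  list(as.list(test_target[cols]),
       test_config$pattern,
       test_config$replacement),
  str_replace_all
)

result
```

Because the columns are looked up by name, the rows of test_config can appear in any order, and columns that the configuration does not mention are left untouched.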