case_when with partial string match and contains()
Problem description
I'm working with a dataset that has many columns called status1, status2, etc. Within those columns, it says if someone is exempt, complete, registered, etc.
Unfortunately, the exempt inputs are not consistent; here's a sample:
library(dplyr)
problem <- tibble(person  = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))
I'm trying to use case_when() to make a new column that has their final status. If it ever says completed, then they are completed. If it ever says exempt without saying complete, then they are exempt.
The important part is that I want my code to use contains("status"), or some equivalent that only targets the status columns and doesn't require typing them all, and I want it to only require a partial string match for exempt.
As for using contains with case_when, I saw this example, but I wasn't able to apply it to my case: mutate with case_when and contains
This is what I've tried to use so far, but as you can guess, it has not worked:
library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
mutate(final= case_when(pmap_chr(select(., contains("status")), ~
any(c(...) == str_detect(., "Exempt") ~ "Exclude",
TRUE ~ "Complete"
))))
Here's what I want the final product to look like:
solution <- tibble(person  = c("Corey", "Sibley", "Justin", "Ruth"),
                   status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                   status2 = c("exempt", "Completed", "Completed", "Pending"),
                   status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
                   final   = c("Exclude", "Completed", "Completed", "Exclude"))
Thanks!
Recommended answer
I think you are doing it backwards. Put case_when inside pmap_chr instead of the other way around:
library(dplyr)
library(purrr)
library(stringr)
problem %>%
  mutate(final = pmap_chr(select(., contains("status")),
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))
For each pmap iteration (that is, each row of the problem dataset), we use case_when to check whether any of the status values contains the string Exempt. The (?i) in the str_detect pattern makes the match case-insensitive. This is the same as writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)).
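As a quick check of that equivalence, the sketch below runs both spellings over the status values from the question. The names statuses, inline_flag and regex_flag are illustrative only and do not appear in the original post.

```r
library(stringr)

# Status values taken from the sample data in the question
statuses <- c("7EXEMPT", "exempt", "EXEMPTED", "ExempT - 14", "Completed", "Pending")

# Inline (?i) flag inside the pattern
inline_flag <- str_detect(statuses, "(?i)Exempt")

# Explicit regex() modifier with ignore_case = TRUE
regex_flag <- str_detect(statuses, regex("Exempt", ignore_case = TRUE))

# Both spellings should flag exactly the same elements
identical(inline_flag, regex_flag)
```

Because str_detect does a partial match, values such as "7EXEMPT" and "ExempT - 14" are flagged even though the word is surrounded by other characters.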
Output:
# A tibble: 4 x 5
  person status1   status2   status3     final
  <chr>  <chr>     <chr>     <chr>       <chr>
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
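The same labels can probably also be produced without iterating row by row. The following is only a sketch of an alternative, not the answer's method: it assumes a dplyr version that provides if_any() (1.0.4 or later), and the name vectorized is made up for illustration.

```r
library(dplyr)
library(stringr)

problem <- tibble(person  = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

# if_any() applies the predicate to every column selected by contains("status")
# and returns TRUE for a row when at least one of those columns matches.
vectorized <- problem %>%
  mutate(final = case_when(
    if_any(contains("status"),
           ~ str_detect(.x, regex("exempt", ignore_case = TRUE))) ~ "Exclude",
    TRUE ~ "Completed"
  ))

vectorized
```

As with the pmap_chr version, any row without an exempt match falls through to "Completed", so a row whose statuses are all "Pending" would also be labelled "Completed".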