case_when with partial string match and contains()

Problem description

I'm working with a dataset that has many columns called status1, status2, etc. Within those columns, it says if someone is exempt, complete, registered, etc.

Unfortunately, the exempt inputs are not consistent; here's a sample:

library(dplyr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

I'm trying to use case_when() to make a new column that has their final status. If it ever says completed, then they are completed. If it ever says exempt without saying complete, then they are exempt.

The important part is that I want my code to use contains("status"), or some equivalent that only targets the status columns and doesn't require typing them all, and I want it to only require a partial string match for exempt.

As for using contains() with case_when(), I saw this example ("mutate with case_when and contains"), but I wasn't able to apply it to my case.

This is what I've tried to use so far, but as you can guess, it has not worked:

library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
  mutate(final= case_when(pmap_chr(select(., contains("status")), ~
    any(c(...) == str_detect(., "Exempt") ~ "Exclude",
               TRUE ~ "Complete"
  ))))

Here's what I want the final product to look like:

solution <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                   status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                   status2 = c("exempt", "Completed", "Completed", "Pending"),
                   status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
                   final = c("Exclude", "Completed", "Completed", "Exclude")) 

Thanks!

Recommended answer

I think you are doing it backwards. Put case_when inside pmap_chr instead of the other way around:

library(dplyr)
library(purrr)
library(stringr)

problem %>%
  mutate(final = pmap_chr(select(., contains("status")), 
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))

pmap_chr() calls the function once per row of the selected status columns, so c(...) holds that row's status values. Inside each call, case_when() checks whether any of those values contains the string Exempt. The (?i) prefix in the pattern makes str_detect() case insensitive; it is the same as writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)).
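
To make the case-insensitivity point concrete, here is a minimal sketch that runs both forms of the pattern on one row of the sample data (Ruth's row). The variable names are illustrative only and are not part of the original answer.

```r
library(stringr)

# One row's worth of status values, taken from the sample data (Ruth's row)
statuses <- c("Pending", "Pending", "ExempT - 14")

# Inline flag: (?i) switches on case-insensitive matching for the pattern
inline_flag <- str_detect(statuses, "(?i)Exempt")

# Explicit modifier: regex() with ignore_case = TRUE
explicit_flag <- str_detect(statuses, regex("Exempt", ignore_case = TRUE))

# Plain pattern: case sensitive, so the trailing "T" in "ExempT" does not match
case_sensitive <- str_detect(statuses, "Exempt")

identical(inline_flag, explicit_flag)  # both forms flag the same elements
any(inline_flag)                       # the row contains an exempt status
any(case_sensitive)                    # the case-sensitive pattern misses it
```

The case-sensitive pattern misses "ExempT - 14", which is why one of the two case-insensitive forms is needed for this data.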

Output:

# A tibble: 4 x 5
  person status1   status2   status3     final    
  <chr>  <chr>     <chr>     <chr>       <chr>    
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude  
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
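
A possible variant, not part of the original answer: dplyr's if_any() (available from dplyr 1.0.4) can stand in for the row-wise pmap_chr() call, and a first condition can encode the rule from the question that a person counts as Completed whenever any status says so. The helper has_word() and the "Pending" fallback label are assumptions made for illustration.

```r
library(dplyr)
library(stringr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

# Hypothetical helper: case-insensitive partial match on a character vector
has_word <- function(x, word) {
  str_detect(x, regex(word, ignore_case = TRUE))
}

solution <- problem %>%
  mutate(final = case_when(
    # Any status column mentions "complete": completed takes priority
    if_any(contains("status"), ~ has_word(.x, "complete")) ~ "Completed",
    # Otherwise, any status column mentions "exempt"
    if_any(contains("status"), ~ has_word(.x, "exempt")) ~ "Exclude",
    # Assumption: rows matching neither condition are labelled "Pending"
    TRUE ~ "Pending"
  ))

solution
```

On the sample data this yields the same final column as the desired result in the question. Because case_when() uses the first matching condition, the order of the two if_any() lines is what gives "Completed" priority over "Exclude".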
