R, find character string from vector, create new TRUE/FALSE columns

Problem Description

I have a data frame like this:

df<-structure(list(MRN = c("53634", "65708", "72122", "40458", "03935", 
"67473", "20281", "52479", "10261", "40945", "40630", "92295", 
"43505", "80719", "39492", "44720", "70691", "21351", "03457", 
"02182"), Outcome_Diagnosis_1 = c(NA, NA, NA, "Seroma of breast [N64.89]", 
"Breast implant capsular contracture [T85.44XA]; Breast implant capsular contracture [T85.44XA]; Breast implant capsular contracture [T85.44XA]", 
NA, NA, NA, "Acquired breast deformity [N64.89]", NA, NA, NA, 
NA, "Acquired breast deformity [N64.89]", NA, NA, NA, NA, NA, 
NA), Outcome_Diagnosis_2 = c(NA, NA, NA, "Extrusion of breast implant, initial encounter [T85.49XA]; Extrusion of breast implant, initial encounter [T85.49XA]; Extrusion of breast implant, initial encounter [T85.49XA]", 
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Capsular contracture of breast implant, subsequent encounter [T85.44XD]; Capsular contracture of breast implant, subsequent encounter [T85.44XD]; Capsular contracture of breast implant, subsequent encounter [T85.44XD]", 
NA, NA, NA, NA, NA, NA), Outcome_Diagnosis_3 = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Acquired breast deformity [N64.89]; Capsular contracture of breast implant, initial encounter [T85.44XA]; Capsular contracture of breast implant, initial encounter [T85.44XA]; Capsular contracture of breast implant, initial encounter [T85.44XA]", 
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

I have several vectors like this:

Infection<-c("L76","L00", "L01","L02","L03","L04", "L05","L08")
Hematoma<-c("N64.89","M79.81")
Seroma<- c("L76.34")
Necrosis<- c("N64.1","T86.821")
CapsularContracture<- c("T85.44")
MechanicalComplications<- c("T85", "T85.4", "T85.41", "T85.42", "T85.43", "T85.49")

What I'd like to do is create one new TRUE/FALSE column per vector in the data frame, indicating whether any code from that vector was found in each row. (It should just be TRUE even if a code shows up several times in that row, i.e. it doesn't need to "count" them.)

So the output I want would be something like this:

The reason I am struggling and came to Stack Overflow for help is that I don't really know how to combine searching for particular strings (which might be inside a longer sentence in that column) with looking across multiple columns.

Other information that might be important:

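  • There are many more columns besides these 3 outcome diagnoses, so it would be useful if the answer looked across the whole row regardless of how many columns there are.
  • Sometimes these codes aren't specific enough, and it might be useful to search for actual words like "Seroma". I assume that's just a case of swapping out the characters inside the quotes, right?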
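On the second point: grepl() treats a plain word the same way it treats a code, so a vector such as c("Seroma") can be used just like the code vectors. Here is a minimal base-R sketch. The `diag` vector is a hypothetical stand-in for one diagnosis column, `SeromaWords` is a hypothetical name, and `ignore.case = TRUE` is an optional extra so that lowercase mentions also match:

```r
# Hypothetical stand-in for one Outcome_Diagnosis column (values taken from df)
diag <- c(NA,
          "Seroma of breast [N64.89]",
          "Acquired breast deformity [N64.89]")

# A word list works like a code list: join the entries with "|" and search
SeromaWords <- c("Seroma")
seroma_found <- grepl(paste(SeromaWords, collapse = "|"), diag, ignore.case = TRUE)
seroma_found
# [1] FALSE  TRUE FALSE   (grepl() returns FALSE, not NA, for missing cells)
```

Words containing regex metacharacters such as `(`, `[` or `.` would need escaping first, because the entries are interpreted as regular expressions.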

Recommended Answer

You could store your vectors in a list:

lst <- list(Infection = c("L76","L00", "L01","L02","L03","L04", "L05","L08"),
            Hematoma = c("N64.89","M79.81"),
            Seroma = c("L76.34"),
            Necrosis = c("N64.1","T86.821"),
            CapsularContracture = c("T85.44"),
            MechanicalComplications = c("T85", "T85.4", "T85.41", "T85.42", "T85.43", "T85.49"))
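Each vector in the list ends up as a single regular expression, with the dots removed and the codes joined by `|`. The following base-R sketch shows what those patterns look like. It uses `lst_demo`, a hypothetical subset of the vectors above:

```r
# Hypothetical subset of lst, built from the vectors in the question
lst_demo <- list(Infection = c("L76", "L00", "L01"),
                 Seroma = c("L76.34"),
                 CapsularContracture = c("T85.44"))

# Remove dots from each code and join the codes with "|" (regex alternation)
patterns <- vapply(lst_demo,
                   function(codes) paste(gsub(".", "", codes, fixed = TRUE), collapse = "|"),
                   character(1))
patterns

# This is a substring match: the Infection pattern contains "L76",
# so it also fires on a dot-stripped Seroma code such as L7634
infection_hits_seroma <- grepl(patterns[["Infection"]], "Seroma of breast [L7634]")
infection_hits_seroma
```

The same overlap applies to MechanicalComplications in the full list. Its "T85" entry matches every T85.x code, including the T85.44 capsular-contracture codes, so those rows are flagged in both columns unless the patterns are made more specific.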

Then, using dplyr and purrr, you could do:

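library(dplyr)
library(purrr)
# Per vector: drop all dots (gsub, not sub, so every code in a multi-code cell is normalised),
# then flag a row TRUE if any Outcome_Diagnosis column matches one of the codes
imap(lst,
     ~ df %>%
      mutate(!!.y := reduce(across(Outcome_Diagnosis_1:Outcome_Diagnosis_3, function(y) grepl(paste(gsub("\\.", "", .x), collapse = "|"), gsub("\\.", "", y))), `|`))) %>%
 reduce(full_join, by = names(df))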

   MRN   Outcome_Diagnos… Outcome_Diagnos… Outcome_Diagnos… Infection Hematoma Seroma Necrosis CapsularContrac…
   <chr> <chr>            <chr>            <chr>            <lgl>     <lgl>    <lgl>  <lgl>    <lgl>           
 1 53634 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE           
 2 65708 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE           
 3 72122 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE           
 4 40458 Seroma of breas… Extrusion of br… <NA>             FALSE     TRUE     FALSE  FALSE    FALSE           
 5 03935 Breast implant … <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    TRUE            
 6 67473 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE           
 7 20281 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE           
 8 52479 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE           
 9 10261 Acquired breast… <NA>             <NA>             FALSE     TRUE     FALSE  FALSE    FALSE           
10 40945 <NA>             <NA>             <NA>             FALSE     FALSE    FALSE  FALSE    FALSE
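The question's first point asks for a check across however many diagnosis columns there are. The following base-R sketch does that without a `full_join()` per category. It loops over the list and combines the per-column matches with `|`. It assumes that every column other than an ID column (MRN here) holds diagnosis text. `flag_codes()` and its `id_cols` argument are hypothetical names.

```r
# Hypothetical helper: add one TRUE/FALSE column per element of lst.
# Assumption: every column not listed in id_cols holds diagnosis text.
flag_codes <- function(df, lst, id_cols = "MRN") {
  diag_cols <- setdiff(names(df), id_cols)
  for (nm in names(lst)) {
    # One regex per category: dots removed, codes (or words) joined with "|"
    pattern <- paste(gsub(".", "", lst[[nm]], fixed = TRUE), collapse = "|")
    # Per column: does the dot-stripped cell contain any entry? NA cells give FALSE
    hits <- lapply(df[diag_cols], function(col) grepl(pattern, gsub(".", "", col, fixed = TRUE)))
    # A row is TRUE if any diagnosis column matched
    df[[nm]] <- Reduce(`|`, hits)
  }
  df
}

# Small demo data shaped like the question's df (values adapted from it)
df_demo <- data.frame(
  MRN = c("53634", "40458", "80719"),
  Outcome_Diagnosis_1 = c(NA, "Seroma of breast [N64.89]", NA),
  Outcome_Diagnosis_2 = c(NA, NA, "Acquired breast deformity [N64.89]; Capsular contracture of breast implant, initial encounter [T85.44XA]"),
  stringsAsFactors = FALSE
)
lst_demo <- list(Hematoma = c("N64.89", "M79.81"),
                 CapsularContracture = c("T85.44"),
                 SeromaWord = c("Seroma"))

out <- flag_codes(df_demo, lst_demo)
out
```

The flags are assigned directly onto the existing data frame, so rows are neither reordered nor duplicated. A join keyed on every original column behaves differently: if the data ever contained two identical rows, `full_join()` would match each copy against each copy and multiply them.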
