R: find character strings from a vector and create new TRUE/FALSE columns
Question
I have a data frame like this:
df<-structure(list(MRN = c("53634", "65708", "72122", "40458", "03935",
"67473", "20281", "52479", "10261", "40945", "40630", "92295",
"43505", "80719", "39492", "44720", "70691", "21351", "03457",
"02182"), Outcome_Diagnosis_1 = c(NA, NA, NA, "Seroma of breast [N64.89]",
"Breast implant capsular contracture [T85.44XA]; Breast implant capsular contracture [T85.44XA]; Breast implant capsular contracture [T85.44XA]",
NA, NA, NA, "Acquired breast deformity [N64.89]", NA, NA, NA,
NA, "Acquired breast deformity [N64.89]", NA, NA, NA, NA, NA,
NA), Outcome_Diagnosis_2 = c(NA, NA, NA, "Extrusion of breast implant, initial encounter [T85.49XA]; Extrusion of breast implant, initial encounter [T85.49XA]; Extrusion of breast implant, initial encounter [T85.49XA]",
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Capsular contracture of breast implant, subsequent encounter [T85.44XD]; Capsular contracture of breast implant, subsequent encounter [T85.44XD]; Capsular contracture of breast implant, subsequent encounter [T85.44XD]",
NA, NA, NA, NA, NA, NA), Outcome_Diagnosis_3 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Acquired breast deformity [N64.89]; Capsular contracture of breast implant, initial encounter [T85.44XA]; Capsular contracture of breast implant, initial encounter [T85.44XA]; Capsular contracture of breast implant, initial encounter [T85.44XA]",
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
I have several vectors like this:
Infection<-c("L76","L00", "L01","L02","L03","L04", "L05","L08")
Hematoma<-c("N64.89","M79.81")
Seroma<- c("L76.34")
Necrosis<- c("N64.1","T86.821")
CapsularContracture<- c("T85.44")
MechanicalComplications<- c("T85", "T85.4", "T85.41", "T85.42", "T85.43", "T85.49")
What I'd like to do is create new columns in the data frame, one per vector, that are TRUE/FALSE depending on whether any code from that vector was found in each row. (It would just be TRUE even if a code shows up multiple times in that row, i.e. it doesn't need to "count" them.)
So the output I want would be something like this:
The reason I'm struggling and came to Stack Overflow for help is that I don't know how to combine searching for particular strings (which might be embedded in a longer sentence in that column) with looking across multiple columns.
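Other information that might be important:
- There are more columns than just these 3 outcome diagnoses, so it would be useful if the answer looked across the whole row regardless of how many columns there are.
- Sometimes these codes aren't specific enough, and it might be useful to search for actual words such as "Seroma". I assume that is just a case of swapping the characters inside the quotes, right?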
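On the second point: `grepl()` treats each pattern as a regular expression, so plain words can be searched for in the same way as codes. Below is a minimal sketch using a hypothetical `terms` list and toy strings; `ignore.case = TRUE` is an added assumption so that "seroma" also matches "Seroma". Terms containing regex metacharacters such as `.`, `(` or `[` would need escaping.

```r
# Hypothetical word-based search terms (one category per list element)
terms <- list(Seroma = c("seroma"),
              Deformity = c("breast deformity", "asymmetry"))

# Toy diagnosis strings, including a missing value
x <- c("Seroma of breast [N64.89]",
       "Acquired breast deformity [N64.89]",
       NA)

# One alternation pattern per category; grepl() returns FALSE for NA cells,
# and ignore.case = TRUE makes the match case-insensitive
hits <- sapply(terms, function(p) grepl(paste(p, collapse = "|"), x, ignore.case = TRUE))
hits
```

The result is a logical matrix with one column per category and one row per string.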
Answer
You could store your vectors in a list:
lst <- list(Infection = c("L76","L00", "L01","L02","L03","L04", "L05","L08"),
Hematoma = c("N64.89","M79.81"),
Seroma = c("L76.34"),
Necrosis = c("N64.1","T86.821"),
CapsularContracture = c("T85.44"),
MechanicalComplications = c("T85", "T85.4", "T85.41", "T85.42", "T85.43", "T85.49"))
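To see what the matching step actually searches for, here is a small sketch on a subset of `lst`: each vector is collapsed into one alternation pattern with its dots removed (an unescaped `.` is a regex wildcard), and the text has its dots removed as well. The sketch strips the text with `gsub(..., fixed = TRUE)`, because a cell can contain several codes separated by `;`, and `sub()` only removes the first dot, so later codes in the same cell keep theirs and are missed. The `cell` string is taken from the question's data; the variable names are illustrative.

```r
# A subset of the lookup list above
lst <- list(Hematoma = c("N64.89", "M79.81"),
            CapsularContracture = c("T85.44"))

# Drop the dot from every code and join each category's codes into one
# alternation pattern, e.g. "N6489|M7981"
patterns <- vapply(lst, function(codes) paste(gsub(".", "", codes, fixed = TRUE), collapse = "|"),
                   character(1))

# A cell holding two codes: strip every dot with gsub()
cell <- "Acquired breast deformity [N64.89]; Capsular contracture of breast implant, initial encounter [T85.44XA]"
stripped <- gsub(".", "", cell, fixed = TRUE)

# Both categories are found
vapply(patterns, grepl, logical(1), x = stripped)

# With sub(), only "N64.89" loses its dot, so "T85.44XA" no longer matches "T8544"
grepl(patterns[["CapsularContracture"]], sub(".", "", cell, fixed = TRUE))
```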
And then, using dplyr and purrr, you could do:
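library(dplyr)
library(purrr)
# For each category, flag rows where any diagnosis column contains one of its codes; dots are
# stripped first, with gsub() on the text because a cell can hold several codes
imap(lst,
     ~ df %>%
       mutate(!!.y := reduce(across(Outcome_Diagnosis_1:Outcome_Diagnosis_3, function(y) grepl(paste(sub("\\.", "", .x), collapse = "|"), gsub("\\.", "", y))), `|`))) %>%
  reduce(full_join)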
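If there are more diagnosis columns than these three, the `Outcome_Diagnosis_1:Outcome_Diagnosis_3` selection can be widened, for example with `starts_with("Outcome_Diagnosis")` inside `across()`. As a dependency-free alternative (not the method used above), here is a minimal base R sketch. It assumes the relevant columns share the name prefix `Outcome_Diagnosis`, pastes them into one string per row, strips every dot, and adds one logical column per category. The helper `flag_categories()`, the `toy` data and the `codes` list are hypothetical and exist only for illustration.

```r
# Hypothetical helper (base R only): one TRUE/FALSE column per element of `lst`,
# checking every column whose name starts with `prefix`
flag_categories <- function(df, lst, prefix = "Outcome_Diagnosis") {
  cols <- grep(paste0("^", prefix), names(df), value = TRUE)
  # Treat NA cells as empty strings, then paste the selected columns row-wise
  vals <- lapply(df[cols], function(v) ifelse(is.na(v), "", v))
  row_text <- do.call(paste, c(unname(vals), sep = "; "))
  # Remove every "." so "T85.44XA" becomes "T8544XA" (dots are regex wildcards)
  row_text <- gsub(".", "", row_text, fixed = TRUE)
  for (nm in names(lst)) {
    pattern <- paste(gsub(".", "", lst[[nm]], fixed = TRUE), collapse = "|")
    df[[nm]] <- grepl(pattern, row_text)  # one TRUE per row, however many hits
  }
  df
}

# Toy data in the same shape as the question's data frame
toy <- data.frame(
  MRN = c("1", "2", "3"),
  Outcome_Diagnosis_1 = c(NA, "Seroma of breast [N64.89]",
                          "Breast implant capsular contracture [T85.44XA]"),
  Outcome_Diagnosis_2 = c(NA, NA,
                          "Extrusion of breast implant, initial encounter [T85.49XA]"),
  stringsAsFactors = FALSE
)

codes <- list(Hematoma = c("N64.89", "M79.81"),
              CapsularContracture = c("T85.44"),
              MechanicalComplications = c("T85", "T85.4", "T85.41", "T85.42", "T85.43", "T85.49"))

res <- flag_categories(toy, codes)
res[c("MRN", names(codes))]
```

Because the matching is substring-based, a prefix code such as "T85" matches every T85.* code, so a capsular contracture code (T85.44) is also flagged as MechanicalComplications.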
MRN Outcome_Diagnos… Outcome_Diagnos… Outcome_Diagnos… Infection Hematoma Seroma Necrosis CapsularContrac…
<chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
1 53634 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE
2 65708 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE
3 72122 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE
4 40458 Seroma of breas… Extrusion of br… <NA> FALSE TRUE FALSE FALSE FALSE
5 03935 Breast implant … <NA> <NA> FALSE FALSE FALSE FALSE TRUE
6 67473 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE
7 20281 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE
8 52479 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE
9 10261 Acquired breast… <NA> <NA> FALSE TRUE FALSE FALSE FALSE
10 40945 <NA> <NA> <NA> FALSE FALSE FALSE FALSE FALSE