R: Simplifying a long ifelse statement
Question
I'm trying to create new variables based on a procedure code variable with 2500+ values in a medical dataset, to pull out the antibiotics, their dose, and their route. I've been able to do this with ifelse statements, but it was time-consuming, and it is difficult to find and correct mistakes. Is there a simpler way to do this? Unfortunately, the codes aren't organized in any logical way.
vet <-mutate(vet, ab = ifelse(ProcedureCode=="6160"|ProcedureCode=="2028"|ProcedureCode=="6121"|ProcedureCode=="6130"|ProcedureCode=="6131"|ProcedureCode=="6132"|ProcedureCode=="6133" |ProcedureCode=="6134"|ProcedureCode=="6135"|ProcedureCode=="6136"|ProcedureCode=="6090" |ProcedureCode=="6137"|ProcedureCode=="6138"|ProcedureCode=="6139" |ProcedureCode=="6140" |ProcedureCode=="6510"|ProcedureCode=="680D" |ProcedureCode=="633E"|ProcedureCode=="661J"|ProcedureCode=="627I" |ProcedureCode=="6198"|ProcedureCode=="6199"|ProcedureCode=="6200" |ProcedureCode=="6201" |ProcedureCode=="6202"|ProcedureCode=="622G" |ProcedureCode=="697C" |ProcedureCode=="698C" |ProcedureCode=="6204"|ProcedureCode=="6775"| ProcedureCode=="6229" |ProcedureCode=="6207" |ProcedureCode=="6203" |ProcedureCode=="6205" |ProcedureCode=="6206" |ProcedureCode=="6212" |ProcedureCode=="6213" |ProcedureCode=="6214" |ProcedureCode=="6215" |ProcedureCode=="6216" |ProcedureCode=="6219" |ProcedureCode=="692C" |ProcedureCode=="643C" |ProcedureCode=="601E" |ProcedureCode=="629G" |ProcedureCode=="6234" |ProcedureCode=="6235" |ProcedureCode=="6236" |ProcedureCode=="6237" |ProcedureCode=="6238" |ProcedureCode=="615J" |ProcedureCode=="6242" |ProcedureCode=="6243" |ProcedureCode=="6244" |ProcedureCode=="6245" |ProcedureCode=="1193" |ProcedureCode=="652G" |ProcedureCode=="657G" |ProcedureCode=="697B"|ProcedureCode=="6336" |ProcedureCode=="6337" |ProcedureCode=="6338" |ProcedureCode=="6152" |ProcedureCode=="603C" |ProcedureCode=="655B" |ProcedureCode=="6357" |ProcedureCode=="6358" |ProcedureCode=="6399" |ProcedureCode=="666B" |ProcedureCode=="695D" |ProcedureCode=="699C" |ProcedureCode=="6365" |ProcedureCode=="6366" |ProcedureCode=="696F" |ProcedureCode=="6497" |ProcedureCode=="6613" |ProcedureCode=="6508" |ProcedureCode=="6509" |ProcedureCode=="617I" |ProcedureCode=="6506" |ProcedureCode=="2029" |ProcedureCode=="6538" |ProcedureCode=="671J" |ProcedureCode=="633H" |ProcedureCode=="621G" |ProcedureCode=="680J" 
|ProcedureCode=="672G" |ProcedureCode=="673G" |ProcedureCode=="6559" |ProcedureCode=="6652" |ProcedureCode=="6593" |ProcedureCode=="651C" |ProcedureCode=="633B" |ProcedureCode=="659E" |ProcedureCode=="676D" |ProcedureCode=="678D" |ProcedureCode=="620B" |ProcedureCode=="6562" |ProcedureCode=="6564" |ProcedureCode=="6585" |ProcedureCode=="6766" |ProcedureCode=="6595" |ProcedureCode=="6607" |ProcedureCode=="6608" |ProcedureCode=="627B" |ProcedureCode=="6653" |ProcedureCode=="6654" |ProcedureCode=="6655"|ProcedureCode=="6732" |ProcedureCode=="6733" |ProcedureCode=="6734"|ProcedureCode=="6735" |ProcedureCode=="6795"|ProcedureCode=="6745" |ProcedureCode=="6746" |ProcedureCode=="6748" |ProcedureCode=="6758" |ProcedureCode=="697E" |ProcedureCode=="6761" |ProcedureCode=="6032" |ProcedureCode=="6747" |ProcedureCode=="6749" |ProcedureCode=="668A" |ProcedureCode=="648A" |ProcedureCode=="649A" |ProcedureCode=="6765" |ProcedureCode=="6768" |ProcedureCode=="6771" |ProcedureCode=="637B"|ProcedureCode=="6894", 1,0))
The other problem is that I need to create multiple groups (e.g. antibiotic [yes/no], dose, route), and I feel like there's a better way that doesn't involve cutting and pasting the variable name and quotation marks each time. Is there a way to make a data frame and use ifelse to assign a 1 to any code that is also in that data frame and a 0 to all others?
Sorry if this is a duplicate. I'm relatively new to R and am having trouble finding the vocabulary to search for what I need. I have looked around (for example at "Nested ifelse statement") but haven't found quite what I need. Thanks!
Recommended answer
Here are two alternative methods, both using merges/joins. One advantage of this approach is that it is much easier to maintain: you keep well-structured, manageable tables of procedures instead of (potentially very long) lines of code in your ifelse statement. The %in% approach suggested in the comments also reduces this problem, though you'll deal with manageable vectors instead of manageable frames.
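For reference, the %in% approach mentioned in the comments could look roughly like the sketch below. The names vet_demo and ab_codes are placeholders introduced here for illustration, and the code list is shortened; the real vector would hold every antibiotic code.

```r
library(dplyr)

# Hypothetical stand-in for the real data set
vet_demo <- tibble(ProcedureCode = c("6160", "2028", "9999"))

# Codes that count as antibiotics (shortened; the real list is much longer)
ab_codes <- c("6160", "2028", "6121")

# %in% gives TRUE/FALSE per row; as.integer() turns that into 1/0
vet_demo <- vet_demo %>%
  mutate(ab = as.integer(ProcedureCode %in% ab_codes))
```

Keeping the codes in a named vector means each code is typed once, and a mistake can be fixed in one place.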
Fake data:
library(dplyr)
library(tidyr)
# tibble() replaces the deprecated data_frame()
vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))
One frame per procedure type. This is manageable, but may be annoying if you have many different types. Repeat the left_join for each type.
# tibble() replaces the deprecated data_frame()
abs <- tibble(ab = TRUE, ProcedureCode = c('6160', '2028'))
antis <- tibble(antibiotic = TRUE, ProcedureCode = c('2029'))
vet %>%
  left_join(abs, by = "ProcedureCode") %>%
  left_join(antis, by = "ProcedureCode") %>%
  # a formula replaces the deprecated funs(); unmatched rows (NA) become FALSE
  mutate_at(vars(ab, antibiotic), ~ !is.na(.))
# # A tibble: 3 × 3
# ProcedureCode ab antibiotic
# <chr> <lgl> <lgl>
# 1 6160 TRUE FALSE
# 2 2028 TRUE FALSE
# 3 2029 FALSE TRUE
The use of ab=TRUE (etc.) is so that there is a column to merge. Rows that do not match will have an NA in that column, which is why !is.na(.) is needed to convert T,NA,T to T,F,T.
You could even use vectors of procedure codes instead, something like:
vet %>%
  left_join(tibble(ab = TRUE, ProcedureCode = vector_of_abs), by = "ProcedureCode") %>%
...
That only really helps if you already have the codes as vectors, though; otherwise, choose whichever form is easier for you to maintain.
One frame with all procedures, requiring only a single frame for types and a single left_join.
procedures <- tibble::tribble(
~ProcedureCode, ~procedure,
'6160' , 'ab',
'2028' , 'ab',
'2029' , 'antibiotic'
)
left_join(vet, procedures, by = "ProcedureCode")
# # A tibble: 3 × 2
# ProcedureCode procedure
# <chr> <chr>
# 1 6160 ab
# 2 2028 ab
# 3 2029 antibiotic
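The same single-table idea extends to the dose and route asked about in the question: keep one row per code and one column per attribute, then join once. This is only a sketch. The dose and route values are invented purely for illustration, and the names vet_demo, ab_lookup, dose_mg, and route are assumptions, not part of the original data.

```r
library(dplyr)

# Hypothetical stand-in for the real data set
vet_demo <- tibble(ProcedureCode = c("6160", "2028", "9999"))

# Hypothetical lookup table: one row per antibiotic code (values are made up)
ab_lookup <- tibble::tribble(
  ~ProcedureCode, ~dose_mg, ~route,
  "6160",         250,      "oral",
  "2028",         500,      "IV"
)

vet_flagged <- vet_demo %>%
  left_join(ab_lookup, by = "ProcedureCode") %>%
  # a code with no match gets NA for route, so it is flagged as 0
  mutate(ab = as.integer(!is.na(route)))
```

A table like this could also live in a CSV file and be read in, which may be easier to review than code.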
You can either keep it as-is (if it makes sense to store it that way) or spread it to look like the earlier result:
left_join(vet, procedures, by = "ProcedureCode") %>%
  mutate(ignore = TRUE) %>%
  spread(procedure, ignore) %>%
  # a formula replaces the deprecated funs(); unmatched cells (NA) become FALSE
  mutate_at(vars(ab, antibiotic), ~ !is.na(.))
# # A tibble: 3 × 3
# ProcedureCode ab antibiotic
# <chr> <lgl> <lgl>
# 1 2028 TRUE FALSE
# 2 2029 FALSE TRUE
# 3 6160 TRUE FALSE
(The row order after the join/merge is different here, but the data remains the same.)
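In current tidyr, spread() is superseded by pivot_wider(). A rough equivalent of the reshaping step, assuming tidyr 1.0 or later, might look like the sketch below; the column name flag and the result name wide are placeholders.

```r
library(dplyr)
library(tidyr)

vet <- tibble(ProcedureCode = c("6160", "2028", "2029"))
procedures <- tibble::tribble(
  ~ProcedureCode, ~procedure,
  "6160",         "ab",
  "2028",         "ab",
  "2029",         "antibiotic"
)

wide <- left_join(vet, procedures, by = "ProcedureCode") %>%
  mutate(flag = TRUE) %>%
  # one column per procedure type; cells with no match are filled with FALSE
  pivot_wider(
    names_from  = procedure,
    values_from = flag,
    values_fill = list(flag = FALSE)
  )
```

Because values_fill supplies FALSE directly, the !is.na() step is not needed in this variant. Note that a code with no match in procedures would carry an NA procedure into the reshape and show up as an extra column, so such rows may need handling first.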
(I used logicals; it's easy enough to convert them to 1s and 0s, perhaps with mutate(ab=1L*ab) or mutate(ab=as.integer(ab)).)
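If several flag columns need converting at once, something along these lines may be convenient. It assumes dplyr 1.0 or later for across(), and the flags frame is a made-up stand-in for the joined result.

```r
library(dplyr)

# Hypothetical stand-in for the joined result with logical flag columns
flags <- tibble(
  ProcedureCode = c("6160", "2029"),
  ab            = c(TRUE, FALSE),
  antibiotic    = c(FALSE, TRUE)
)

# Convert every listed logical column to integer 1/0 in one step
flags_int <- flags %>%
  mutate(across(c(ab, antibiotic), as.integer))
```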