R: Simplifying long ifelse statement

Question

I'm trying to create new variables based on a procedure code variable with 2500+ values in a medical dataset to pull out the antibiotics, their dose, and route. I've been able to do this with ifelse statements, but it was time consuming, and is difficult to find and correct mistakes. Is there a simplified way to do this? Unfortunately the codes aren't organized in any logical way.

vet <-mutate(vet, ab = ifelse(ProcedureCode=="6160"|ProcedureCode=="2028"|ProcedureCode=="6121"|ProcedureCode=="6130"|ProcedureCode=="6131"|ProcedureCode=="6132"|ProcedureCode=="6133" |ProcedureCode=="6134"|ProcedureCode=="6135"|ProcedureCode=="6136"|ProcedureCode=="6090" |ProcedureCode=="6137"|ProcedureCode=="6138"|ProcedureCode=="6139" |ProcedureCode=="6140" |ProcedureCode=="6510"|ProcedureCode=="680D" |ProcedureCode=="633E"|ProcedureCode=="661J"|ProcedureCode=="627I" |ProcedureCode=="6198"|ProcedureCode=="6199"|ProcedureCode=="6200" |ProcedureCode=="6201" |ProcedureCode=="6202"|ProcedureCode=="622G" |ProcedureCode=="697C" |ProcedureCode=="698C" |ProcedureCode=="6204"|ProcedureCode=="6775"| ProcedureCode=="6229" |ProcedureCode=="6207" |ProcedureCode=="6203" |ProcedureCode=="6205" |ProcedureCode=="6206" |ProcedureCode=="6212" |ProcedureCode=="6213" |ProcedureCode=="6214" |ProcedureCode=="6215" |ProcedureCode=="6216" |ProcedureCode=="6219" |ProcedureCode=="692C" |ProcedureCode=="643C" |ProcedureCode=="601E" |ProcedureCode=="629G" |ProcedureCode=="6234" |ProcedureCode=="6235" |ProcedureCode=="6236" |ProcedureCode=="6237" |ProcedureCode=="6238" |ProcedureCode=="615J" |ProcedureCode=="6242" |ProcedureCode=="6243" |ProcedureCode=="6244" |ProcedureCode=="6245" |ProcedureCode=="1193" |ProcedureCode=="652G" |ProcedureCode=="657G" |ProcedureCode=="697B"|ProcedureCode=="6336" |ProcedureCode=="6337" |ProcedureCode=="6338" |ProcedureCode=="6152" |ProcedureCode=="603C" |ProcedureCode=="655B" |ProcedureCode=="6357" |ProcedureCode=="6358" |ProcedureCode=="6399" |ProcedureCode=="666B" |ProcedureCode=="695D" |ProcedureCode=="699C" |ProcedureCode=="6365" |ProcedureCode=="6366" |ProcedureCode=="696F" |ProcedureCode=="6497" |ProcedureCode=="6613" |ProcedureCode=="6508" |ProcedureCode=="6509" |ProcedureCode=="617I" |ProcedureCode=="6506" |ProcedureCode=="2029" |ProcedureCode=="6538" |ProcedureCode=="671J" |ProcedureCode=="633H" |ProcedureCode=="621G" |ProcedureCode=="680J" 
|ProcedureCode=="672G" |ProcedureCode=="673G" |ProcedureCode=="6559" |ProcedureCode=="6652" |ProcedureCode=="6593" |ProcedureCode=="651C" |ProcedureCode=="633B" |ProcedureCode=="659E" |ProcedureCode=="676D" |ProcedureCode=="678D" |ProcedureCode=="620B" |ProcedureCode=="6562" |ProcedureCode=="6564" |ProcedureCode=="6585" |ProcedureCode=="6766" |ProcedureCode=="6595" |ProcedureCode=="6607" |ProcedureCode=="6608" |ProcedureCode=="627B" |ProcedureCode=="6653" |ProcedureCode=="6654" |ProcedureCode=="6655"|ProcedureCode=="6732" |ProcedureCode=="6733" |ProcedureCode=="6734"|ProcedureCode=="6735" |ProcedureCode=="6795"|ProcedureCode=="6745" |ProcedureCode=="6746" |ProcedureCode=="6748" |ProcedureCode=="6758" |ProcedureCode=="697E" |ProcedureCode=="6761" |ProcedureCode=="6032" |ProcedureCode=="6747" |ProcedureCode=="6749" |ProcedureCode=="668A" |ProcedureCode=="648A" |ProcedureCode=="649A" |ProcedureCode=="6765" |ProcedureCode=="6768" |ProcedureCode=="6771" |ProcedureCode=="637B"|ProcedureCode=="6894", 1,0))

The problem is also that I need to create multiple groups (ex: Antibiotic [yes/no], dose, route) and I feel like there's a better way I'm missing that doesn't involve cutting and pasting the variable and quotation marks each time. Is there potentially a way to make a data frame and use ifelse to assign any codes that are also in that dataframe as a 1 and others as a 0?

Sorry if this is a duplicate. I'm relatively new to R and am having trouble finding the vocabulary to search for what I need. I have looked around (for example, at "Nested ifelse statement"), but haven't found quite what I need.

Recommended answer

Two alternative methods, both using merges/joins. One advantage of this approach is that it is much easier to maintain: you have well-structured and manageable tables of procedures instead of (potentially really long) lines of code in your ifelse statement. The comments suggesting %in% also reduce this problem, though you'll deal with manageable vectors instead of manageable frames.
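For comparison, the %in% idea from the comments could look like the minimal sketch below. It uses base R only; the toy `vet` frame, the extra code "9999", and the vector name `ab_codes` are made up for illustration and are not from the question's data.

```r
# Hypothetical toy data; the real `vet` has 2500+ distinct codes.
vet <- data.frame(ProcedureCode = c("6160", "2028", "2029", "9999"),
                  stringsAsFactors = FALSE)

# Codes to flag, kept in one vector that can be edited in a single place
# (assumed name, not from the original post).
ab_codes <- c("6160", "2028", "6121")

# %in% gives TRUE/FALSE per row; as.integer() turns that into 1/0.
vet$ab <- as.integer(vet$ProcedureCode %in% ab_codes)
vet
```

With dplyr loaded, the same step can be written as mutate(vet, ab = as.integer(ProcedureCode %in% ab_codes)).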

Fake data:

library(dplyr)
library(tidyr)
# tibble() replaces the deprecated data_frame()
vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))

  1. One frame per procedure type. This is manageable, but may be annoying if you have a lot of different types. Repeat the left_join for each type.

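# one small lookup frame per group (tibble() replaces the deprecated data_frame())
abs <- tibble(ab = TRUE, ProcedureCode = c('6160', '2028'))
antis <- tibble(antibiotic = TRUE, ProcedureCode = c('2029'))
vet %>%
  left_join(abs, by = "ProcedureCode") %>%
  left_join(antis, by = "ProcedureCode") %>%
  # unmatched rows are NA after the joins; turn TRUE/NA into TRUE/FALSE
  # (a formula lambda replaces the deprecated funs())
  mutate_at(vars(ab, antibiotic), ~ !is.na(.))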
# # A tibble: 3 × 3
#   ProcedureCode    ab antibiotic
#           <chr> <lgl>      <lgl>
# 1          6160  TRUE      FALSE
# 2          2028  TRUE      FALSE
# 3          2029 FALSE       TRUE

The use of ab=TRUE (etc) is so that there is a column to merge. The rows that do not match will have an NA, which mandates the need for !is.na(.) to convert T,NA,T to T,F,T.

You could even use vectors of procedure codes instead, something like:

vet %>%
  left_join(tibble(ab=TRUE, ProcedureCode=vector_of_abs), by = "ProcedureCode") %>%
  ...

Though that really only helps if you already have the codes as vectors, otherwise it seems to be solely whichever is easier for you to maintain.
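A complete version of the vector-based variant might look like the sketch below. It assumes dplyr 1.0.0 or later (for across()); `vector_of_antis` and the result name `flagged` are hypothetical names introduced here, and the code values are the same fake data used in this answer.

```r
library(dplyr)

vet <- tibble(ProcedureCode = c("6160", "2028", "2029"))

# Hypothetical code vectors, one per group.
vector_of_abs   <- c("6160", "2028")
vector_of_antis <- c("2029")

flagged <- vet %>%
  # build each lookup frame on the fly from its vector
  left_join(tibble(ab = TRUE, ProcedureCode = vector_of_abs),
            by = "ProcedureCode") %>%
  left_join(tibble(antibiotic = TRUE, ProcedureCode = vector_of_antis),
            by = "ProcedureCode") %>%
  # unmatched codes come back as NA; convert TRUE/NA to TRUE/FALSE
  mutate(across(c(ab, antibiotic), ~ !is.na(.x)))

flagged
```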

  2. One frame with all procedures, requiring only a single frame for types and a single left_join.

procedures <- tibble::tribble(
  ~ProcedureCode, ~procedure,
  '6160'        , 'ab',
  '2028'        , 'ab',
  '2029'        , 'antibiotic'
)
left_join(vet, procedures, by = "ProcedureCode")
# # A tibble: 3 × 2
#   ProcedureCode  procedure
#           <chr>      <chr>
# 1          6160         ab
# 2          2028         ab
# 3          2029 antibiotic

You can either keep it as-is (if it makes sense to store it that way) or spread it to be like the others:

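left_join(vet, procedures, by = "ProcedureCode") %>%
  # helper column that supplies the cell values when spreading
  mutate(ignore=TRUE) %>%
  spread(procedure, ignore) %>%
  # cells with no match are NA; turn TRUE/NA into TRUE/FALSE
  # (a formula lambda replaces the deprecated funs())
  mutate_at(vars(ab, antibiotic), ~ !is.na(.))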
# # A tibble: 3 × 3
#   ProcedureCode    ab antibiotic
#           <chr> <lgl>      <lgl>
# 1          2028  TRUE      FALSE
# 2          2029 FALSE       TRUE
# 3          6160  TRUE      FALSE

(Order after the join/merge is different here, but the data remains the same.)
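In current tidyr, spread() is superseded by pivot_wider(). A rough equivalent of the reshaping step, assuming tidyr 1.0.0 or later and using the hypothetical names `flag` and `wide`, could be:

```r
library(dplyr)
library(tidyr)

vet <- tibble(ProcedureCode = c("6160", "2028", "2029"))
procedures <- tibble::tribble(
  ~ProcedureCode, ~procedure,
  "6160"        , "ab",
  "2028"        , "ab",
  "2029"        , "antibiotic"
)

wide <- left_join(vet, procedures, by = "ProcedureCode") %>%
  # helper column that supplies the cell values when widening
  mutate(flag = TRUE) %>%
  # values_fill = FALSE fills the missing combinations directly,
  # so no separate !is.na() step is needed
  pivot_wider(names_from = procedure, values_from = flag, values_fill = FALSE)

wide
```

One caveat with this sketch: a code in vet that has no row in procedures gets an NA procedure after the join, which would become its own column when widening, so such rows may need to be handled before the pivot.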

(I used logicals; it's easy enough to convert them to 1s and 0s, perhaps with mutate(ab=1L*ab) or mutate(ab=as.integer(ab)).)
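Since the question also asks for dose and route, the single lookup table can carry those as extra columns, so one join fills in everything at once. The sketch below is only an illustration: the `dose` and `route` values are placeholders rather than real meanings of these codes, and "9999" is a made-up code with no match.

```r
library(dplyr)

vet <- tibble(ProcedureCode = c("6160", "2028", "9999"))

# Hypothetical lookup table: one row per antibiotic code.
# dose/route values are placeholders, not taken from any real code list.
lookup <- tibble::tribble(
  ~ProcedureCode, ~dose,    ~route,
  "6160"        , "dose_A", "route_A",
  "2028"        , "dose_B", "route_B"
) %>%
  mutate(ab = 1L)

result <- left_join(vet, lookup, by = "ProcedureCode") %>%
  # codes missing from the lookup get ab = 0; dose and route stay NA
  mutate(ab = coalesce(ab, 0L))

result
```

A table like this could also be kept in a CSV file and read in, which may be easier to review than codes embedded in the script.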
