R: Simplifying long ifelse statement

Question

I'm trying to create new variables based on a procedure code variable with 2500+ values in a medical dataset to pull out the antibiotics, their dose, and route. I've been able to do this with ifelse statements, but it was time-consuming, and it is difficult to find and correct mistakes. Is there a simpler way to do this? Unfortunately, the codes aren't organized in any logical way.

vet <-mutate(vet, ab = ifelse(ProcedureCode=="6160"|ProcedureCode=="2028"|ProcedureCode=="6121"|ProcedureCode=="6130"|ProcedureCode=="6131"|ProcedureCode=="6132"|ProcedureCode=="6133" |ProcedureCode=="6134"|ProcedureCode=="6135"|ProcedureCode=="6136"|ProcedureCode=="6090" |ProcedureCode=="6137"|ProcedureCode=="6138"|ProcedureCode=="6139" |ProcedureCode=="6140" |ProcedureCode=="6510"|ProcedureCode=="680D" |ProcedureCode=="633E"|ProcedureCode=="661J"|ProcedureCode=="627I" |ProcedureCode=="6198"|ProcedureCode=="6199"|ProcedureCode=="6200" |ProcedureCode=="6201" |ProcedureCode=="6202"|ProcedureCode=="622G" |ProcedureCode=="697C" |ProcedureCode=="698C" |ProcedureCode=="6204"|ProcedureCode=="6775"| ProcedureCode=="6229" |ProcedureCode=="6207" |ProcedureCode=="6203" |ProcedureCode=="6205" |ProcedureCode=="6206" |ProcedureCode=="6212" |ProcedureCode=="6213" |ProcedureCode=="6214" |ProcedureCode=="6215" |ProcedureCode=="6216" |ProcedureCode=="6219" |ProcedureCode=="692C" |ProcedureCode=="643C" |ProcedureCode=="601E" |ProcedureCode=="629G" |ProcedureCode=="6234" |ProcedureCode=="6235" |ProcedureCode=="6236" |ProcedureCode=="6237" |ProcedureCode=="6238" |ProcedureCode=="615J" |ProcedureCode=="6242" |ProcedureCode=="6243" |ProcedureCode=="6244" |ProcedureCode=="6245" |ProcedureCode=="1193" |ProcedureCode=="652G" |ProcedureCode=="657G" |ProcedureCode=="697B"|ProcedureCode=="6336" |ProcedureCode=="6337" |ProcedureCode=="6338" |ProcedureCode=="6152" |ProcedureCode=="603C" |ProcedureCode=="655B" |ProcedureCode=="6357" |ProcedureCode=="6358" |ProcedureCode=="6399" |ProcedureCode=="666B" |ProcedureCode=="695D" |ProcedureCode=="699C" |ProcedureCode=="6365" |ProcedureCode=="6366" |ProcedureCode=="696F" |ProcedureCode=="6497" |ProcedureCode=="6613" |ProcedureCode=="6508" |ProcedureCode=="6509" |ProcedureCode=="617I" |ProcedureCode=="6506" |ProcedureCode=="2029" |ProcedureCode=="6538" |ProcedureCode=="671J" |ProcedureCode=="633H" |ProcedureCode=="621G" |ProcedureCode=="680J" 
|ProcedureCode=="672G" |ProcedureCode=="673G" |ProcedureCode=="6559" |ProcedureCode=="6652" |ProcedureCode=="6593" |ProcedureCode=="651C" |ProcedureCode=="633B" |ProcedureCode=="659E" |ProcedureCode=="676D" |ProcedureCode=="678D" |ProcedureCode=="620B" |ProcedureCode=="6562" |ProcedureCode=="6564" |ProcedureCode=="6585" |ProcedureCode=="6766" |ProcedureCode=="6595" |ProcedureCode=="6607" |ProcedureCode=="6608" |ProcedureCode=="627B" |ProcedureCode=="6653" |ProcedureCode=="6654" |ProcedureCode=="6655"|ProcedureCode=="6732" |ProcedureCode=="6733" |ProcedureCode=="6734"|ProcedureCode=="6735" |ProcedureCode=="6795"|ProcedureCode=="6745" |ProcedureCode=="6746" |ProcedureCode=="6748" |ProcedureCode=="6758" |ProcedureCode=="697E" |ProcedureCode=="6761" |ProcedureCode=="6032" |ProcedureCode=="6747" |ProcedureCode=="6749" |ProcedureCode=="668A" |ProcedureCode=="648A" |ProcedureCode=="649A" |ProcedureCode=="6765" |ProcedureCode=="6768" |ProcedureCode=="6771" |ProcedureCode=="637B"|ProcedureCode=="6894", 1,0))

The problem is also that I need to create multiple groups (ex: Antibiotic [yes/no], dose, route) and I feel like there's a better way I'm missing that doesn't involve cutting and pasting the variable and quotation marks each time. Is there potentially a way to make a data frame and use ifelse to assign any codes that are also in that dataframe as a 1 and others as a 0?

Sorry if this is a duplicate; I'm relatively new to R and am having trouble finding the vocabulary to search for what I need. I have looked around (e.g. the question "Nested ifelse statement") but haven't found quite what I need. Thanks!

Answer

Two alternative methods, both using merges/joins. One advantage of this approach is that it is much easier to maintain: you have well-structured, manageable tables of procedures instead of (potentially very long) lines of code in your ifelse statement. The comments suggesting %in% also reduce this problem, though there you manage vectors instead of frames.
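
For comparison, here is a minimal sketch of the %in% suggestion, which also covers the question's idea of checking codes against a stored list. It assumes the antibiotic codes are kept in a character vector: ab_codes is a hypothetical name, its three codes are taken from the question, and '9999' is a made-up non-antibiotic code. If the codes live in a data frame instead, pass its column (e.g. some_frame$ProcedureCode, also hypothetical).

```r
library(dplyr)

# Hypothetical sample data: '9999' stands in for a non-antibiotic code
vet <- tibble(ProcedureCode = c('6160', '2028', '680J', '9999'))

# Hypothetical list of antibiotic codes (a few taken from the question)
ab_codes <- c('6160', '2028', '680J')

# %in% gives TRUE/FALSE per row; as.integer() turns that into 1/0,
# replacing the long chain of ProcedureCode == "..." | ... comparisons
vet <- mutate(vet, ab = as.integer(ProcedureCode %in% ab_codes))
vet
```

Adding or fixing a code then means editing one vector rather than one long expression.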

Fake data:

library(dplyr)
library(tidyr)
vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))




1. One frame per procedure type. This is manageable, but it can get tedious if you have many different types, since you repeat the left_join for each type.

abs <- tibble(ab = TRUE, ProcedureCode = c('6160', '2028'))
antis <- tibble(antibiotic = TRUE, ProcedureCode = c('2029'))
vet %>%
  left_join(abs, by = "ProcedureCode") %>%
  left_join(antis, by = "ProcedureCode") %>%
  # non-matching rows are NA after the joins; turn them into FALSE
  mutate(across(c(ab, antibiotic), ~ !is.na(.x)))
# # A tibble: 3 × 3
#   ProcedureCode    ab antibiotic
#           <chr> <lgl>      <lgl>
# 1          6160  TRUE      FALSE
# 2          2028  TRUE      FALSE
# 3          2029 FALSE       TRUE

Setting ab = TRUE (etc.) gives each lookup frame a column that is carried through the merge. Rows that do not match get NA in that column, which is why !is.na() is needed to convert T, NA, T into T, F, T.
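
A small sketch of that step in isolation, using the same fake data, shows the NA left by the join and what !is.na() turns it into:

```r
library(dplyr)

vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))
abs <- tibble(ab = TRUE, ProcedureCode = c('6160', '2028'))

# '2029' has no match in abs, so its ab value is NA after the join
joined <- left_join(vet, abs, by = "ProcedureCode")
joined$ab
# [1] TRUE TRUE   NA

# !is.na() maps matches to TRUE and non-matches to FALSE
flagged <- mutate(joined, ab = !is.na(ab))
flagged$ab
# [1]  TRUE  TRUE FALSE
```

dplyr's coalesce(ab, FALSE) would perform the same NA-to-FALSE replacement, if that reads more clearly to you.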

You could even use vectors of procedure codes instead, something like:

# vector_of_abs: a character vector of your antibiotic procedure codes
vet %>%
  left_join(tibble(ab = TRUE, ProcedureCode = vector_of_abs), by = "ProcedureCode") %>%
  ...

That really only helps if you already have the codes as vectors; otherwise, use whichever form is easier for you to maintain.
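
If the codes already exist as several vectors, they can also be stacked into one long lookup frame, which leads into the single-frame approach below. A sketch, assuming one vector per procedure type; vector_of_abs, vector_of_antis and code_lists are hypothetical names:

```r
library(dplyr)

# Hypothetical vectors of codes, one per procedure type
vector_of_abs   <- c('6160', '2028')
vector_of_antis <- c('2029')
code_lists <- list(ab = vector_of_abs, antibiotic = vector_of_antis)

# Stack the vectors into a single lookup frame: one row per code,
# with the list name recorded as the procedure type
procedures <- bind_rows(lapply(names(code_lists), function(type) {
  tibble(ProcedureCode = code_lists[[type]], procedure = type)
}))

vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))
result <- left_join(vet, procedures, by = "ProcedureCode")
result
```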

2. One frame with all procedures, requiring only a single lookup frame for all types and a single left_join.

procedures <- tibble::tribble(
  ~ProcedureCode, ~procedure,
  '6160'        , 'ab',
  '2028'        , 'ab',
  '2029'        , 'antibiotic'
)
left_join(vet, procedures, by = "ProcedureCode")
# # A tibble: 3 × 2
#   ProcedureCode  procedure
#           <chr>      <chr>
# 1          6160         ab
# 2          2028         ab
# 3          2029 antibiotic
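
Because the lookup frame can carry any number of columns, the same single join can also fill in the dose and route fields from the question. A sketch under that assumption: ab_lookup and its column names are hypothetical, '9999' is a made-up non-antibiotic code, and every drug, dose and route value is a placeholder, not real dosing information.

```r
library(dplyr)

vet <- tibble(ProcedureCode = c('6160', '2028', '2029', '9999'))

# Hypothetical lookup table; all values are placeholders.
# One row per antibiotic procedure code.
ab_lookup <- tibble::tribble(
  ~ProcedureCode, ~drug,    ~dose,    ~route,
  '6160',         'drug_A', 'dose_1', 'route_1',
  '2028',         'drug_A', 'dose_2', 'route_2',
  '2029',         'drug_B', 'dose_1', 'route_1'
)

# One join adds drug, dose and route; the yes/no flag is derived
# from whether the code matched the lookup at all
vet_flagged <- vet %>%
  left_join(ab_lookup, by = "ProcedureCode") %>%
  mutate(antibiotic = as.integer(!is.na(drug)))
vet_flagged
```

Keep ProcedureCode unique in the lookup table: a duplicated code would duplicate the matching rows of vet after the join.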

You can either keep it as-is (if it makes sense to store it that way) or spread it into one logical column per type, like the first method:

left_join(vet, procedures, by = "ProcedureCode") %>%
  mutate(ignore=TRUE) %>%
  spread(procedure, ignore) %>%
  mutate(across(c(ab, antibiotic), ~ !is.na(.x)))
# # A tibble: 3 × 3
#   ProcedureCode    ab antibiotic
#           <chr> <lgl>      <lgl>
# 1          2028  TRUE      FALSE
# 2          2029 FALSE       TRUE
# 3          6160  TRUE      FALSE

(The row order after spread() differs from the input here, but the data remain the same.)
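
In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same reshaping with it, assuming a recent tidyr release that accepts a single value for values_fill; filling the missing combinations with FALSE makes the separate !is.na() step unnecessary. The column name flag is arbitrary. Row order may differ from the spread() output, so compare rows by ProcedureCode rather than by position.

```r
library(dplyr)
library(tidyr)

vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))
procedures <- tibble::tribble(
  ~ProcedureCode, ~procedure,
  '6160'        , 'ab',
  '2028'        , 'ab',
  '2029'        , 'antibiotic'
)

# One column per procedure type; combinations that do not occur
# are filled with FALSE instead of NA
wide <- left_join(vet, procedures, by = "ProcedureCode") %>%
  mutate(flag = TRUE) %>%
  pivot_wider(names_from = procedure, values_from = flag,
              values_fill = FALSE)
wide
```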

(I used logicals; it's easy enough to convert them to 1s and 0s, e.g. mutate(ab = 1L * ab) or mutate(ab = as.integer(ab)).)
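
With several flag columns, a sketch that converts every logical column at once, assuming dplyr 1.0 or later for across() and where(); the flags frame simply re-creates the spread output above:

```r
library(dplyr)

# Re-creation of the spread output above (logical flags)
flags <- tibble(ProcedureCode = c('2028', '2029', '6160'),
                ab            = c(TRUE, FALSE, TRUE),
                antibiotic    = c(FALSE, TRUE, FALSE))

# Convert every logical column to 0/1 integers in one step;
# the character ProcedureCode column is left untouched
flags01 <- mutate(flags, across(where(is.logical), as.integer))
flags01
```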
