Select multiple columns and filter on specific and/or conditions, then form a new column


Problem description

I am using a very large database of patient records. Basically, I am trying to categorize patients based on certain diagnosis codes. Each patient may have between 1 and 10 diagnosis codes in any arrangement (DX1, DX2, DX3, DX4, DX5, DX6, DX7, DX8, DX9, DX10). I am having difficulty coding multiple conditions that filter through each DX column, since the codes can appear in different arrangements.

Example Dataset: 
DX1<-c("05", "89", "99", "87", "05", "67")
DX2<-c("87", "05", "21", "26", "23", "44","89", "13", "2" )
DX3<-c("04", "99","23", "05", "57", "16", "90", "89", "87")
DX4<-c("05", "26","21")
DX5<-c("67", "86","44", "63", "18", "87", "87", "05")
DX6<-c("75", "06","24", "02", "86", "15", "01", "87")
DX7<-c("86", "87","66", "56", "65", "05", "72")
install.packages("qpcR")   # run once
library("qpcR")
# cbind.na() is an internal qpcR helper (hence :::); it pads shorter vectors with NA
Patientdata <- data.frame(qpcR:::cbind.na(DX1, DX2, DX3, DX4, DX5, DX6, DX7))
Patientdata


  DX1 DX2 DX3 DX4 DX5 DX6 DX7
1  05  87  04  05  67  75  86
2  89  05  99  26  86  06  87
3  99  21  23  21  44  24  66
4  87  26  05  NA  63  02  56
5  05  23  57  NA  18  86  65
6  67  44  16  NA  87  15  05
7  NA  89  90  NA  87  01  72
8  NA  13  89  NA  05  87  NA
9  NA   2  87  NA  NA  NA  NA

I would like to filter all patients who have a DX code in (05, 5, 02, 2, or 62) AND a DX code in (87, 087, 0086, 089, or 89).

Patientdata<- Patientdata%>% mutate_at(vars(DX1, DX2, DX3, DX4, DX5, DX6, DX7),
Diagnosis= ifelse(. %in% c("05"| "5"| "02"| "2"| "36"| "62"|"0062") &
c("87"| "087"| "86"| "0086"| "89"| "089"), "Yes"))

What I want:

ID DX1 DX2 DX3 DX4 DX5 DX6 DX7 Diagnosis
1   05  87  04  05  67  75  86
2   89  05  99  26  86  06  87
3   99  21  23  21  44  24  66
4   87  26  05  NA  63  02  56
5   05  23  57  NA  18  86  65
6   67  44  16  NA  87  15  05
7   NA  89  90  NA  87  01  72
8   NA  13  89  NA  05  87  NA
9   NA   2  87  NA  NA  NA  NA

Thank you very much for your help!

Recommended answer

Here is a tidyverse method without adding a Diagnosis variable:

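library(dplyr)

# TRUE for each row in which at least one column is TRUE
rowAny <- function(x) rowSums(x) > 0

Patientdata %>% 
  mutate(ID = row_number()) %>% 
  filter(
    # at least one DX column holds a code from the first group ...
    rowAny(across(starts_with("DX"), 
                  ~ .x %in% c("05", "5", "02", "2", "36", "62", "0062"))), 
    # ... AND at least one DX column holds a code from the second group
    rowAny(across(starts_with("DX"), 
                  ~ .x %in% c("87", "087", "86", "0086", "89", "089"))))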

You have ID in your example but not in your sample data, which is why it's added above.

This gives us

   DX1 DX2 DX3  DX4  DX5  DX6  DX7 ID
1   05  87  04   05   67   75   86  1
2   89  05  99   26   86   06   87  2
3   87  26  05 <NA>   63   02   56  4
4   05  23  57 <NA>   18   86   65  5
5   67  44  16 <NA>   87   15   05  6
6 <NA>  13  89 <NA>   05   87 <NA>  8
7 <NA>   2  87 <NA> <NA> <NA> <NA>  9
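
The filter above drops the rows that do not match. If a Diagnosis column is wanted instead, as in the question, one possible sketch uses `if_any()` (available in dplyr 1.0.4 and later) inside `mutate()`. The names `pad_to`, `group_a`, `group_b`, and `result` are made up for this illustration, the data is rebuilt with base R rather than qpcR so the snippet runs on its own, and labelling the non-matching rows "No" is an assumption, since the question only mentions "Yes".

```r
library(dplyr)

# Hypothetical helper: pad a vector with NA up to n elements
pad_to <- function(x, n) { length(x) <- n; x }

dx <- list(
  DX1 = c("05", "89", "99", "87", "05", "67"),
  DX2 = c("87", "05", "21", "26", "23", "44", "89", "13", "2"),
  DX3 = c("04", "99", "23", "05", "57", "16", "90", "89", "87"),
  DX4 = c("05", "26", "21"),
  DX5 = c("67", "86", "44", "63", "18", "87", "87", "05"),
  DX6 = c("75", "06", "24", "02", "86", "15", "01", "87"),
  DX7 = c("86", "87", "66", "56", "65", "05", "72")
)
Patientdata <- as.data.frame(lapply(dx, pad_to, n = 9), stringsAsFactors = FALSE)

# The two code groups, copied from the code in the question
group_a <- c("05", "5", "02", "2", "36", "62", "0062")
group_b <- c("87", "087", "86", "0086", "89", "089")

result <- Patientdata %>%
  mutate(
    ID = row_number(),
    # "Yes" only when some DX column matches group_a AND some DX column matches group_b
    Diagnosis = if_else(
      if_any(starts_with("DX"), ~ .x %in% group_a) &
        if_any(starts_with("DX"), ~ .x %in% group_b),
      "Yes", "No"  # "No" is an assumed label for non-matching rows
    )
  )

result
```

Because `%in%` returns FALSE for NA, the padded cells never count as a match, so no separate NA handling is needed here.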
