Select multiple columns and filter on specific and/or conditions, then form a new column
Question
I am working with a very large database of patient records and am trying to categorize patients based on certain diagnosis codes. Each patient may have between 1 and 10 diagnosis codes, in any arrangement (DX1, DX2, DX3, DX4, DX5, DX6, DX7, DX8, DX9, DX10). I am having difficulty writing multiple conditions that check every DX column, since a given code can appear in any of them.
Example Dataset:
DX1<-c("05", "89", "99", "87", "05", "67")
DX2<-c("87", "05", "21", "26", "23", "44","89", "13", "2" )
DX3<-c("04", "99","23", "05", "57", "16", "90", "89", "87")
DX4<-c("05", "26","21")
DX5<-c("67", "86","44", "63", "18", "87", "87", "05")
DX6<-c("75", "06","24", "02", "86", "15", "01", "87")
DX7<-c("86", "87","66", "56", "65", "05", "72")
install.packages("qpcR")
library("qpcR")
# cbind.na() returns a matrix; store it as a data frame so dplyr verbs work on it
Patientdata <- data.frame(qpcR:::cbind.na(DX1, DX2, DX3, DX4, DX5, DX6, DX7))
Patientdata
DX1 DX2 DX3 DX4 DX5 DX6 DX7
1 05 87 04 05 67 75 86
2 89 05 99 26 86 06 87
3 99 21 23 21 44 24 66
4 87 26 05 NA 63 02 56
5 05 23 57 NA 18 86 65
6 67 44 16 NA 87 15 05
7 NA 89 90 NA 87 01 72
8 NA 13 89 NA 05 87 NA
9 NA 2 87 NA NA NA NA
I would like to filter for all patients who have a DX in (05, 5, 02, 2 or 62) AND a DX in (87, 087, 0086, 089 or 89). My attempt:
Patientdata<- Patientdata%>% mutate_at(vars(DX1, DX2, DX3, DX4, DX5, DX6, DX7),
Diagnosis= ifelse(. %in% c("05"| "5"| "02"| "2"| "36"| "62"|"0062") &
c("87"| "087"| "86"| "0086"| "89"| "089"), "Yes"))
What I want:
ID | DX1 | DX2 | DX3 | DX4 | DX5 | DX6 | DX7 | Diagnosis |
---|---|---|---|---|---|---|---|---|
1 | 05 | 87 | 04 | 05 | 67 | 75 | 86 | Yes |
2 | 89 | 05 | 99 | 26 | 86 | 06 | 87 | Yes |
3 | 99 | 21 | 23 | 21 | 44 | 24 | 66 | |
4 | 87 | 26 | 05 | NA | 63 | 02 | 56 | Yes |
5 | 05 | 23 | 57 | NA | 18 | 86 | 65 | |
6 | 67 | 44 | 16 | NA | 87 | 15 | 05 | Yes |
7 | NA | 89 | 90 | NA | 87 | 01 | 72 | |
8 | NA | 13 | 89 | NA | 05 | 87 | NA | Yes |
9 | NA | 2 | 87 | NA | NA | NA | NA | Yes |
Thank you very much for your help!
Recommended answer
Here is a tidyverse method without adding a Diagnosis variable:
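library(dplyr)

# TRUE for each row in which at least one of the selected columns matches
rowAny <- function(x) rowSums(x) > 0

Patientdata %>%
  mutate(ID = row_number()) %>%
  filter(
    # at least one DX column holds a code from the first group ...
    rowAny(across(starts_with("DX"),
                  ~ .x %in% c("05", "5", "02", "2", "36", "62", "0062"))),
    # ... and at least one DX column holds a code from the second group
    rowAny(across(starts_with("DX"),
                  ~ .x %in% c("87", "087", "86", "0086", "89", "089"))))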
You have an ID column in your desired output but not in your sample data, which is why it is added above.
This gives us:
DX1 DX2 DX3 DX4 DX5 DX6 DX7 ID
1 05 87 04 05 67 75 86 1
2 89 05 99 26 86 06 87 2
3 87 26 05 <NA> 63 02 56 4
4 05 23 57 <NA> 18 86 65 5
5 67 44 16 <NA> 87 15 05 6
6 <NA> 13 89 <NA> 05 87 <NA> 8
7 <NA> 2 87 <NA> <NA> <NA> <NA> 9
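If the Diagnosis column from the question is wanted instead of a filtered table, the same two tests can be moved into mutate(). The following is a minimal sketch rather than part of the original answer. It assumes dplyr 1.0.4 or later (for if_any()), rebuilds the sample data with base R padding in place of qpcR:::cbind.na so that it runs on its own, and uses the illustrative names dx, group_a, group_b and result, none of which appear in the thread.

```r
library(dplyr)

# Sample data from the question; `length<-` pads the shorter vectors with NA
# (a base-R stand-in for qpcR:::cbind.na, used only to keep this sketch self-contained)
dx <- list(
  DX1 = c("05", "89", "99", "87", "05", "67"),
  DX2 = c("87", "05", "21", "26", "23", "44", "89", "13", "2"),
  DX3 = c("04", "99", "23", "05", "57", "16", "90", "89", "87"),
  DX4 = c("05", "26", "21"),
  DX5 = c("67", "86", "44", "63", "18", "87", "87", "05"),
  DX6 = c("75", "06", "24", "02", "86", "15", "01", "87"),
  DX7 = c("86", "87", "66", "56", "65", "05", "72")
)
Patientdata <- as.data.frame(lapply(dx, `length<-`, max(lengths(dx))),
                             stringsAsFactors = FALSE)

# Code groups copied from the answer's filter (hypothetical variable names)
group_a <- c("05", "5", "02", "2", "36", "62", "0062")
group_b <- c("87", "087", "86", "0086", "89", "089")

result <- Patientdata %>%
  mutate(
    ID = row_number(),
    # "Yes" when some DX column matches group_a AND some DX column matches group_b;
    # NA values never match, because `NA %in% x` is FALSE
    Diagnosis = if_else(
      if_any(starts_with("DX"), ~ .x %in% group_a) &
        if_any(starts_with("DX"), ~ .x %in% group_b),
      "Yes", NA_character_
    )
  )

result
```

With these code lists, row 5 is flagged because "86" is in the second group, as in the answer's filter. The desired output in the question leaves row 5 blank, so "86" would need to be dropped from group_b if only "0086" should count.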