How can I specify columns in R to be used in matches (without listing each individually)?


Question


Suppose I have three columns of data (sample1, sample2, and sample3). I want all of the rows in which the letter b or h appears in any one of the columns. This works fine:

data <- data.frame(row_name=c("s1_100","s1_200", "s1_300", "s1_400", "s1_500"), 
                   sample1=rep("a",5),
                   sample2=c(rep("b",2),rep("a",3)),
                   sample3=c(rep("a",4),"h")
)

data

# row_name  sample1   sample2   sample3
# s1_100    a         b         a
# s1_200    a         b         a
# s1_300    a         a         a
# s1_400    a         a         a
# s1_500    a         a         h

bh <- c('b','h')
bh_data <- subset(data, ( sample1 %in% bh | sample2 %in% bh | sample3 %in% bh )  )

bh_data

# row_name  sample1   sample2   sample3
# s1_100    a         b         a
# s1_200    a         b         a
# s1_500    a         a         h

However, since I'm asking the same question about each column, isn't there a less redundant way to do this?

In reality, we have over 800 columns and over 70,000 rows, and we want to be able to choose as many or as few specific columns to search as needed. Typing out hundreds of column names, for example, just doesn't seem practical unless I write a script to generate the R script.

Solution

Try

 indx <- Reduce(`|`, lapply(df[,-1], `%in%`, bh))
 df[indx,]
 #   row_name sample1 sample2 sample3
 #1   s1_100       a       b       a
 #2   s1_200       a       b       a
 #5   s1_500       a       a       h
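
Since the question asks about choosing as many or as few columns as needed, the same `Reduce`/`lapply` idea can be pointed at a character vector of column names instead of `df[,-1]`. The following is a minimal sketch, assuming the columns to search share a `sample` prefix; the variable names `cols`, `cols_subset`, `res_all`, and `res_subset` are illustrative and not part of the original answer.

```r
# Example data with the same shape as the `df` used in this answer
df <- data.frame(
  row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
  sample1  = rep("a", 5),
  sample2  = c("b", "b", "a", "a", "a"),
  sample3  = c("a", "a", "a", "a", "h"),
  stringsAsFactors = FALSE
)
bh <- c("b", "h")

# Select the columns to search by name pattern instead of typing each one
# (assumption: the relevant columns all start with "sample")
cols <- grep("^sample", names(df), value = TRUE)

# One logical vector per column, then OR them together element-wise
indx <- Reduce(`|`, lapply(df[cols], `%in%`, bh))
res_all <- df[indx, ]

# Searching only a hand-picked subset works the same way
cols_subset <- c("sample1", "sample3")
indx_subset <- Reduce(`|`, lapply(df[cols_subset], `%in%`, bh))
res_subset <- df[indx_subset, ]
```

With 800 columns, `cols` could equally come from a file or from `names(df)[10:250]`; only the vector of names changes, not the filtering code.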

Or using data.table

 library(data.table)
 nm1 <- paste0("sample", 1:3)
 setDT(df)[df[, Reduce(`|`,lapply(.SD, `%in%`, bh)), .SDcols=nm1]]
 #    row_name sample1 sample2 sample3
 #1:   s1_100       a       b       a
 #2:   s1_200       a       b       a
 #3:   s1_500       a       a       h

data

 df <- structure(list(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", 
 "s1_500"), sample1 = c("a", "a", "a", "a", "a"), sample2 = c("b", 
 "b", "a", "a", "a"), sample3 = c("a", "a", "a", "a", "h")), .Names = c("row_name", 
 "sample1", "sample2", "sample3"), class = "data.frame", row.names = c(NA, 
 -5L))
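
If the same filter is needed repeatedly with different column sets, the base R approach can be wrapped in a small helper. This is only a sketch: `rows_with_any` is a hypothetical name, and it assumes a plain `data.frame` (not one already converted with `setDT`).

```r
# Hypothetical helper: keep rows where any of `cols` contains one of `values`.
# Assumes `d` is a plain data.frame, not a data.table.
rows_with_any <- function(d, cols, values) {
  stopifnot(all(cols %in% names(d)))          # fail early on unknown columns
  hits <- lapply(d[cols], `%in%`, values)     # one logical vector per column
  d[Reduce(`|`, hits), , drop = FALSE]        # OR across columns, subset rows
}

df <- data.frame(
  row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
  sample1  = rep("a", 5),
  sample2  = c("b", "b", "a", "a", "a"),
  sample3  = c("a", "a", "a", "a", "h"),
  stringsAsFactors = FALSE
)

many <- rows_with_any(df, paste0("sample", 2:3), c("b", "h"))
few  <- rows_with_any(df, "sample1", c("b", "h"))
```

Here `many` searches two columns and `few` searches a single one, using the same call.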
