How can I specify columns in R to be used in matches (without listing each individually)?
Problem description
Suppose I have three columns of data (sample1, sample2, and sample3). I want all of the rows in which the letter "b" or "h" appears in any one of the columns. This works fine:
data <- data.frame(row_name=c("s1_100","s1_200", "s1_300", "s1_400", "s1_500"),
sample1=rep("a",5),
sample2=c(rep("b",2),rep("a",3)),
sample3=c(rep("a",4),"h")
)
data
# row_name sample1 sample2 sample3
# s1_100 a b a
# s1_200 a b a
# s1_300 a a a
# s1_400 a a a
# s1_500 a a h
bh <- c('b','h')
bh_data <- subset(data, ( sample1 %in% bh | sample2 %in% bh | sample3 %in% bh ) )
bh_data
# row_name sample1 sample2 sample3
# s1_100 a b a
# s1_200 a b a
# s1_500 a a h
However, since I'm asking the same question about each column, isn't there a less redundant way to do this?
This toy example has only three columns, but in reality we have over 800 columns and over 70,000 rows, and we want to be able to choose any number of specific columns to search. Listing hundreds of column names in the condition, for example, doesn't seem practical unless I write a script that generates the R code.
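Try
# lapply() applies %in% to every column except the first (row_name), giving one
# logical vector per column; Reduce(`|`, ...) ORs those vectors together row by row
indx <- Reduce(`|`, lapply(df[,-1], `%in%`, bh))
df[indx,]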
# row_name sample1 sample2 sample3
#1 s1_100 a b a
#2 s1_200 a b a
#5 s1_500 a a h
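The same Reduce() idea can be wrapped in a small function so that the columns to search become just an argument. This is a sketch, not part of the original answer: rows_with_any() is a hypothetical name (it is not a base R function), and the data frame below rebuilds the question's example with stringsAsFactors = FALSE.

```r
# Example data from the question
df <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                 sample1  = rep("a", 5),
                 sample2  = c(rep("b", 2), rep("a", 3)),
                 sample3  = c(rep("a", 4), "h"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: keep the rows where any of `cols` holds one of `values`
rows_with_any <- function(df, cols, values) {
  # One logical vector per selected column, TRUE where the cell is in `values`
  hits <- lapply(df[cols], `%in%`, values)
  # OR the vectors together element-wise; starting from an all-FALSE vector
  # means an empty `cols` selects no rows instead of returning NULL
  keep <- Reduce(`|`, hits, rep(FALSE, nrow(df)))
  df[keep, , drop = FALSE]
}

bh <- c("b", "h")
all_hits <- rows_with_any(df, c("sample1", "sample2", "sample3"), bh)
s3_hits  <- rows_with_any(df, "sample3", bh)
print(all_hits)
print(s3_hits)
```

Because cols is an ordinary character vector, it can hold two names or eight hundred, and it can be built programmatically instead of typed into the condition.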
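Or, using data.table:
library(data.table)
nm1 <- paste0("sample", 1:3)  # names of the columns to search
# setDT() converts df to a data.table by reference; .SDcols restricts .SD to nm1
setDT(df)[df[, Reduce(`|`, lapply(.SD, `%in%`, bh)), .SDcols = nm1]]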
# row_name sample1 sample2 sample3
#1: s1_100 a b a
#2: s1_200 a b a
#3: s1_500 a a h
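Another base-R option, offered as an alternative rather than as the answer's method, converts the selected columns to one character matrix so that %in% is called once instead of once per column. Whether this is faster for 800 columns and 70,000 rows has not been measured here, and it builds a matrix of about 56 million cells, so memory use is worth checking. The variable names cols, m, and hit are illustrative.

```r
# Example data from the question
df <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                 sample1  = rep("a", 5),
                 sample2  = c(rep("b", 2), rep("a", 3)),
                 sample3  = c(rep("a", 4), "h"),
                 stringsAsFactors = FALSE)
bh   <- c("b", "h")
cols <- c("sample1", "sample2", "sample3")  # columns to search

# Convert the selected columns to a single character matrix (rows x columns)
m <- as.matrix(df[cols])
# %in% returns a plain vector in column-major order, so rebuild the matrix shape
hit <- matrix(m %in% bh, nrow = nrow(m))
# Keep a row if at least one of its cells matched
res <- df[rowSums(hit) > 0, , drop = FALSE]
print(res)
```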
Data (the example df used in the code above, in dput() form):
df <- structure(list(row_name = c("s1_100", "s1_200", "s1_300", "s1_400",
"s1_500"), sample1 = c("a", "a", "a", "a", "a"), sample2 = c("b",
"b", "a", "a", "a"), sample3 = c("a", "a", "a", "a", "h")), .Names = c("row_name",
"sample1", "sample2", "sample3"), class = "data.frame", row.names = c(NA,
-5L))
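In both answers the searched columns are supplied as a set of names or positions (df[,-1] or nm1), so that set can be computed rather than typed out. A sketch of three common ways to build it, using the example data; the variable names are hypothetical.

```r
df <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                 sample1  = rep("a", 5),
                 sample2  = c(rep("b", 2), rep("a", 3)),
                 sample3  = c(rep("a", 4), "h"),
                 stringsAsFactors = FALSE)

# 1. Every column whose name matches a regular expression
cols_by_pattern <- grep("^sample", names(df), value = TRUE)

# 2. A range of column positions, converted to names
cols_by_position <- names(df)[2:3]

# 3. All columns except the ones to exclude
cols_excluding <- setdiff(names(df), c("row_name", "sample1"))

# Any of these vectors can be used as df[cols] or passed to .SDcols
bh   <- c("b", "h")
hits <- df[Reduce(`|`, lapply(df[cols_excluding], `%in%`, bh)), ]
print(hits)
```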