grep one pattern over multiple columns
Question
I'm trying to figure out a way to use grepl() with a single partial pattern over multiple columns inside mutate(). I want a new column that is TRUE if ANY of a set of columns contains a certain string, and FALSE otherwise.
df <- structure(list(ID = c("A1.1234567_10", "A1.1234567_20"),
var1 = c("NORMAL", "NORMAL"),
var2 = c("NORMAL", "NORMAL"),
var3 = c("NORMAL", "NORMAL"),
var4 = c("NORMAL", "NORMAL"),
var5 = c("NORMAL", "NORMAL"),
var6 = c("NORMAL", "NORMAL"),
var7 = c("NORMAL", "ABNORMAL"),
var8 = c("NORMAL", "NORMAL")),
.Names = c("ID", "var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8"),
class = "data.frame", row.names = c(NA, -2L))
ID var1 var2 var3 var4 var5 var6 var7 var8
1 A1.1234567_10 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL
2 A1.1234567_20 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL ABNORMAL NORMAL
I have tried
df$abnormal %>% mutate( abnormal = ifelse(grepl("abnormal",df[,119:131]) , TRUE, FALSE)))
and about 100 other things. I want the final format to be
ID var1 var2 var3 var4 var5 var6 var7 var8 abnormal
1 A1.1234567_10 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL FALSE
2 A1.1234567_20 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL ABNORMAL NORMAL TRUE
Whenever I try, I get FALSE every time.
Recommended answer
I would probably do it like this:
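# One logical column per checked column: TRUE where the pattern matches
temp = sapply(your_data[columns_you_want_to_check],
              function(x) grepl("abnormal", x, ignore.case = TRUE))
# A row is flagged if at least one of its columns matched
your_data$abnormal = rowSums(temp) > 0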
I just used your_data since your question switches between df and test.file.
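To make the placeholders concrete, here is a minimal sketch applied to a trimmed copy of the question's data (only three of the var columns are kept for brevity). Selecting the columns by the "var" name prefix and matching on "abnormal" are assumptions based on the question's attempt, not something the answer specifies.

```r
# Trimmed copy of the question's data (assumption: three var columns are enough to illustrate)
df <- data.frame(
  ID   = c("A1.1234567_10", "A1.1234567_20"),
  var1 = c("NORMAL", "NORMAL"),
  var7 = c("NORMAL", "ABNORMAL"),
  var8 = c("NORMAL", "NORMAL"),
  stringsAsFactors = FALSE
)

# Pick the columns to check by name (assumption: they all start with "var")
cols <- grep("^var", names(df), value = TRUE)

# Logical matrix: one row per record, one column per checked variable
hits <- sapply(df[cols], function(x) grepl("abnormal", x, ignore.case = TRUE))

# TRUE if any checked column in the row matched
df$abnormal <- rowSums(hits) > 0
print(df)
```

Note that "NORMAL" does not match the pattern "abnormal", so only the second row should be flagged.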
If you really want to use mutate, you could do
df %>%
  mutate(abnormal = rowSums(
    # check every column whose name starts with "var"
    sapply(select(., starts_with("var")),
           function(x) grepl("abnormal", x, ignore.case = TRUE)
    )) > 0
  )
If you need more efficiency, you can use fixed = TRUE instead of ignore.case = TRUE if you can count on the case being consistent. (Maybe convert everything with tolower() first.)
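A minimal sketch of that variant, assuming a small made-up data frame with mixed-case values (the name x and its contents are hypothetical). With fixed = TRUE the pattern is treated as a literal string rather than a regular expression, so the values are lowercased first to keep the match case-insensitive.

```r
# Hypothetical data with inconsistent case
x <- data.frame(
  var1 = c("NORMAL", "Normal"),
  var2 = c("normal", "Abnormal"),
  stringsAsFactors = FALSE
)

# Normalise case once, column by column
lowered <- lapply(x, tolower)

# Literal (non-regex) matching on the lowercased values
hits <- sapply(lowered, function(col) grepl("abnormal", col, fixed = TRUE))

# TRUE if any column in the row contains the literal string
any_hit <- rowSums(hits) > 0
print(any_hit)
```

Whether this is measurably faster depends on the size of the data, so it is worth timing on your own data before committing to it.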
Leave off the > 0 to get the count for each row.
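For instance, a sketch with made-up data (the data frame and the column name n_abnormal are hypothetical) that stores the number of matching columns per row instead of a TRUE/FALSE flag:

```r
# Hypothetical data: rows with zero, one, and two matching columns
df <- data.frame(
  ID   = c("a", "b", "c"),
  var1 = c("NORMAL", "ABNORMAL", "ABNORMAL"),
  var2 = c("NORMAL", "NORMAL", "abnormal"),
  stringsAsFactors = FALSE
)

# Logical matrix of matches for the checked columns
hits <- sapply(df[c("var1", "var2")],
               function(x) grepl("abnormal", x, ignore.case = TRUE))

# Without "> 0", rowSums() gives the number of matching columns per row
df$n_abnormal <- rowSums(hits)
print(df)
```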