R grep by regex - finding a string that contains a substring exactly once
Problem description
I am using R on Ubuntu and going over a list of files, some of which I need and some of which I don't.
I try to select the ones I need by finding a substring in them that must appear exactly once.
I am using the grep function, which I found here: grep function in r,
and the regex rules that I found here: regex rules.
Take a simple example:
a <- c("a","aa")
grep("a{1}", a)
I would expect to get only the strings that contain "a" exactly once; instead, I get both of them.
When I use 2 instead of 1, I do get the wanted result of a single string (the one that contains "aa").
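The behaviour above follows from what each pattern requires. grep() returns the indices of every element in which the pattern matches anywhere, and the quantifier {1} applies only to the preceding "a", so "a{1}" is just "a" and is satisfied by "aa" as well. A minimal sketch comparing the two patterns:

```r
a <- c("a", "aa")

# "a{1}" means "one 'a' at this point of the match", i.e. the same as "a".
# grep() reports every element containing such a match, so "aa" qualifies too.
grep("a{1}", a)  # returns 1 2

# "a{2}" needs two consecutive "a"s, which only "aa" contains.
grep("a{2}", a)  # returns 2
```

In other words, a quantifier limits the length of one match, not the number of matches in the string.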
I can't use $ because the substring is not at the end of the names I need. For example, given the two names "germ-pass.tab" and "germ-pass_germ-pass.tab", I need to return only the first, which contains "germ-pass" exactly once.
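One way to express "exactly once" in base R is to count the occurrences in each name and keep the names whose count is 1. This is a minimal sketch, assuming the substring should be matched literally (fixed = TRUE); the helper count_occurrences and the extra name "other.tab" are hypothetical, added only for illustration.

```r
files <- c("germ-pass.tab", "germ-pass_germ-pass.tab", "other.tab")

# Hypothetical helper: count non-overlapping occurrences of a literal substring.
# gregexpr() finds every match, regmatches() extracts them (character(0) when
# there is none) and lengths() counts them per element.
count_occurrences <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, fixed = TRUE)))
}

counts <- count_occurrences(files, "germ-pass")
counts              # 1 2 0
files[counts == 1]  # "germ-pass.tab"
```

Counting avoids anchoring altogether, so it does not matter where in the name the substring appears.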
I can't use ^a because I don't want words such as "aca".
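Anchors can still help if they are applied to the whole string rather than to the "a". The sketch below shows two single-regex options, offered as alternatives rather than as the method used in the answer: a negated character class for a single character, and a PCRE negative lookahead (perl = TRUE) for a longer substring such as "germ-pass". The sample vectors, including "bab", "b" and "other.tab", are made up for illustration.

```r
x <- c("a", "aa", "aca", "bab", "b")

# Anchor the whole string: any run of non-"a" characters, exactly one "a",
# then any run of non-"a" characters up to the end.
one_a <- grepl("^[^a]*a[^a]*$", x)
x[one_a]  # "a" "bab"

# The character-class trick only works for a single character. For a longer
# substring, the lookahead rejects names in which it occurs twice, and the
# rest of the pattern requires that it occurs at least once.
files <- c("germ-pass.tab", "germ-pass_germ-pass.tab", "other.tab")
once <- grepl("^(?!.*germ-pass.*germ-pass).*germ-pass", files, perl = TRUE)
files[once]  # "germ-pass.tab"
```

The lookahead version is compact but harder to read than counting; it also needs perl = TRUE, because the default regex engine does not support lookarounds.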
Thanks.
Recommended answer
We can count the matches in each string with stringi::stri_count (here its regex variant, stri_count_regex):
library(stringi)
library(purrr)
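# simulate some data: 10 random lowercase strings, each 10 to 30 characters long
set.seed(1492)
(map_chr(1:10, function(i) {
  paste0(sample(letters, sample(10:30, 1), replace = TRUE), collapse = "")
}) -> strings)
# (the exact strings depend on the R version's sampling algorithm)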
## [1] "jdpcypoizdzvfzs" "gyvcljnfmrzmdmkufq"
## [3] "xqwrmnklbixnccwyaiadrsxn" "bwbenawcwvdevmjfvs"
## [5] "ytzwnpkuromfbklfsdnbwwnlrw" "wclxpzftqgwxyetpsuslgohcdenuj"
## [7] "czkhanefss" "mxsrqrackxvimcxqcqsditrou"
## [9] "ysqshvzjjmwes" "yzawyoqxqxiasensorlenafcbk"
# How many "w"s in each string?
stri_count_regex(strings, "w{1}")
## [1] 0 0 2 3 4 2 0 0 1 1
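The answer stops at the counts; the remaining step for the question is to keep only the elements whose count is exactly 1. A minimal sketch applying this to the file names from the question, assuming stringi is installed; it uses stri_count_fixed because "germ-pass" is a literal string (stri_count_regex would also work here, since "-" is not special outside a character class).

```r
library(stringi)

files <- c("germ-pass.tab", "germ-pass_germ-pass.tab")

# stri_count_fixed() counts literal (non-regex) occurrences of the pattern
counts <- stri_count_fixed(files, "germ-pass")
counts              # 1 2

# Keep only the names that contain the substring exactly once
files[counts == 1]  # "germ-pass.tab"
```

The same filter works for the simulated data, e.g. strings[stri_count_regex(strings, "w") == 1].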