Finding the indexes of multiple/overlapping matching substrings
Problem description
I have a string, s="CCCGTGCC", and a substring, ss="CC". I want to get all the indexes in s at which ss starts. In my example, I would want to get back the vector c(1,2,7).
Is there any string function that achieves this? Note that my string is in the form "CCCGTGCC", and not c("C","C","C","G","T","G","C","C").
As far as I can tell, grep only reports whether there is a match anywhere in the string, and not the indexes of the matches within the string, unless I'm missing something.
Recommended answer
Use gregexpr with a Perl regular expression containing a lookahead assertion (see ?regex):
gregexpr("(?=CC)","CCCGTGCC",perl=TRUE)
[[1]]
[1] 1 2 7
attr(,"match.length")
[1] 0 0 0
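To get a plain integer vector rather than the list with attributes shown above, the call can be wrapped in a small helper. The following is only a sketch: find_overlapping is a hypothetical name, not a base R function, and it assumes the substring is a literal containing no regex metacharacters. A regex-free cross-check using substring is included for comparison.

```r
# Hypothetical helper (not part of base R): start positions of all,
# possibly overlapping, occurrences of `pattern` in the single string `text`.
# Assumes `pattern` is a literal with no regex metacharacters.
find_overlapping <- function(pattern, text) {
  # Zero-width lookahead: matches at every position where `pattern` begins
  # without consuming characters, so overlapping hits are all reported.
  m <- gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE)[[1]]
  pos <- as.integer(m)  # drop the match.length attribute
  # gregexpr returns -1 when there is no match at all
  if (identical(pos, -1L)) integer(0) else pos
}

s  <- "CCCGTGCC"
ss <- "CC"
idx <- find_overlapping(ss, s)
print(idx)   # 1 2 7

# Cross-check without regular expressions: compare every window of
# nchar(ss) characters in s against ss.
starts <- seq_len(nchar(s) - nchar(ss) + 1)
idx2 <- which(substring(s, starts, starts + nchar(ss) - 1) == ss)
print(idx2)  # 1 2 7
```

The lookahead version stays close to the answer above. The substring version avoids regex escaping issues, at the cost of building one window per start position.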