Multiple regexpr in one string in R
Question
I have a really long string and I want to work with multiple matches. Using regexpr, I can only seem to get the position of the first match. How can I get the positions of all matches within the same string?
I am looking for a specific string in HTML source code: the title of an auction, which sits between HTML tags. It proves kind of difficult to find.
So far I use this:
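# The opening tag is 28 characters long, so +28 gives the first character after each tag
locationstart <- gregexpr("<span class=\"location-name\">", URL)[[1]] + 28
# Position of the next "<" within the 100 characters after the first tag
locationend <- regexpr("<", substring(URL, locationstart[1], locationstart[1] + 100))
# Text between the first opening tag and the following "<"
substring(URL, locationstart[1], locationstart[1] + locationend - 2)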
That is, I look for the part that comes before a title and record that position, then search from there for a "<", which marks the end of the title. I'm open to more specific suggestions.
Recommended answer
Using gregexpr allows for multiple matches.
> x <- c("only one match", "match1 and match2", "none here")
> m <- gregexpr("match[0-9]*", x)
> m
[[1]]
[1] 10
attr(,"match.length")
[1] 5
attr(,"useBytes")
[1] TRUE
[[2]]
[1] 1 12
attr(,"match.length")
[1] 6 6
attr(,"useBytes")
[1] TRUE
[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
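The result above holds, for each input string, the start positions of all matches and a "match.length" attribute; -1 means no match. As a minimal sketch of working with those positions by hand (the variable names starts, lens and found are only illustrative), substring can be applied to the second string like this:

```r
x <- c("only one match", "match1 and match2", "none here")
m <- gregexpr("match[0-9]*", x)

# Start positions and lengths of every match in the second string
starts <- as.integer(m[[2]])
lens   <- attr(m[[2]], "match.length")

# substring() is vectorised over first/last, so this returns one piece per match
found <- substring(x[2], starts, starts + lens - 1)
print(found)
# [1] "match1" "match2"
```

For a string with no match, starts would be -1, so that case needs a check before calling substring.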
If you want to extract the matches themselves, you can use regmatches to do that for you.
> regmatches(x, m)
[[1]]
[1] "match"
[[2]]
[1] "match1" "match2"
[[3]]
character(0)
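Applied to the HTML case from the question, the same two functions could be combined roughly as follows. This is only a sketch under assumptions: the html string is made-up sample data standing in for the real page source, and the pattern assumes the text between the tags contains no "<" character.

```r
# Hypothetical HTML snippet standing in for the real page source
html <- paste0(
  "<div><span class=\"location-name\">Amsterdam</span></div>",
  "<div><span class=\"location-name\">Rotterdam</span></div>"
)

# Match the opening tag plus everything up to (not including) the next "<"
pattern <- "<span class=\"location-name\">[^<]*"
m <- gregexpr(pattern, html)
hits <- regmatches(html, m)[[1]]

# Strip the opening tag so only the text between the tags remains
titles <- sub("<span class=\"location-name\">", "", hits, fixed = TRUE)
print(titles)
# [1] "Amsterdam" "Rotterdam"
```

This avoids the manual offset arithmetic (+28, -2) and returns every occurrence rather than only the first one. For anything beyond simple, regular markup, a dedicated HTML parser is likely to be more robust than regular expressions.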