Multiple regexpr in one string in R

Problem description

I have a very long string and want to work with multiple matches in it. With regexpr I only seem to get the position of the first match. How can I get the positions of all matches within the same string?

I am looking for a specific string in HTML source code: the title of an auction, which sits between HTML tags. It is proving rather difficult to find.

So far I use this:

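# Start of each title: match position plus the 28 characters of the opening tag
locationstart <- gregexpr("<span class=\"location-name\">", URL)[[1]] + 28
# Offset of the next "<" within the 100 characters after the first title start
locationend <- regexpr("<", substring(URL, locationstart[1], locationstart[1] + 100))
substring(URL, locationstart[1], locationstart[1] + locationend - 2)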

That is, I look for the part that comes before a title, record that position, and from there look for a "<" that marks the end of the title. I'm open to more specific suggestions.

Recommended answer

Using gregexpr allows for multiple matches: it returns every match in each string, whereas regexpr returns only the first.

> x <- c("only one match", "match1 and match2", "none here")
> m <- gregexpr("match[0-9]*", x)
> m
[[1]]
[1] 10
attr(,"match.length")
[1] 5
attr(,"useBytes")
[1] TRUE

[[2]]
[1]  1 12
attr(,"match.length")
[1] 6 6
attr(,"useBytes")
[1] TRUE

[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
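
The positions and lengths in that output can also be used directly. The following is a minimal sketch, reusing the x and m from above, that turns the result for the second string into start and end positions; the names starts, lens and ends are only illustrative.

```r
x <- c("only one match", "match1 and match2", "none here")
m <- gregexpr("match[0-9]*", x)

# Start positions and lengths of all matches in the second string
starts <- as.integer(m[[2]])              # as.integer() drops the attributes
lens <- attr(m[[2]], "match.length")
ends <- starts + lens - 1L

# Cut each match out of the string by position
pieces <- substring(x[2], starts, ends)
pieces
```

For a string with no match, gregexpr returns -1 as the only position, so it is worth checking for that value before computing substrings this way.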

If you want to extract the matched text, regmatches will do that for you.

> regmatches(x, m)
[[1]]
[1] "match"

[[2]]
[1] "match1" "match2"

[[3]]
character(0)
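
Applied to the question, the same two functions could pull out every title in one pass. This is only a sketch under assumptions: the html string below is a made-up fragment standing in for the page source (URL in the question), and it assumes a title never contains a "<" character.

```r
# Hypothetical HTML fragment standing in for the page source
html <- paste0(
  '<div><span class="location-name">Oak table</span></div>',
  '<div><span class="location-name">Red bicycle</span></div>'
)

# Match the opening tag plus everything up to the next "<"
pattern <- '<span class="location-name">[^<]*'
m <- gregexpr(pattern, html)
hits <- regmatches(html, m)[[1]]

# Drop the opening tag, leaving only the titles
titles <- sub('<span class="location-name">', "", hits, fixed = TRUE)
titles
```

Regular expressions are fragile against changes in the markup, so for anything beyond a quick extraction a proper HTML parser may be the safer choice.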
