Extract capture group matches from regular expressions? (or: where is gregexec?)

Posted: 2021/7/6 19:25:11. Tags: regex, r, backreference

Question

Given a regular expression containing capture groups (parentheses) and a string, how can I obtain all the substrings matching the capture groups, i.e., the substrings usually referenced by "\1", "\2"?

Example: consider a regex capturing digits preceded by "xy":

s <- "xy1234wz98xy567"

r <- "xy(\\d+)"

Desired result:

[1] "1234" "567"

First attempt: gregexpr:

regmatches(s,gregexpr(r,s))
#[[1]]
#[1] "xy1234" "xy567"

Not what I want, because it returns the substrings matching the entire pattern rather than the capture group.

Second attempt: regexec:

regmatches(s,regexec("xy(\\d+)",s))
#[[1]]
#[1] "xy1234" "1234"

Not what I want, because it returns only the first occurrence of the entire pattern and its capture group.

If there were a gregexec function, extending regexec the way gregexpr extends regexpr, my problem would be solved.

So the question is: how can I retrieve all substrings (or indices that can be passed to regmatches, as in the examples above) matching capture groups in an arbitrary regular expression?

Note: the pattern r given above is just a silly example; the solution must work for an arbitrary pattern.

Recommended answer

Not sure about doing this in base, but here's a package for your needs:

library(stringr)

str_match_all(s, r)
#[[1]]
#     [,1]     [,2]  
#[1,] "xy1234" "1234"
#[2,] "xy567"  "567"
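
To get from that matrix to the vector asked for in the question, one option is to index its columns. This is a small sketch that assumes stringr is installed; the variable names m and groups are only for illustration.

```r
library(stringr)

s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

# str_match_all() returns one matrix per input string; take the first.
m <- str_match_all(s, r)[[1]]

# Column 1 holds the whole match; columns 2 onward hold the capture groups.
groups <- m[, -1, drop = FALSE]

# With a single capture group, its column is the desired vector.
groups[, 1]
# [1] "1234" "567"
```

Keeping drop = FALSE means the same code still yields a matrix when the pattern has several capture groups, with one column per group.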

Many stringr functions also have parallels in base R, so you can also achieve this without using stringr.

For instance, here's a simplified version of how the above works, using base R:

sapply(regmatches(s,gregexpr(r,s))[[1]], function(m) regmatches(m,regexec(r,m)))
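
The one-liner above returns a named list. The two-step idea (find every whole match, then re-match each one to split out its groups) can be wrapped as in the sketch below. The helper name capture_groups is hypothetical, not a base R function, and it assumes a single input string. The last part assumes R 4.1.0 or later, where base R provides gregexec; on older versions that branch is skipped.

```r
# Hypothetical helper (not part of base R): returns a character matrix
# with one row per match and one column per capture group.
capture_groups <- function(pattern, text) {
  # Step 1: every substring of `text` matching the whole pattern.
  whole <- regmatches(text, gregexpr(pattern, text))[[1]]
  if (length(whole) == 0) return(NULL)
  # Step 2: re-match each substring on its own to split out the groups.
  parts <- regmatches(whole, regexec(pattern, whole))
  # Bind into a matrix and drop column 1 (the whole match).
  do.call(rbind, parts)[, -1, drop = FALSE]
}

s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

capture_groups(r, s)[, 1]
# [1] "1234" "567"

# Assumes R >= 4.1.0, where gregexec() exists in base R.
if (getRversion() >= "4.1.0") {
  m <- regmatches(s, gregexec(r, s))[[1]]
  # Here each match is a column: row 1 is the whole match,
  # the remaining rows are the capture groups.
  m[-1, ]
}
```

One caveat with the two-step approach: the pattern is re-applied to each match in isolation, so patterns that depend on surrounding context (anchors, lookarounds, word boundaries) may not split the same way. Where gregexec is available it avoids that second pass.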
