Recursive regular expression match with Boost

Problem description

I have a problem with the C++ standard regex library: it fails to compile a recursive regex.

Searching online, I found that this is a well-known limitation and that people suggest using the Boost library instead. This is the offending pattern:

\\((?>[^()]|(?R))*\\)|\\w+

What I'm trying to do is use this regex to split statements on spaces and parentheses (including balanced parentheses nested inside parentheses), but every piece of code I have found that shows how to do this with Boost fails to work properly, and I don't know why. Thanks in advance.
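The failure described above can be reproduced with a small sketch. The helper name std_regex_accepts is hypothetical and introduced only for illustration; it assumes the standard library reports the unsupported (?>...) and (?R) constructs by throwing std::regex_error from the std::regex constructor, which is how the ECMAScript grammar used by std::regex rejects syntax it does not know.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if std::regex can compile the pattern,
// false if the constructor throws std::regex_error.
bool std_regex_accepts(const std::string& pattern) {
    try {
        std::regex re(pattern);  // default grammar is ECMAScript
        (void)re;                // only compilation matters here
        return true;
    } catch (const std::regex_error&) {
        return false;            // unsupported or malformed syntax
    }
}
```

Under that assumption, std_regex_accepts("\\w+") should succeed, while the recursive pattern from the question should be rejected, because atomic groups and recursion are not part of the ECMAScript grammar.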

Recommended answer

You can declare the regex with a raw string literal, using the R"(...)" syntax. That way you don't have to escape each backslash twice.

For comparison, these two declarations are equivalent:

std::string my_pattern("\\w+");
std::string my_pattern(R"(\w+)");

The outer parentheses are not part of the regex pattern; they belong to the raw string literal delimiters.
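As a small illustration of the point above, here is the pattern from the question written both ways. The variable names are hypothetical. The third form uses a custom delimiter (re, chosen arbitrarily), which standard C++ allows and which helps when a pattern itself contains the sequence )" that would otherwise end the raw literal early.

```cpp
#include <string>

// Ordinary string literal: every regex backslash has to be doubled.
const std::string escaped_pattern = "\\((?>[^()]|(?R))*\\)|\\w+";

// Raw string literal: the text between R"( and )" is taken verbatim.
const std::string raw_pattern = R"(\((?>[^()]|(?R))*\)|\w+)";

// Raw string literal with a custom delimiter "re"; same contents again.
const std::string delimited_pattern = R"re(\((?>[^()]|(?R))*\)|\w+)re";
```

All three strings hold exactly the same characters, so the choice is purely about readability.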

However, your regex is not quite correct: you need to recurse only into the first alternative, not into the whole regex.

Here is a fix:

std::string my_pattern(R"((\((?:[^()]++|(?1))*\))|\w+)");

Here, (\((?:[^()]++|(?1))*\)) matches and captures into Group 1 an opening (, followed by zero or more repetitions of either one or more characters other than ( and ), or a recursion into the whole Group 1 pattern via the (?1) regex subroutine call, followed by a closing ).

See the regex demo.
