How many captured groups are supported by the pcre2_substitute() function?


Question

I am using the pcre2_substitute() function in my C++ project to perform regex replacement:

int ret=pcre2_substitute(
  re,                    /*Points to the compiled pattern*/
  subject,               /*Points to the subject string*/
  subject_length,        /*Length of the subject string*/
  0,                     /*Offset in the subject at which to start matching*/
  rplopts,               /*Option bits*/
  0,                     /*Points to a match data block, or is NULL*/
  0,                     /*Points to a match context, or is NULL*/
  replace,               /*Points to the replacement string*/
  replace_length,        /*Length of the replacement string*/
  output,                /*Points to the output buffer*/
  &outlengthptr          /*Points to the length of the output buffer*/
);

This is the man page of the function. It doesn't say how many captured groups are possible. I have tested that $01, ${6}, and $12 work, but what is the limit?

I checked whether there is a digit limit like the one in C++ std::regex, but there isn't. $000000000000001 works as $1, whereas in std::regex it would mean $00, with the rest treated as a literal string.
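For comparison, here is a minimal sketch of the std::regex side of this observation. The helper name replace_word and the sample pattern are made up for illustration; the point is only that std::regex_replace reads at most two digits after the $ sign.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the first run of lowercase letters in `text`,
// using `fmt` as the ECMAScript-style replacement format of std::regex_replace.
std::string replace_word(const std::string& text, const std::string& fmt) {
    static const std::regex word("([a-z]+)");
    return std::regex_replace(text, word, fmt,
                              std::regex_constants::format_first_only);
}
```

Calling replace_word("abc 123", "<$01>") refers to group 1 in the same way as "<$1>". A longer digit run such as "$000000000000001" is the case described above, where only the first two digits are taken as the group number.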

The code I am using for testing is this one. You will need the pcre2 library to run it.

Recommended answer

The maximum number of capturing groups is 65,535. This is also the highest group number that can be backreferenced in the pattern or in the replacement.

In practice, however, a match will probably hit another limit before it reaches that many groups: for example, the maximum length of the subject string, or the number of times match() is called internally (in total, or recursively), though the match limits can be raised. For detailed information about match limits, see "The match context" in pcre2api.

There is no limit to the number of parenthesized subpatterns, but there can be no more than 65,535 capturing subpatterns.

There is, however, a limit to the depth of nesting of parenthesized subpatterns of all kinds. This is imposed in order to limit the amount of system stack used at compile time. The limit can be specified when PCRE2 is built; the default is 250.

The maximum number of named subpatterns is 10,000.

Author: Philip Hazel. Last updated: 25 November 2014. (As of PCRE2 version 10.20.)

PCRE and PCRE2 have the same limits:

  • All values in repeating quantifiers are limited to 65,535.

There is no limit to the number of parenthesized subpatterns (although the depth of nesting of parenthesized subpatterns of all kinds is limited).

65,535 capturing subpatterns.

10,000 named subpatterns.

The default maximum depth of nested parentheses is 250 (the value of PCRE2_CONFIG_PARENSLIMIT).

The maximum length of the name of a named subpattern is 32 code units. A character is represented by one or more code units, depending on the encoding. For example, in UTF-8 "Ç" takes 2 code units: 0xC3 0x87.
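The difference between code units and characters can be checked with plain C++. In this small sketch, utf8_code_points is a made-up helper name and the input is assumed to be valid UTF-8. It counts characters by skipping continuation bytes, while std::string::size() gives the number of 8-bit code units.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;  // lead byte or ASCII byte
    }
    return count;
}
```

For the string "\xC3\x87" (the UTF-8 encoding of "Ç"), size() is 2 while utf8_code_points() is 1, so a group name made only of such characters would reach the 32-code-unit limit after 16 characters.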

There is no limit to the number of backreferences.

The limit to the number of forward references to subsequent subpatterns is around 200,000.

Names used in control verbs are limited to 255 code units (8-bit library) and 65,535 code units (16-bit or 32-bit library).

The default value for PCRE2_CONFIG_MATCHLIMIT is 10,000,000 (10 million).

The default value for PCRE2_CONFIG_RECURSIONLIMIT is 10,000,000 (10 million). This limit only applies if it is set smaller than MATCH_LIMIT.

The maximum length of a compiled pattern is 64K code units if compiled with the default internal linkage size of 2 (see the pcre2build documentation for details).

The maximum length of a subject string is the largest positive number that an integer variable can hold (possibly about 1.8E+19). However, the available stack space may limit the size of a subject string that can be processed by certain patterns.
The maximum length (in code units) of a subject string is one less than the largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an unsigned integer type, usually defined as size_t.
