Match the body of a function using Regex
Question
Given a dummy function like this:
public function handle()
{
    if (isset($input['data'])) {
        switch($data) {
            ...
        }
    } else {
        switch($data) {
            ...
        }
    }
}
My intention is to get the contents of that function. The problem is matching the nested pairs of curly braces {...}.
I've come across recursive patterns but couldn't get my head around a regex that would match the function's body.
I've tried the following (no recursion):
$pattern = "/function\shandle\([a-zA-Z0-9_\$\s,]+\)?". // match "function handle(...)"
'[\n\s]?[\t\s]*'. // regardless of the indentation preceding the {
'{([^{}]*)}/'; // find everything within braces.
preg_match($pattern, $contents, $match);
That pattern doesn't match at all. I am sure it is the last bit, '{([^{}]*)}/', that is wrong, since the pattern works when there are no other braces within the body.
By replacing it with:
'{([^}]*)}/';
it matched up to the closing } of the switch inside the if statement and stopped there (including the } of the switch but excluding that of the if).
This pattern gives the same result:
'{(\K[^}]*(?=)})/m';
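The behaviour of the two brace-matching attempts can be reproduced in isolation. The sketch below uses a simplified, hypothetical input (the `$ready` variable and `run()` call are made up for illustration) and leaves out the function-header part of the pattern. Once the header is prepended, the first form is forced to start at the outer brace and has nowhere to finish, which is consistent with the pattern not matching at all.

```php
<?php
// Hypothetical, simplified input: one nested block inside the function body.
$contents = <<<'CODE'
function handle()
{
    if ($ready) {
        run();
    }
}
CODE;

// '{([^{}]*)}' cannot start at the outer brace, because [^{}]* refuses to
// cross the inner '{'. Unanchored, it finds the innermost pair only.
preg_match('/{([^{}]*)}/', $contents, $inner);

// '{([^}]*)}' does cross the inner '{', but stops at the first '}' it meets,
// which belongs to the nested block rather than to the function.
preg_match('/{([^}]*)}/', $contents, $firstClose);

echo trim($inner[1]), PHP_EOL;   // the body of the if block
echo $firstClose[0], PHP_EOL;    // from the outer '{' up to the if's '}'
```

Neither character class can count braces, which is why a recursive construct is needed.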
Recommended answer
Update #2
According to others' comments:
^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})
Note: A short RegEx, i.e. {((?>[^{}]++|(?R))*)}, is enough if you know your input does not contain { or } outside of PHP syntax (for example inside strings or comments).
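As a minimal sketch of how the short pattern could be used from PHP, the example below runs it against the dummy function from the question. The elided switch bodies are replaced by placeholder comments, which is an assumption made only so the input is complete.

```php
<?php
// The dummy function from the question; the elided switch bodies are
// replaced by placeholder comments (an assumption for this example).
$contents = <<<'CODE'
public function handle()
{
    if (isset($input['data'])) {
        switch ($data) {
            // ...
        }
    } else {
        switch ($data) {
            // ...
        }
    }
}
CODE;

// [^{}]++ eats brace-free text; (?R) re-enters the whole pattern for each
// nested {...}, so braces stay balanced at any depth.
$pattern = '/{((?>[^{}]++|(?R))*)}/';

if (preg_match($pattern, $contents, $match)) {
    echo $match[1];   // body of handle(), without the outer braces
}
```

Note that this pattern is not tied to a function name: it returns the first balanced {...} block found in the input.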
The long RegEx is needed in these cases:
- You have [{}] in a string between quotation marks ["']
- You have those quotation marks escaped inside one another
- You have [{}] in a comment block: //... or /*...*/ or #...
- You have [{}] in a heredoc or nowdoc: <<<STR or <<<['"]STR['"]
Otherwise the input is expected to contain balanced pairs of opening/closing braces, and the depth of the nested braces is not important.
It should not fail unless you have a martian living inside your code.
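To illustrate the difference, the sketch below feeds both patterns a hypothetical function whose body has a closing brace inside a string and another inside a comment. The `~` delimiter and the `m` modifier are choices made for this example: the long pattern contains `/`, and its `^` and `$` are meant to match at line boundaries.

```php
<?php
// Hypothetical input with braces outside of PHP's block syntax.
$contents = <<<'CODE'
function handle()
{
    $brace = "}"; // closing } in a comment
    return $brace;
}
CODE;

$short = '/{((?>[^{}]++|(?R))*)}/';

// The long pattern from the answer, kept in a nowdoc so that no backslash
// or quote needs extra escaping.
$long = <<<'REGEX'
~^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})~m
REGEX;

preg_match($short, $contents, $shortMatch);
preg_match($long, $contents, $longMatch);

// The short pattern is closed by the brace inside the string literal.
echo $shortMatch[0], PHP_EOL;
// Group 2 of the long pattern holds the complete body.
echo $longMatch[2], PHP_EOL;
```

In the long pattern, group 1 is the body including its braces and group 2 is the body alone.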
^ \s* [\w\s]+ \( .* \) \s* \K  # how it matches a function definition
(  # (1 start)
    {  # opening brace
    (  # (2 start)
        (?>  # atomic grouping (for its non-capturing purpose only)
            "(?: [^"\\]*+ | \\ . )*"  # double quoted strings
          | '(?: [^'\\]*+ | \\ . )*'  # single quoted strings
          | // .* $  # a comment block starting with //
          | /\* [\s\S]*? \*/  # a multi line comment block /*...*/
          | \# .* $  # a single line comment block starting with #...
          | <<< \s* ["']?  # heredocs and nowdocs
            ( \w+ )  # (3) ^
            ["']? [^;]+ \3 ; $  # ^
          | [^{}<'"/#]++  # force engine to backtrack if it encounters special characters [<'"/#] (possessive)
          | [^{}]++  # default matching behaviour (possessive)
          | (?1)  # recurse 1st capturing group
        )*  # zero to many times of atomic group
    )  # (2 end)
    }  # closing brace
)  # (1 end)
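The commented form can be used directly in PHP by adding the `x` modifier, which makes the engine ignore whitespace and `#` comments in the pattern. The following is a sketch under assumptions: the class `Foo` and its methods are invented for illustration, and `preg_match_all` is used here to collect every method body rather than only `handle()`.

```php
<?php
// Hypothetical class used as input.
$source = <<<'CODE'
class Foo
{
    public function a()
    {
        return 1;
    }

    public function b($x)
    {
        if ($x) { return "{"; }
        return 2;
    }
}
CODE;

// Free-spacing version of the long pattern; modifiers: m (multi-line), x (extended).
$pattern = <<<'REGEX'
~
^ \s* [\w\s]+ \( .* \) \s* \K      # function header, dropped from the match by \K
(                                   # (1) braces plus body
  {
  (                                 # (2) body only
    (?>
        "(?: [^"\\]*+ | \\ . )*"    # double quoted strings
      | '(?: [^'\\]*+ | \\ . )*'    # single quoted strings
      | // .* $                     # single line comments
      | /\* [\s\S]*? \*/            # multi line comments
      | \# .* $                     # comments starting with a hash sign
      | <<< \s* ["']? ( \w+ ) ["']? [^;]+ \3 ; $   # heredocs and nowdocs
      | [^{}<'"/#]++                # plain text up to the next special character
      | [^{}]++                     # fallback: anything that is not a brace
      | (?1)                        # nested block: recurse into group 1
    )*
  )
  }
)
~mx
REGEX;

preg_match_all($pattern, $source, $matches);

foreach ($matches[2] as $body) {
    echo trim($body), PHP_EOL, '---', PHP_EOL;
}
```

Because the header part only looks for word characters followed by parentheses, it will also match other constructs that begin a line and are followed by a braced block, such as a top-level `if (...)`. Replacing `[\w\s]+` with a specific name like `function\s+handle` narrows the match to a single function.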
Formatting is done by @sln's RegexFormatter software.
Laravel's Eloquent Model.php file (~3500 lines) was picked at random as input. Check it out: Live demo