Regex repeated capturing group in PHP
Problem description
I'm trying to extract information from a file of routes, and I chose regex for the job, but I have a problem with the repeated information. To make the question clearer, I'll show what I have and what I want to get.
So I have this file:
Codes: C - Connected, S - Static, R - RIP, B - BGP,
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
U - Unreachable, i - Inactive
O E 0.0.0.0/0 via 10.140, bond1.30, cost 1:10, age 5
              via 10.141, bond1.31
              via 10.142, bond1.32
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O IA 10.138/29 via 10.140, bond1.30, cost 46, age 1029440
C 10.141/29 is directly connected, bond2.35
C 10.141/29 is directly connected, bond2.35
And I wrote this regex:
(S|R|B|O|A|K|H|P|U|i) +(IA|E|N|) +([0-9.]+)\/([0-9]+) +via +([0-9.]+), +([a-zA-Z0-9.]+|), +cost +([0-9]+:|)([0-9]+), +age +[0-9]+ +\n(( +via +([0-9.]+), +([a-zA-Z0-9.]+|) +\n)+|)
My problem is with the last part, (( +via +([0-9.]+), +([a-zA-Z0-9.]+|) +\n)+|), because with this regex I get:
array[0]=>' via 10.141, bond1.31
via 10.142, bond1.32';
array[1]=>'10.142';
array[2]=>'bond1.32';
But I want to get:
array[0]=>'10.141';
array[1]=>'bond1.31';
array[2]=>'10.142';
array[3]=>'bond1.32';
I tested the regex on a regex testing site, and it told me this:
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations, or use a non-capturing group instead if you're not interested in the data.
But I really don't know what this means or how to fix it.
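What the note means can be seen in a minimal PHP sketch. The pattern and the input "1,2,3" are illustrative and not from the question; the digits play the role of the via lines:

```php
<?php
// A capturing group inside a quantifier: every iteration overwrites group 1,
// so only the value from the last iteration survives.
preg_match('/(?:(\d),?)+/', '1,2,3', $repeated);

// A capturing group wrapped around the whole repeated part keeps all
// iterations, but as a single string rather than separate values.
preg_match('/((?:\d,?)+)/', '1,2,3', $wrapped);

var_dump($repeated[1]); // string(1) "3"
var_dump($wrapped[1]);  // string(5) "1,2,3"
```

Neither form returns the iterations as separate array elements; a regex capture group never holds more than one value per match.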
Note: the file contains the routing table of a Cisco device, as printed by show ip route.
Update 1
I changed the regex to:
(S|R|B|O|A|K|H|P|U|i) +(IA|E|N|) +([0-9.]+)\/([0-9]+) +via +([0-9.]+), +([a-zA-Z0-9.]+|), +cost +([0-9]+:|)([0-9]+), +age +[0-9]+ +\n(?: +via +([0-9.]+), +([a-zA-Z0-9.]+|) +\n)*
With this I no longer get:
array[0]=>' via 10.141, bond1.31
via 10.142, bond1.32';
But I don't get the repeated part either.
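The original regex already had a capturing group around the repeated group, which is why its array[0] held the whole via block as one string. A common way to get the individual hops from there is a second pass over that block with preg_match_all, which returns every occurrence instead of overwriting a group. A minimal sketch, assuming the block has already been captured and its lines are indented; the names $viaBlock, $m and $flat are hypothetical:

```php
<?php
// Hypothetical input: the block returned by the outer capturing group
// (continuation lines assumed to start with spaces).
$viaBlock = "              via 10.141, bond1.31\n"
          . "              via 10.142, bond1.32\n";

// Second pass: preg_match_all collects every "via <gateway>, <interface>"
// occurrence as its own set of groups, so nothing is overwritten.
preg_match_all('#via ([\d.]+), ([a-zA-Z0-9.]+)#', $viaBlock, $m, PREG_SET_ORDER);

// Flatten into [gateway, interface, gateway, interface, ...].
$flat = [];
foreach ($m as $hop) {
    $flat[] = $hop[1];
    $flat[] = $hop[2];
}
print_r($flat);
```

Splitting the work this way keeps the first regex responsible only for finding the route and its block, and the second one for enumerating hops.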
Recommended answer
OK, I've changed your regexp like this:
$txt = "Codes: C - Connected, S - Static, R - RIP, B - BGP,
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
U - Unreachable, i - Inactive
O E 0.0.0.0/0 via 10.140, bond1.30, cost 1:10, age 5
              via 10.141, bond1.31
              via 10.142, bond1.32
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O IA 10.138/29 via 10.140, bond1.30, cost 46, age 1029440
C 10.141/29 is directly connected, bond2.35
C 10.141/29 is directly connected, bond2.35
";
$regexp = '#(.) +([A-Z]{1,2}) +([\d.]+/\d+) +via ([\d.]+), ([a-zA-Z0-9.]+), cost [\d:]+, age \d+ +(?:\n +via ([\d.]+), ([a-zA-Z0-9.]+))*#m';
$matches = [];
preg_match_all($regexp, $txt, $matches, PREG_SET_ORDER);
var_dump($matches);
This is the output:
array(4) {
  [0] =>
  array(8) {
    [0] =>
    string(125) "O E 0.0.0.0/0 via 10.140, bond1.30, cost 1:10, age 5
              via 10.141, bond1.31"
    [1] =>
    string(1) "O"
    [2] =>
    string(1) "E"
    [3] =>
    string(9) "0.0.0.0/0"
    [4] =>
    string(6) "10.140"
    [5] =>
    string(8) "bond1.30"
    [6] =>
    string(6) "10.141"
    [7] =>
    string(8) "bond1.31"
  }
  [1] =>
  array(6) {
    [0] =>
    string(69) "O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511 "
    [1] =>
    string(1) "O"
    [2] =>
    string(1) "E"
    [3] =>
    string(9) "10.112/23"
    [4] =>
    string(6) "10.140"
    [5] =>
    string(8) "bond1.30"
  }
  [2] =>
  array(6) {
    [0] =>
    string(69) "O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511 "
    [1] =>
    string(1) "O"
    [2] =>
    string(1) "E"
    [3] =>
    string(9) "10.112/23"
    [4] =>
    string(6) "10.140"
    [5] =>
    string(8) "bond1.30"
  }
  [3] =>
  array(6) {
    [0] =>
    string(70) "O IA 10.138/29 via 10.140, bond1.30, cost 46, age 1029440 "
    [1] =>
    string(1) "O"
    [2] =>
    string(2) "IA"
    [3] =>
    string(9) "10.138/29"
    [4] =>
    string(6) "10.140"
    [5] =>
    string(8) "bond1.30"
  }
}
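It doesn't work, because the third via (10.142, bond1.32) is missing: groups 6 and 7 sit inside a repeated group, so each match can hold only one via line in them.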
New version, processing the text line by line:
$txt = "Codes: C - Connected, S - Static, R - RIP, B - BGP,
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
U - Unreachable, i - Inactive
O E 0.0.0.0/0 via 10.140, bond1.30, cost 1:10, age 5
              via 10.141, bond1.31
              via 10.142, bond1.32
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O IA 10.138/29 via 10.140, bond1.30, cost 46, age 1029440
C 10.141/29 is directly connected, bond2.35
C 10.141/29 is directly connected, bond2.35
";
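$grouped = [];
$i = 0;
// Split on any line ending (\n, \r\n, ...) and handle one line at a time.
foreach (preg_split('/\R/', $txt) as $line) {
    $matches = [];
    if (preg_match('#^(.) +([A-Z]{1,2}) +([\d.]+/\d+) +via ([\d.]+), ([a-zA-Z0-9.]+)#', $line, $matches)) {
        // Route line: drop the full match, keep type, subtype, prefix, gateway and interface.
        array_shift($matches);
        $grouped[++$i] = $matches;
    } elseif ($i > 0 && preg_match('#^ +via ([\d.]+), ([a-zA-Z0-9.]+)#', $line, $matches)) {
        // Continuation line: append its gateway and interface to the current route.
        array_push($grouped[$i], $matches[1], $matches[2]);
    }
}
var_dump($grouped);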
Now it works:
array(4) {
  [1] =>
  array(9) {
    [0] =>
    string(1) "O"
    [1] =>
    string(1) "E"
    [2] =>
    string(9) "0.0.0.0/0"
    [3] =>
    string(6) "10.140"
    [4] =>
    string(8) "bond1.30"
    [5] =>
    string(6) "10.141"
    [6] =>
    string(8) "bond1.31"
    [7] =>
    string(6) "10.142"
    [8] =>
    string(8) "bond1.32"
  }
  [2] =>
  array(5) {
    [0] =>
    string(1) "O"
    [1] =>
    string(1) "E"
    [2] =>
    string(9) "10.112/23"
    [3] =>
    string(6) "10.140"
    [4] =>
    string(8) "bond1.30"
  }
  [3] =>
  array(5) {
    [0] =>
    string(1) "O"
    [1] =>
    string(1) "E"
    [2] =>
    string(9) "10.112/23"
    [3] =>
    string(6) "10.140"
    [4] =>
    string(8) "bond1.30"
  }
  [4] =>
  array(5) {
    [0] =>
    string(1) "O"
    [1] =>
    string(2) "IA"
    [2] =>
    string(9) "10.138/29"
    [3] =>
    string(6) "10.140"
    [4] =>
    string(8) "bond1.30"
  }
}
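If you would rather keep a single preg_match_all call instead of looping over lines, one option (a technique swapped in here, not part of the answer above) is to make every match a single next hop and chain the via lines to their route line with PCRE's \G anchor, which only matches where the previous match ended. This is a sketch under assumptions: continuation lines start with spaces, every matched route line has a via, and $pattern, $routes and the keys type, sub, prefix and hops are hypothetical names:

```php
<?php
// Sample based on the question's table; continuation lines assumed indented.
$txt = "Codes: C - Connected, S - Static, R - RIP, B - BGP,\n"
     . "O E 0.0.0.0/0 via 10.140, bond1.30, cost 1:10, age 5\n"
     . "              via 10.141, bond1.31\n"
     . "              via 10.142, bond1.32\n"
     . "O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511\n"
     . "O IA 10.138/29 via 10.140, bond1.30, cost 46, age 1029440\n"
     . "C 10.141/29 is directly connected, bond2.35\n";

// One match per next hop. Alternative 1 starts at a route line (groups 1-3);
// alternative 2 uses \G, so it only matches a via line that directly follows
// the previous match. Both alternatives capture gateway (4) and interface (5).
$pattern = '#(?:^(\S) +(IA|E|N)? *([\d.]+/\d+) +via|\G\n +via) ([\d.]+), ([a-zA-Z0-9.]+)[^\n]*#m';

preg_match_all($pattern, $txt, $sets, PREG_SET_ORDER);

$routes = [];
foreach ($sets as $set) {
    if ($set[1] !== '') {
        // Route line: start a new entry.
        $routes[] = ['type' => $set[1], 'sub' => $set[2], 'prefix' => $set[3], 'hops' => []];
    } elseif ($routes === []) {
        continue; // a via line with no route before it
    }
    // Append this hop to the most recent route.
    $routes[count($routes) - 1]['hops'][] = [$set[4], $set[5]];
}
print_r($routes);
```

Because each hop is its own match, no capturing group is ever repeated, so none of the via lines are lost.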