Regex to match top level delimiters in a multi dimensional string
Question
I have a file that is structured as a large multidimensional structure, similar to JSON, but not close enough for me to use a JSON library.
The data looks like this:
alpha {
beta {
charlie;
}
delta;
}
echo;
foxtrot {
golf;
hotel;
}
The regex I am trying to build (for a preg_match_all) should match the contents of each top-level parent (delimited by {} braces) so that I can recurse through the matches, building up a multidimensional PHP array that represents the data.
The first regex I tried is /(?<=\{).*(?=\})/s which greedily matches content inside braces. However, this isn't quite right: when there is more than one sibling at the top level, the match is too greedy and runs from the first opening brace to the last closing brace. Example below:
Using the regex /(?<=\{).*(?=\})/s, the match is given as:
Match 1:
beta {
charlie;
}
delta;
}
echo;
foxtrot {
golf;
hotel;
Instead the result should be:
Match 1:
beta {
charlie;
}
delta;
Match 2:
golf;
hotel;
So regex wizards, what function am I missing here, or do I need to solve this with PHP somehow? Any tips very welcome :)
Recommended answer
You can't [1] do this with regular expressions.
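Since the question asks whether this needs to be solved in PHP, one option is to skip the regex and count brace depth while scanning the string. The following is only a minimal sketch, assuming braces never appear inside values or quoted text; topLevelBlocks is a hypothetical name, not something from the question.

```php
<?php
// Hypothetical helper: return the trimmed contents of each top-level {...} block.
// Assumption: '{' and '}' are only ever used as delimiters, never as data.
function topLevelBlocks(string $input): array
{
    $blocks = [];
    $depth = 0;   // how many braces are currently open
    $start = 0;   // offset just after the current top-level '{'
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        if ($input[$i] === '{') {
            if ($depth === 0) {
                $start = $i + 1;   // content begins after the opening brace
            }
            $depth++;
        } elseif ($input[$i] === '}' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Back at the top level: everything since $start is one block.
                $blocks[] = trim(substr($input, $start, $i - $start));
            }
        }
    }

    return $blocks;
}
```

Run against the sample data from the question, this should return two strings, the body of alpha (including the nested beta block) and the body of foxtrot, which lines up with the Match 1 and Match 2 the question asks for. Each returned block can then be passed through the same function again to recurse.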
Alternatively, if you want to match deep-to-shallow blocks, you can use \{[^\{\}]*?\} and preg_replace_callback() to store the value, and return null to erase it from the string. The callback will need to take care of nesting the value accordingly.
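$hierarchicalStorage = ...; // whatever structure you want to collect the blocks in
do {
    // Each pass matches only innermost blocks (those containing no further braces).
    $string = \preg_replace_callback('#\{[^\{\}]*?\}#', function ($block)
            use (&$hierarchicalStorage) {
        // $block[0] holds the matched text, braces included;
        // do your magic with $hierarchicalStorage in here
        return null; // replaces the matched block with nothing
    }, $string);
} while (!empty($string));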
Incomplete, not tested, and no warranty.
This approach requires that the whole string be wrapped in {} as well, otherwise the final match won't happen and you'll loop forever.
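To make the deep-to-shallow idea more concrete, here is one way the callback could handle the nesting. This is a sketch under assumptions, not the answer's exact method: instead of returning null it leaves a placeholder such as @0; in place of each collapsed block, and it stops when a pass makes no replacements, so the outer {} wrapper isn't needed. The name parseBlocks, the @n placeholder syntax, and the shape of the resulting array are all made up for illustration. It also assumes names match \w+, that sibling blocks have distinct names, and that the data never contains @n tokens itself.

```php
<?php
// Hypothetical helper: build a nested array by collapsing innermost blocks first.
function parseBlocks(string $input): array
{
    $stored = [];   // placeholder index => parsed block, e.g. [0 => ['beta' => [...]]]

    // Turn "a; @0; b;" into an array, swapping "@n" placeholders for stored blocks.
    $parseItems = function (string $body) use (&$stored): array {
        $items = [];
        foreach (explode(';', $body) as $item) {
            $item = trim($item);
            if ($item === '') {
                continue;
            }
            if (preg_match('/^@(\d+)$/', $item, $m)) {
                $items += $stored[(int) $m[1]];   // named child block
            } else {
                $items[] = $item;                 // plain leaf value
            }
        }
        return $items;
    };

    // A name followed by a block that contains no further braces.
    $innermost = '#(\w+)\s*\{([^{}]*)\}#';

    do {
        $input = preg_replace_callback(
            $innermost,
            function (array $m) use (&$stored, $parseItems): string {
                $stored[] = [$m[1] => $parseItems($m[2])];
                return '@' . (count($stored) - 1) . ';';   // leave a placeholder behind
            },
            $input,
            -1,
            $count
        );
    } while ($count > 0);   // stop once no innermost blocks remain

    return $parseItems($input);
}
```

For the sample data in the question, this should produce ['alpha' => ['beta' => ['charlie'], 0 => 'delta'], 0 => 'echo', 'foxtrot' => ['golf', 'hotel']]. Mixing named blocks and numerically indexed leaves in one array is just one possible layout; adjust it to whatever your consuming code expects.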
This is an awful lot of (inefficient) work for something that can just as easily be solved with a well-known exchange/storage format such as JSON.
[1] I was going to put "you can, but...", however I'll just say once again, "You can't" [2]
[2] Don't