How to parse heterogenous markup with PHP?

Question

I have a string with custom markup for saving songs with chords, tabulatures, notes etc. It contains

things in various brackets: \[.+?\], \[\[.+?\]\], \(.+?\)
arrows: <-{3,}>, \-{3,}>, <\-{3,}
and so on...

A sample text might be

Text Text [something]
--->
Text (something 021213)

Now I wish to parse the markup into an array of tokens, each an object of the corresponding class, which would look like this (matched parts in parentheses):

ParsedBlock_Text ("Text Text ")
ParsedBlock_Chord ("something")
ParsedBlock_Text (" ")
ParsedBlock_NewColumn
ParsedBlock_Text (" text ")
ParsedBlock_ChordDiagram ("something 021213")

I know how to match them, but either I must match each different pattern, and save offsets to properly sort the array, or I match them at once and I don't know which one has been matched.

Thanks, MK

Recommended answer

Assuming you do not try to nest these structures, this will tokenize your text:

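function ParseText($text) {
    // One alternation with a named group per token type; [[...]] is listed before [...].
    $re = '/\[\[(?P<DoubleBracket>.*?)]]|\[(?P<Bracket>.*?)]|\((?P<Paren>.*?)\)|(?<Arrow><---+>?|---+>)/s';
    $keys = array('DoubleBracket', 'Bracket', 'Paren', 'Arrow');
    $result = array();
    $lastStart = 0; // offset just past the previous match
    if (preg_match_all($re, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
        foreach ($matches as $match) {
            // With PREG_OFFSET_CAPTURE every entry is array(matched text, offset).
            $start = $match[0][1];
            // Plain text between the previous match and this one.
            $prefix = substr($text, $lastStart, $start - $lastStart);
            $lastStart = $start + strlen($match[0][0]);
            // Whitespace-only text is dropped; other text is trimmed.
            if ($prefix != '' && !ctype_space($prefix)) {
                $result []= array('Text', trim($prefix));
            }
            foreach ($keys as $key) {
                // A group that did not take part in the match is absent or has offset -1.
                if (isset($match[$key]) && $match[$key][1] >= 0) {
                    $result []=  array($key, $match[$key][0]);
                    break;
                }
            }
        }
    }
    // Any text left after the last match.
    $prefix = substr($text, $lastStart);
    if ($prefix != '' && !ctype_space($prefix)) {
        $result []= array('Text', trim($prefix));
    }
    return $result;
}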

Example:

$mytext = <<<'EOT'
Text Text [something]
--->
Text (something 021213)
More Text
EOT;

$parsed = ParseText($mytext);
foreach ($parsed as $item) {
    print_r($item);
}

Output:

Array
(
    [0] => Text
    [1] => Text Text
)
Array
(
    [0] => Bracket
    [1] => something
)
Array
(
    [0] => Arrow
    [1] => --->
)
Array
(
    [0] => Text
    [1] => Text
)
Array
(
    [0] => Paren
    [1] => something 021213
)
Array
(
    [0] => Text
    [1] => More Text
)

http://ideone.com/kJQrBw
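
The question asked for objects of corresponding classes rather than plain arrays. One possible way to get there is to map each token type to a class in a second pass. The sketch below is only an illustration: the ParsedBlock base class, the toBlocks() helper, and the mapping itself (Bracket to Chord, Paren to ChordDiagram, Arrow to NewColumn) are assumptions inferred from the question's example, and DoubleBracket is left unmapped because the question does not say what it represents.

```php
<?php
// Hypothetical token classes, named after the ParsedBlock_* examples in the question.
class ParsedBlock {
    public $value;
    public function __construct($value = null) {
        $this->value = $value;
    }
}
class ParsedBlock_Text extends ParsedBlock {}
class ParsedBlock_Chord extends ParsedBlock {}
class ParsedBlock_ChordDiagram extends ParsedBlock {}
class ParsedBlock_NewColumn extends ParsedBlock {}

// Hypothetical helper: turn array(type, value) pairs into objects.
function toBlocks(array $tokens): array {
    // Assumed mapping from token type to class; adjust it to your markup.
    $classes = [
        'Text'    => 'ParsedBlock_Text',
        'Bracket' => 'ParsedBlock_Chord',
        'Paren'   => 'ParsedBlock_ChordDiagram',
        'Arrow'   => 'ParsedBlock_NewColumn',
    ];
    $blocks = [];
    foreach ($tokens as $token) {
        list($type, $value) = $token;
        if (!array_key_exists($type, $classes)) {
            throw new InvalidArgumentException("No class mapped for token type '$type'");
        }
        $class = $classes[$type];
        $blocks[] = new $class($value);
    }
    return $blocks;
}

// Input in the same shape as the arrays ParseText() returns.
$tokens = [
    ['Text', 'Text Text'],
    ['Bracket', 'something'],
    ['Arrow', '--->'],
    ['Text', 'Text'],
    ['Paren', 'something 021213'],
];
$blocks = toBlocks($tokens);
foreach ($blocks as $block) {
    echo get_class($block), ' ("', $block->value, '")', PHP_EOL;
}
```

Keeping the mapping in one array means a new token type only needs a new regex group and one extra entry here.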

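If you want to add more patterns to the regex, put the longer, more specific patterns first. Alternation tries its branches from left to right and takes the first one that matches, so a shorter pattern placed earlier can claim text that belongs to a longer one; for example, [[x]] would be matched as a single-bracket token.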
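
To see why the order matters, here is a small sketch that runs the same input through two orderings of the bracket patterns. The matchedGroup() helper and the input string are made up for this illustration; only the two named groups come from the answer's regex.

```php
<?php
// Hypothetical helper: report which named group produced the match.
function matchedGroup(string $re, string $text, array $keys): ?array {
    if (!preg_match($re, $text, $m)) {
        return null;
    }
    foreach ($keys as $key) {
        // A group that did not take part is absent or an empty string.
        // (This sketch therefore ignores empty tokens such as "[]".)
        if (isset($m[$key]) && $m[$key] !== '') {
            return [$key, $m[$key]];
        }
    }
    return null;
}

$keys  = ['DoubleBracket', 'Bracket'];
// Shorter pattern first: it claims the text before [[...]] is ever tried.
$wrong = '/\[(?P<Bracket>.*?)]|\[\[(?P<DoubleBracket>.*?)]]/';
// Longer pattern first, as in the answer's regex.
$right = '/\[\[(?P<DoubleBracket>.*?)]]|\[(?P<Bracket>.*?)]/';

$a = matchedGroup($wrong, '[[Am]]', $keys);
$b = matchedGroup($right, '[[Am]]', $keys);

echo $a[0], ': ', $a[1], PHP_EOL; // Bracket: [Am
echo $b[0], ': ', $b[1], PHP_EOL; // DoubleBracket: Am
```

With the shorter pattern first, the token is reported as a Bracket whose content starts with a stray "[", which is the kind of mismatch the ordering advice is meant to avoid.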