Recursive/subroutine regex to match CSS media queries

Question

I'm looking for a regular expression (in PHP PCRE) that can match media queries and their contents reliably, including the somewhat odd case where a media query body is empty. Source text might be:

@media only screen {
    p {
        color:red;
    }
}
@media only screen and (max-width: 596px) {
    p {
        color:blue;
    }
    img {
        max-width: 200px;
    }
}
@media only screen {

}
img {
    display: block;
}
@media only screen and (max-width: 240px) {
    p {
        color:green;
    }
}
p {
    font-weight: normal;
}

I want to capture each media query and its CSS body as subpatterns, so I'll end up with a PHP array like:

[['@media only screen {
        p {
            color:red;
        }
    }','p {
            color:red;
        }'],...]

The key thing is that this needs to be a recursive or subroutine pattern in order to balance the braces. The empty query is enough to confuse the pattern in this question, because that pattern can't distinguish the end of a CSS rule from the end of the empty media query:

/@media[^{]+\{([\s\S]+?\})\s*\}/
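To make the failure concrete, here is a minimal sketch that runs the pattern above against a trimmed copy of the sample. The variable names ($css, $naive, $count) are only illustrative. Because the body is matched lazily up to the first "}" that is followed by another "}", the empty query swallows everything up to the end of the next media query:

```php
<?php
// Trimmed sample: an empty media query, an ordinary rule, another media query.
$css = <<<'CSS'
@media only screen {

}
img {
    display: block;
}
@media only screen and (max-width: 240px) {
    p {
        color:green;
    }
}
CSS;

// The non-recursive pattern quoted above.
$naive = '/@media[^{]+\{([\s\S]+?\})\s*\}/';
$count = preg_match_all($naive, $css, $m);

// There are two media queries, but the pattern reports a single match
// that starts at the empty query and ends at the close of the second one.
echo $count, "\n";
echo $m[0][0], "\n";
```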

I've been trying and failing to use the advice in this article to make a pattern of the form (b(?:m|(?1))*e), where b is what begins the construct, m is what can occur in the middle of the construct, and e is what can occur at the end, and none of them can match the same thing.

So, b should be @media[^{]+\{, e should be \}, and m needs to consume CSS rules, perhaps ([^{]+?\{[^}]*?\s*\}), giving me:

/(@media[^{]+\{(?:([^{]+?\{[^}]*?\}\s*)*|(?1))*\})/s

However, that doesn't work so I'm a bit lost. Can anyone suggest an effective pattern?

I've set up a regex test here.

Alternatively, a non-regex parser might work better.

Note that I'm not attempting to validate or match CSS selectors in general (not a job for a regex), just grab the content of the query and its body.

Update: added more sample content and explained what I want to get out.

Recommended answer

If you are sure the blocks you want to match always have balanced braces, you can use a regex with a subroutine call like this:

'~@media\b[^{]*({((?:[^{}]+|(?1))*)})~'

See the regex demo.

Here is an IDEONE demo:

// Group 1 is the balanced {...} block; group 2 is the text between its braces.
$re = '~@media\b[^{]*({((?:[^{}]+|(?1))*)})~';
$str = "@media only screen {\n    p {\n        color:red;\n    }\n}\n@media only screen and (max-width: 596px) {\n    p {\n        color:blue;\n    }\n    img {\n        max-width: 200px;\n    }\n}\n@media only screen {\n\n}\nimg {\n    display: block;\n}\n@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}\np {\n    font-weight: normal;\n}";
preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER);
print_r($matches[0]); // each whole @media block
print_r($matches[2]); // each block's body, without the outer braces

Pattern details:

  • @media\b - matches @media as a whole word (\b is a word boundary)
  • [^{]* - matches 0+ characters other than {
  • ({((?:[^{}]+|(?1))*)}) - capturing group #1, which matches a {...} block with balanced { and } (it is a technical group: the pattern has to recurse into this group's subpattern to match nested {...} blocks correctly). It matches...
    • { - an opening brace
    • ((?:[^{}]+|(?1))*) - Group 2 (the contents inside the balanced {...}), matching zero or more repetitions of
      • [^{}]+ - 1+ characters other than { and } (everything that is not an opening or closing delimiter)
      • | - or...
      • (?1) - the whole Group 1 subpattern
    • } - a closing brace
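As a rough sketch of how the pieces above fit together, the same pattern can be written with the x (extended) flag so each part carries its own comment, and PREG_SET_ORDER can be used to build the [[query, body], ...] array asked for in the question. The variable names ($css, $re, $sets, $pairs) and the trim() on the body are assumptions made for this illustration, not part of the original answer:

```php
<?php
// Shortened sample: a normal query, an empty query, a plain rule, another query.
$css = <<<'CSS'
@media only screen {
    p {
        color:red;
    }
}
@media only screen {

}
img {
    display: block;
}
@media only screen and (max-width: 240px) {
    p {
        color:green;
    }
}
CSS;

// Same pattern as in the answer, spread out with the x flag.
$re = '~
    @media\b [^{]*      # the at-rule keyword and the query text
    (                   # group 1: one balanced block, braces included
        \{
        (               # group 2: the body between the braces
            (?: [^{}]+  # a run of non-brace characters
              | (?1)    # or a nested block: subroutine call to group 1
            )*
        )
        \}
    )
~x';

// PREG_SET_ORDER yields one array per match: [whole match, group 1, group 2].
preg_match_all($re, $css, $sets, PREG_SET_ORDER);

// Pair each whole media query with its body, trimmed of surrounding whitespace.
$pairs = array_map(function (array $m) {
    return [$m[0], trim($m[2])];
}, $sets);

print_r($pairs);
```

The empty media query comes out as its own entry with an empty body, and the img rule between the queries is not part of any match.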

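Note that each string in $matches[2] can be further processed with preg_match_all('~\s*(\w+)\s*{\s*([^}]*?)\s*}~', $body, $subblocks), where $body is one element of that array. preg_match_all() takes a string subject, so the array itself cannot be passed directly. The (\w+) part captures only a run of word characters, so it suits simple selectors such as p or img.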
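A minimal sketch of that second pass, assuming the bodies have already been captured as strings. Here $bodies stands in for $matches[2], and $ruleRe, $rules and the selector => declarations layout are hypothetical choices for the example. The subject passed to preg_match_all() is a single string, so the sketch loops over the bodies one at a time:

```php
<?php
// Stand-in for $matches[2]: one body with two rules, one empty body.
$bodies = [
    "\n    p {\n        color:blue;\n    }\n    img {\n        max-width: 200px;\n    }\n",
    "\n\n",
];

// Sub-block pattern from the note above, with the braces escaped for clarity.
$ruleRe = '~\s*(\w+)\s*\{\s*([^}]*?)\s*\}~';

$rules = [];
foreach ($bodies as $i => $body) {
    // Each $sub entry is [whole rule, selector, declarations].
    preg_match_all($ruleRe, $body, $sub, PREG_SET_ORDER);
    $rules[$i] = [];
    foreach ($sub as $s) {
        $rules[$i][$s[1]] = $s[2];
    }
}

print_r($rules);
```

An empty media query simply yields an empty list of rules, so it needs no special handling at this stage.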
