PCRE: Lazy and Greedy at the same time (Possessive Quantifiers)

Question

I am trying to match a series of text strings with PCRE in PHP, and I am having trouble getting all the matches between the first and the second.

If anyone wonders why on Earth I would want to do this, it's because of Doc Comments. Oh, how I wish Zend would make native/plugin functions to read Doc Comments from a PHP file...

The following plain-text example is used for the problem. It will always be pure PHP code, with a single opening tag at the beginning of the file and no closing tag. You can assume that the syntax is always correct.

<?php
  class someClass extends someExample
  {
    function doSomething($someArg = 'someValue')
    {
      // Nested code blocks...
      if($boolTest){}
    }
    private function killFurbies(){}
    protected function runSomething(){}
  }

  abstract
  class anotherClass
  {
    public function __construct(){}
    abstract function saveTheWhales();
  }

  function globalFunc(){}

Problem

I am trying to match all methods in a class, but my regex never finds the method killFurbies(). Making $part greedy means it only matches the last method in a class, and making it lazy means it only matches the first method.

$part = '.*';  // Greedy
$part = '.*?'; // Lazy

$regex = '%class(?:\\n|\\r|\\s)+([a-zA-Z_\\x7f-\\xff][a-zA-Z0-9_\\x7f-\\xff]*)'
       . '.*?\{' . $part .'(?:(public|protected|private)(?:\\n|\\r|\\s)+)?'
       . 'function(?:\\n|\\r|\\s)+([a-zA-Z_\\x7f-\\xff][a-zA-Z0-9_\\x7f-\\xff'
       . ']*)(?:\\n|\\r|\\s)*\\(%ms';

preg_match_all($regex, file_get_contents(__EXAMPLE__), $matches, PREG_SET_ORDER);
var_dump($matches);

Results:

// Lazy:
array(2) {
  [0]=>
  array(4) {
    [0]=>
    // Omitted.
    [1]=>
    string(9) "someClass"
    [2]=>
    string(0) ""
    [3]=>
    string(11) "doSomething"
  }
  [1]=>
  array(4) {
    [0]=>
    // Omitted.
    [1]=>
    string(12) "anotherClass"
    [2]=>
    string(6) "public"
    [3]=>
    string(11) "__construct"
  }
}

// Greedy:
array(2) {
  [0]=>
  array(4) {
    [0]=>
    // Omitted.
    [1]=>
    string(9) "someClass"
    [2]=>
    string(0) ""
    [3]=>
    string(13) "saveTheWhales"
  }
  [1]=>
  array(4) {
    [0]=>
    // Omitted.
    [1]=>
    string(12) "anotherClass"
    [2]=>
    string(0) ""
    [3]=>
    string(13) "saveTheWhales"
  }
}
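The output above is consistent with how `preg_match_all` works: it returns one entry per non-overlapping match, and each capture group holds a single value per match. Because the pattern consumes everything from `class` up to one `function name(`, each class can contribute at most one method, whether `$part` is lazy or greedy. One common regex-only workaround (not something the question or the answer proposes) is to use two passes: capture each class body first, then run `preg_match_all` over that body alone. The minimal sketch below assumes that class bodies contain no braces, `class` or `function` inside strings or comments. It matches the balanced body with PCRE recursion `(?2)`, and the possessive `[^{}]++` keeps the engine from backtracking into brace-free runs. The variable names are illustrative.

```php
<?php
// Two-pass sketch: (1) capture each class name and its brace-balanced body,
// (2) run preg_match_all over that body alone to collect every method name.
// Assumes no braces, "class" or "function" inside strings/comments in class bodies.
$source = <<<'CODE'
<?php
  class someClass extends someExample
  {
    function doSomething($someArg = 'someValue')
    {
      // Nested code blocks...
      if($boolTest){}
    }
    private function killFurbies(){}
    protected function runSomething(){}
  }

  abstract
  class anotherClass
  {
    public function __construct(){}
    abstract function saveTheWhales();
  }

  function globalFunc(){}
CODE;

// PHP identifier, as in the question's pattern (no capture groups inside).
$name = '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*';

// Group 1: class name. Group 2: the "{ ... }" body; (?2) recurses into group 2
// for nested braces, and the possessive [^{}]++ never gives back brace-free text.
$classPattern = '/\bclass\s+(' . $name . ')[^{]*(\{(?:[^{}]++|(?2))*\})/';

// Group 1: optional visibility keyword. Group 2: method name.
$methodPattern = '/(?:\b(public|protected|private)\s+)?\bfunction\s+(' . $name . ')\s*\(/';

$result = [];
preg_match_all($classPattern, $source, $classes, PREG_SET_ORDER);
foreach ($classes as $class) {
    // Second pass: search only inside this class body, so every method is a separate match.
    preg_match_all($methodPattern, $class[2], $methods, PREG_SET_ORDER);
    $result[$class[1]] = array_column($methods, 2);
}

print_r($result);
```

On the sample, `$result` should hold all three methods of `someClass` (including `killFurbies`) and both methods of `anotherClass`, with `globalFunc` ignored because it sits outside every class body. The approach stays fragile on real code, which is why the tokenizer approach recommended in the answer is the safer choice.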

How do I match all of them? :S

Any help would be greatly appreciated; I already feel this question is ridiculous as I type it out. Anyone attempting to answer a question like this is braver than I am!

Recommended answer

It is better to use token_get_all to get the tokens of the PHP code and iterate over them. PHPDoc-style comment tokens can be identified by T_DOC_COMMENT.
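A minimal sketch of the `token_get_all` approach, assuming PHP 7.1 or later. `listClassMethods()` and `nextSignificantToken()` are hypothetical helpers, not PHP built-ins. The sketch tracks curly-brace depth to know when it is directly inside a named class body. It records each `function` name declared at that depth and pairs it with the most recent `T_DOC_COMMENT`, dropping that comment if a `;`, `{` or `}` comes first. Interfaces, traits and enums are not tracked. The sample is the question's example with one doc comment added for illustration.

```php
<?php
// Sketch (PHP 7.1+): class => [method => preceding doc comment or null]. Helper names are hypothetical.

function nextSignificantToken(array $tokens, int $i)
{
    for ($j = $i + 1, $n = count($tokens); $j < $n; $j++) {
        $t = $tokens[$j];
        $skip = is_array($t) && in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
        if (!$skip && (is_array($t) ? $t[1] : $t) !== '&') { // also skip "&" in function &name()
            return $t;
        }
    }
    return null;
}

function listClassMethods(string $code): array
{
    $result = [];
    $depth = 0;                       // current curly-brace depth
    $className = $classDepth = null;  // class we are directly inside, and its body's depth
    $pendingClass = null;             // class name seen whose "{" has not been reached yet
    $doc = null;                      // latest doc comment not yet claimed by a method
    $tokens = token_get_all($code);
    foreach ($tokens as $i => $token) {
        if (is_string($token)) {
            if ($token === '{') {
                $depth++;
                if ($pendingClass !== null) { // entering a class body
                    [$className, $classDepth, $pendingClass] = [$pendingClass, $depth, null];
                    $result[$className] = [];
                }
            } elseif ($token === '}') {
                if ($depth === $classDepth) { // leaving the class body
                    $className = $classDepth = null;
                }
                $depth--;
            }
            if ($token === '{' || $token === '}' || $token === ';') {
                $doc = null; // a doc comment is not carried past these
            }
            continue;
        }
        switch ($token[0]) {
            case T_CURLY_OPEN:               // "{$" and "${" inside strings
            case T_DOLLAR_OPEN_CURLY_BRACES: // are closed by a plain "}"
                $depth++;
                break;
            case T_DOC_COMMENT:
                $doc = $token[1];
                break;
            case T_CLASS: // named classes only; "::class" and "new class" are skipped
                $next = nextSignificantToken($tokens, $i);
                if (is_array($next) && $next[0] === T_STRING) {
                    $pendingClass = $next[1];
                }
                $doc = null;
                break;
            case T_FUNCTION: // only functions declared directly in a class body
                $next = nextSignificantToken($tokens, $i);
                if ($className !== null && $depth === $classDepth
                    && is_array($next) && $next[0] === T_STRING) {
                    $result[$className][$next[1]] = $doc;
                }
                $doc = null;
                break;
        }
    }
    return $result;
}

// The question's sample, plus one doc comment added purely for this demo.
$source = <<<'CODE'
<?php
  class someClass extends someExample
  {
    function doSomething($someArg = 'someValue')
    {
      // Nested code blocks...
      if($boolTest){}
    }
    /** Hypothetical doc comment, added for this demo. */
    private function killFurbies(){}
    protected function runSomething(){}
  }

  abstract
  class anotherClass
  {
    public function __construct(){}
    abstract function saveTheWhales();
  }

  function globalFunc(){}
CODE;

foreach (listClassMethods($source) as $class => $methods) {
    foreach ($methods as $method => $doc) {
        echo $class, '::', $method, $doc === null ? '' : '  ' . $doc, "\n";
    }
}
```

Unlike the single regex, this visits every token. As a result, `killFurbies` is listed alongside the other methods and paired with the doc comment placed above it, while the global `globalFunc` is skipped because it is not inside a class body.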
