Writing a PHP query parser with regular expressions


Question

I'm trying to write a regular expression in PHP that parses a string (written by a user) in order to build a request. The string can be as complex as:

name = 'benjo' and (surname = 'benny' or surname = 'bennie') or age = 4

Later I'll parse the string to build MySQL queries. For now, I'm just trying to find the right regular expression to parse this string into an array that could look like:

$result = array(
    0 => "name = 'benjo'",
    1 => 'and',
    2 => array(
        0 => "surname = 'benny'",
        1 => 'or',
        2 => "surname = 'bennie'",
    ),
    3 => 'or',
    4 => 'age = 4',
);

I've thought about using recursive functions, and my regular expression so far is:

"#\((([^()]+|(?R))*)\)|(ou[^()])|(et[^()])#",

Which, of course, doesn't work.

I'd be glad if someone could help; I'm getting kind of stuck here! :) Thanks, Romain

Let's change the challenge! :) OK, now let's make it a bit simpler. What would it take with a regular expression if we add the constraint that we stay on "level one"? No nested parentheses, just one level, but still as many ANDs/ORs as needed. Would that change anything in favor of regexes? (I would really like to avoid writing my own mini parser, although that sounds really interesting.)

Recommended answer

Theoretical regular expressions are not powerful enough to match parentheses. They can only express left-recursive or right-recursive rules; middle-recursive rules (e.g. <exp> -> "(" <exp> ")") cannot be expressed with a regular expression.

Regex implementations in programming languages, however, add features that let regex exceed the power of regular grammars. For example, backreferences make it possible to write a regex that matches a non-context-free language. Even with backreferences, though, it is still not possible to balance parentheses.
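To make the backreference point concrete, here is a minimal sketch (not part of the original answer). It matches the "copy" language, a string made of some word repeated twice, which is a standard example of a language that is not context-free. The function name isDoubledWord and the [ab] alphabet are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: true when $s is some non-empty word over {a, b}
// immediately followed by an identical copy of itself (w w).
function isDoubledWord(string $s): bool
{
    // ([ab]+) captures the first half; \1 requires the same text again.
    return preg_match('~^([ab]+)\1$~', $s) === 1;
}

var_dump(isDoubledWord('abbabb')); // "abb" followed by "abb"
var_dump(isDoubledWord('abbaba')); // second half differs from the first
```

The backreference \1 is what takes this beyond a regular grammar: the second half has to repeat whatever the first half happened to be.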

Because the PCRE library supports recursive regex through its subroutine call feature, it is technically possible to parse such an expression with a regex. However, unless you can write the regex yourself, meaning that you understand what it does and can modify it to suit your needs, you should just write your own parser. Otherwise, you will end up with an unmaintainable mess.

(?(DEFINE)
  (?<string>'[^']++')
  (?<int>\b\d+\b)
  (?<sp>\s*)
  (?<key>\b\w+\b)
  (?<value>(?&string)|(?&int))
  (?<exp>(?&key) (?&sp) = (?&sp) (?&value))
  (?<logic>\b (?:and|or) \b)
  (?<main>
    (?<token> \( (?&sp) (?&main) (?&sp) \) | (?&exp) )
    (?:
      (?&sp) (?&logic) (?&sp)
      (?&token) 
    )*
  )
)
(?:
  ^ (?&sp) (?= (?&main) (?&sp) $ )
  |
  (?!^) \G
  (?&sp) (?&logic) (?&sp)
)
(?:
  \( (?&sp) (?<m_main>(?&main)) (?&sp) \)
  |
  (?<m_key>(?&key)) (?&sp) = (?&sp) (?<m_value>(?&value))
)

Demo on regex101

The regex above should be used with preg_match_all, placed between delimiters and with the x flag (free-spacing mode): /.../x.

For each match:

  • If the m_main capturing group has content, put that content through another round of matching.
  • Otherwise, get the key and value from the m_key and m_value capturing groups.
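The two steps above could be sketched in PHP roughly as follows. The pattern is the one from the answer, copied unchanged into a nowdoc so that its quotes and backslashes stay literal; the function name parseQuery, the ~ delimiter, and the shape of the returned array are assumptions made for this illustration and are not part of the original answer.

```php
<?php
// The answer's pattern, verbatim, wrapped in ~...~x (free-spacing mode).
$pattern = <<<'REGEX'
~
(?(DEFINE)
  (?<string>'[^']++')
  (?<int>\b\d+\b)
  (?<sp>\s*)
  (?<key>\b\w+\b)
  (?<value>(?&string)|(?&int))
  (?<exp>(?&key) (?&sp) = (?&sp) (?&value))
  (?<logic>\b (?:and|or) \b)
  (?<main>
    (?<token> \( (?&sp) (?&main) (?&sp) \) | (?&exp) )
    (?:
      (?&sp) (?&logic) (?&sp)
      (?&token)
    )*
  )
)
(?:
  ^ (?&sp) (?= (?&main) (?&sp) $ )
  |
  (?!^) \G
  (?&sp) (?&logic) (?&sp)
)
(?:
  \( (?&sp) (?<m_main>(?&main)) (?&sp) \)
  |
  (?<m_key>(?&key)) (?&sp) = (?&sp) (?<m_value>(?&value))
)
~x
REGEX;

// Hypothetical helper: returns a nested list of [key, value] pairs,
// with one nested array per parenthesised group.
function parseQuery(string $pattern, string $input): array
{
    // 0 matches (invalid input) or false (regex error) both yield [].
    if (!preg_match_all($pattern, $input, $matches, PREG_SET_ORDER)) {
        return [];
    }
    $result = [];
    foreach ($matches as $m) {
        if (($m['m_main'] ?? '') !== '') {
            // Parenthesised group: run its content through another round.
            $result[] = parseQuery($pattern, $m['m_main']);
        } else {
            $result[] = [$m['m_key'], $m['m_value']];
        }
    }
    return $result;
}

$input = "name = 'benjo' and (surname = 'benny' or surname = 'bennie') or age = 4";
print_r(parseQuery($pattern, $input));
```

Note that the pattern as written does not capture the and/or operator between tokens, so this sketch returns only the conditions; keeping the operators would need an extra capturing group around the (?&logic) call in the main pattern.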

The (?(DEFINE)...) block lets you define named capturing groups, separately from the main pattern, for use in subroutine calls.
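As a smaller illustration of the same mechanism (a sketch, not taken from the answer), the pattern below defines one named group inside (?(DEFINE)...) and calls it twice with (?&num). The group name num, the function name isDecimal, and the sample strings are assumptions for this example only.

```php
<?php
// Hypothetical helper: true when $s looks like "digits.digits".
function isDecimal(string $s): bool
{
    // The DEFINE block is never matched directly; it only declares
    // the "num" group so the main pattern can call it as a subroutine.
    $re = '~(?(DEFINE)(?<num>\d+)) ^ (?&num) \. (?&num) $~x';
    return preg_match($re, $s) === 1;
}

var_dump(isDecimal('3.14'));
var_dump(isDecimal('3.'));
```

Defining the pieces once and calling them by name is what keeps the larger pattern readable: each rule of the grammar appears in one place.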

(?(DEFINE)
  (?<string>'[^']++')  # String literal
  (?<int>\b\d+\b)      # Integer
  (?<sp>\s*)           # Whitespaces between tokens
  (?<key>\b\w+\b)      # Field name
  (?<value>(?&string)|(?&int)) # Field value
  (?<exp>(?&key) (?&sp) = (?&sp) (?&value)) # Simple expression
  (?<logic>\b (?:and|or) \b) # Logical operators
  (?<main>             # <token> ( <logic> <token> )*
    # A token is either a simple expression or a parenthesised group (...)
    # When we open a parenthesis, we recurse into the main pattern again
    (?<token> \( (?&sp) (?&main) (?&sp) \) | (?&exp) )
    (?:
      (?&sp) (?&logic) (?&sp)
      (?&token) 
    )*
  )
)

The rest of the pattern uses the \G anchor to match every <token> in <token> ( <logic> <token> )*, one per match, with a global matching operation.

The last part of the regex, which could be written simply as (?&token), is expanded so that it captures the field name and value of each simple expression.

(?:
  \( (?&sp) (?<m_main>(?&main)) (?&sp) \)
  |
  (?<m_key>(?&key)) (?&sp) = (?&sp) (?<m_value>(?&value))
)
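For comparison, the alternative the answer recommends (writing your own parser) could be sketched as a small tokenizer plus a recursive-descent parser. Everything here is hypothetical and not from the original answer: the function names tokenize, parseExpr, parseOperand and parseCondition, the token regex, and the error messages. It assumes the same grammar as the regex above (key = 'string' or key = integer, joined by lowercase and/or, with optional parentheses) and does not give and/or any precedence.

```php
<?php
// Split the input into "(", ")", "and", "or" and "key = value" tokens.
function tokenize(string $input): array
{
    $re = <<<'RE'
~\G\s*(?:[()]|\b(?:and|or)\b|\w+\s*=\s*(?:'[^']*+'|\d+))~
RE;
    $tokens = [];
    $pos = 0;
    while ($pos < strlen($input)) {
        if (!preg_match($re, $input, $m, 0, $pos)) {
            if (trim(substr($input, $pos)) === '') {
                break; // only trailing whitespace is left
            }
            throw new InvalidArgumentException("Unexpected input at offset $pos");
        }
        $tokens[] = trim($m[0]);
        $pos += strlen($m[0]);
    }
    return $tokens;
}

// <expr> := <operand> ( ("and" | "or") <operand> )*
function parseExpr(array $tokens, int &$i): array
{
    $result = [parseOperand($tokens, $i)];
    while ($i < count($tokens) && in_array($tokens[$i], ['and', 'or'], true)) {
        $result[] = $tokens[$i++];
        $result[] = parseOperand($tokens, $i);
    }
    return $result;
}

// <operand> := "(" <expr> ")" | condition
function parseOperand(array $tokens, int &$i)
{
    $tok = $tokens[$i++] ?? null;
    if ($tok === '(') {
        $group = parseExpr($tokens, $i);
        if (($tokens[$i++] ?? null) !== ')') {
            throw new InvalidArgumentException('Missing closing parenthesis');
        }
        return $group;
    }
    if ($tok === null || in_array($tok, [')', 'and', 'or'], true)) {
        throw new InvalidArgumentException('Expected a condition or "("');
    }
    return $tok;
}

function parseCondition(string $input): array
{
    $tokens = tokenize($input);
    $i = 0;
    $tree = parseExpr($tokens, $i);
    if ($i !== count($tokens)) {
        throw new InvalidArgumentException('Unexpected token after expression');
    }
    return $tree;
}

print_r(parseCondition("name = 'benjo' and (surname = 'benny' or surname = 'bennie') or age = 4"));
```

Unlike the regex approach, this version keeps the operators in the result, so its output has the nested-array shape asked for in the question, and each grammar rule can be changed in isolation.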
