pyparsing nestedExpr and nested parentheses


Problem description

I am working on a very simple "querying syntax" usable by people with reasonable technical skills (i.e., not coders per se, but able to touch on the subject).

A typical example of what they would enter on a form is:

address like street
AND
vote =  True
AND
(
  (
    age>=25
    AND
    gender = M
  )
  OR
  (
    age between [20,30]
    AND
    gender = F
  )
  OR
  (
    age >= 70
    AND
    eyes != blue
  )
)

With

  1. no quote required
  2. potentially infinite nesting of parentheses
  3. simple AND|OR linking

I am using pyparsing (well, trying to anyway) and have gotten this far:

from pyparsing import *

OPERATORS = [
    '<',
    '<=',
    '>',
    '>=',
    '=',
    '!=',
    'like',
    'regexp',
    'between'
]

# Python 3 spelling: chr/range replace the Python 2 unichr/xrange
unicode_printables = ''.join(chr(c) for c in range(65536)
                             if not chr(c).isspace())

# user_input is the text sent by the client form
user_input = ' '.join(user_input.split())
user_input = '(' + user_input + ')'

AND = Keyword("AND").setName('AND')
OR = Keyword("OR").setName('OR')

FIELD = Word(alphanums).setName('FIELD')
OPERATOR = oneOf(OPERATORS).setName('OPERATOR')
VALUE = Word(unicode_printables).setName('VALUE')
CRITERION = FIELD + OPERATOR + VALUE

QUERY = Forward()
NESTED_PARENTHESES = nestedExpr('(', ')')
QUERY << ( CRITERION | AND | OR | NESTED_PARENTHESES )

RESULT = QUERY.parseString(user_input)
RESULT.pprint()

The output is:

[['address',
  'like',
  'street',
  'AND',
  'vote',
  '=',
  'True',
  'AND',
  [['age>=25', 'AND', 'gender', '=', 'M'],
   'OR',
   ['age', 'between', '[20,30]', 'AND', 'gender', '=', 'F'],
   'OR',
   ['age', '>=', '70', 'AND', 'eyes', '!=', 'blue']]]]

Which I am only partially happy with - the main reason being that the desired final output would look like this:

[
  {
    "field" : "address",
    "operator" : "like",
    "value" : "street",
  },
  'AND',
  {
    "field" : "vote",
    "operator" : "=",
    "value" : True,
  },
  'AND',
  [
    [
      {
        "field" : "age",
        "operator" : ">=",
        "value" : 25,
      },
      'AND',
      {
        "field" : "gender",
        "operator" : "=",
        "value" : "M",
      }
    ],
    'OR',
    [
      {
        "field" : "age",
        "operator" : "between",
        "value" : [20,30],
      },
      'AND',
      {
        "field" : "gender",
        "operator" : "=",
        "value" : "F",
      }
    ],
    'OR',
    [
      {
        "field" : "age",
        "operator" : ">=",
        "value" : 70,
      },
      'AND',
      {
        "field" : "eyes",
        "operator" : "!=",
        "value" : "blue",
      }
    ],
  ]
]

Many thanks!

EDIT

After Paul's answer, this is what the code looks like. Obviously it works much more nicely :-)

# Python 3 spelling: chr/range replace the Python 2 unichr/xrange
unicode_printables = ''.join(chr(c) for c in range(65536)
                             if not chr(c).isspace())

# user_input is the text sent by the client form
user_input = ' '.join(user_input.split())

AND = oneOf(['AND', '&'])
OR = oneOf(['OR', '|'])
FIELD = Word(alphanums)
OPERATOR = oneOf(OPERATORS)
VALUE = Word(unicode_printables)
COMPARISON = FIELD + OPERATOR + VALUE

# operators are listed from highest to lowest precedence
QUERY = infixNotation(
    COMPARISON,
    [
        (AND, 2, opAssoc.LEFT,),
        (OR, 2, opAssoc.LEFT,),
    ]
)

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())

# wrap each parsed comparison in a ComparisonExpr instance
COMPARISON.addParseAction(ComparisonExpr)

RESULT = QUERY.parseString(user_input).asList()
print(type(RESULT))
from pprint import pprint
pprint(RESULT)

The output is:

[
  [
    <[snip]ComparisonExpr instance at 0x043D0918>,
    'AND',
    <[snip]ComparisonExpr instance at 0x043D0F08>,
    'AND',
    [
      [
        <[snip]ComparisonExpr instance at 0x043D3878>,
        'AND',
        <[snip]ComparisonExpr instance at 0x043D3170>
      ],
      'OR',
      [
        [
          <[snip]ComparisonExpr instance at 0x043D3030>,
          'AND',
          <[snip]ComparisonExpr instance at 0x043D3620>
        ],
        'AND',
        [
          <[snip]ComparisonExpr instance at 0x043D3210>,
          'AND',
          <[snip]ComparisonExpr instance at 0x043D34E0>
        ]
      ]
    ]
  ]
]

Is there a way to return RESULT with dictionaries and not ComparisonExpr instances?

EDIT2

Came up with a naive and very specific solution, but which works for me so far:

[snip]
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())

    def asDict(self):
        return {
            "field": self.tokens.asList()[0],
            "operator": self.tokens.asList()[1],
            "value": self.tokens.asList()[2]
        }

[snip]
RESULT = QUERY.parseString(user_input).asList()[0]
def convert(list):
    final = []
    for item in list:
        if item.__class__.__name__ == 'ComparisonExpr':
            final.append(item.asDict())
        elif item in ['AND', 'OR']:
            final.append(item)
        elif item.__class__.__name__ == 'list':
            final.append(convert(item))
        else:
            print('ooops forgotten something maybe?')

    return final

FINAL = convert(RESULT)
pprint(FINAL)

Which outputs:

[{'field': 'address', 'operator': 'LIKE', 'value': 'street'},
   'AND',
   {'field': 'vote', 'operator': '=', 'value': 'true'},
   'AND',
   [[{'field': 'age', 'operator': '>=', 'value': '25'},
     'AND',
     {'field': 'gender', 'operator': '=', 'value': 'M'}],
    'OR',
    [[{'field': 'age', 'operator': 'BETWEEN', 'value': '[20,30]'},
      'AND',
      {'field': 'gender', 'operator': '=', 'value': 'F'}],
     'AND',
     [{'field': 'age', 'operator': '>=', 'value': '70'},
      'AND',
      {'field': 'eyes', 'operator': '!=', 'value': 'blue'}]]]]

Again, thanks to Paul for pointing me in the right direction!

The only thing unknown left is for me to turn 'true' into True and '[20,30]' into [20, 30].

Solution

nestedExpr is a convenience expression in pyparsing, to make it easy to define text with matched opening and closing characters. When you want to parse the nested contents, then nestedExpr is usually not well structured enough.
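To make that concrete, here is a minimal sketch of what nestedExpr gives back with its default content expression. The two input strings are made up for illustration; the second mirrors one clause of the query above.

```python
import pyparsing as pp

# nestedExpr only tracks the matching "(" and ")"; everything in between is
# split on whitespace and returned as plain strings.
nested = pp.nestedExpr("(", ")")

simple = nested.parseString("(a (b c) d)").asList()
query = nested.parseString("(age>=25 AND gender = M)").asList()

print(simple)  # [['a', ['b', 'c'], 'd']]
print(query)   # [['age>=25', 'AND', 'gender', '=', 'M']]
```

Note that "age>=25" stays a single token while "gender = M" becomes three: nestedExpr knows nothing about fields, operators, or values.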

The query syntax you are trying to parse is better served using pyparsing's infixNotation method. You can see several examples at the pyparsing wiki's Examples page - SimpleBool is very similar to what you are parsing.

"Infix notation" is a general parsing term for expressions where the operator is in between its related operands (vs. "postfix notation" where the operator follows the operands, as in "2 3 +" instead of "2 + 3"; or "prefix notation" which looks like "+ 2 3"). Operators can have an order of precedence in evaluation that can override left-to-right order - for instance, in "2 + 3 * 4", precedence of operations dictates that multiplication gets evaluated before addition. Infix notation also supports using parentheses or other grouping characters to override that precedence, as in "(2 + 3) * 4" to force the addition operation to be done first.

pyparsing's infixNotation method takes a base operand expression, and then a list of operator definition tuples, in order of precedence. For instance, 4-function integer arithmetic would look like:

parser = infixNotation(integer,
             [
             (oneOf('* /'), 2, opAssoc.LEFT),
             (oneOf('+ -'), 2, opAssoc.LEFT),
             ])

Meaning that we will parse integer operands, with '*' and '/' binary left-associative operations and '+' and '-' binary operations, in that order. Support for parentheses to override the order is built into infixNotation.
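A runnable version of that arithmetic parser might look like the sketch below. The `integer` expression is not defined in the snippet above, so the definition here (digits converted to int) is an assumption.

```python
import pyparsing as pp

# Assumed operand definition: a run of digits converted to an int.
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

parser = pp.infixNotation(
    integer,
    [
        (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),  # binds tighter
        (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT),  # binds looser
    ],
)

grouped = parser.parseString("2 + 3 * 4", parseAll=True).asList()
forced = parser.parseString("(2 + 3) * 4", parseAll=True).asList()

print(grouped)  # [[2, '+', [3, '*', 4]]]
print(forced)   # [[[2, '+', 3], '*', 4]]
```

The nesting of the returned lists reflects precedence: the multiplication is grouped first unless parentheses force the addition.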

Query strings are often some combination of boolean operations NOT, AND, and OR, and typically evaluated in that order of precedence. In your case, the operands for these operators are comparison expressions, like "address = street" or "age between [20,30]". So if you define an expression for a comparison expression, of the form fieldname operator value, then you can use infixNotation to do the right grouping of AND's and OR's:

import pyparsing as pp
query_expr = pp.infixNotation(comparison_expr,
                [
                    (NOT, 1, pp.opAssoc.RIGHT,),
                    (AND, 2, pp.opAssoc.LEFT,),
                    (OR, 2, pp.opAssoc.LEFT,),
                ])
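The snippet above leaves comparison_expr, NOT, AND, and OR undefined. One possible way to fill them in is sketched here; the Group around the comparison and the exclusion of parentheses from values are assumptions made for this example, not part of the original code.

```python
import pyparsing as pp

NOT, AND, OR = map(pp.Keyword, ["NOT", "AND", "OR"])

FIELD = pp.Word(pp.alphanums)
OPERATOR = pp.oneOf("< <= > >= = != like regexp between")
# Assumption: values never contain parentheses, so "M)" parses as "M" then ")".
VALUE = pp.Word(pp.printables, excludeChars="()")
# Assumption: Group keeps each comparison together as its own sub-list.
comparison_expr = pp.Group(FIELD + OPERATOR + VALUE)

query_expr = pp.infixNotation(
    comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT),
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
)

sample = """
address like street
AND vote = True
AND (
  (age >= 25 AND gender = M)
  OR (age between [20,30] AND gender = F)
  OR (age >= 70 AND eyes != blue)
)
"""

result = query_expr.parseString(sample, parseAll=True).asList()
negated = query_expr.parseString("NOT age >= 70", parseAll=True).asList()

print(result[0][0])  # ['address', 'like', 'street']
print(negated)       # [['NOT', ['age', '>=', '70']]]
```

All values are still strings at this point; converting 'True' or '[20,30]' into Python objects is a separate step.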

Finally, I suggest you define a class to take the comparison tokens as class init args, then you can attach behavior to that class to evaluate the comparisons and output debug strings, something like:

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(
                            *self.tokens.asList())

# attach the class to the comparison expression
comparison_expr.addParseAction(ComparisonExpr)

Then you can get output like:

query_expr.parseString(sample).pprint()

[[Comparison:({'field': 'address', 'operator': 'like', 'value': 'street'}),
  'AND',
  Comparison:({'field': 'vote', 'operator': '=', 'value': True}),
  'AND',
  [[Comparison:({'field': 'age', 'operator': '>=', 'value': 25}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'M'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': 'between', 'value': [20, 30]}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'F'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': '>=', 'value': 70}),
    'AND',
    Comparison:({'field': 'eyes', 'operator': '!=', 'value': 'blue'})]]]]

The SimpleBool.py example has more details to show you how to create this class, and related classes for NOT, AND, and OR operators.

EDIT:

"Is there a way to return RESULT with dictionaries and not ComparisonExpr instances?" The __repr__ method on your ComparisonExpr class is being called instead of __str__. Easiest solution is to add to your class:

__repr__ = __str__

Or just rename __str__ to __repr__.
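The effect can be seen without pyparsing at all. In this small sketch the class names StrOnly and WithRepr are hypothetical, and a plain list stands in for the parsed tokens.

```python
from pprint import pformat


class StrOnly:
    """Defines only __str__, so containers still show the default repr."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def __str__(self):
        return "Comparison:({!r}, {!r}, {!r})".format(*self.tokens)


class WithRepr(StrOnly):
    # Reuse the same text for repr(), which is what pprint and list
    # formatting call on the items they contain.
    __repr__ = StrOnly.__str__


tokens = ["age", ">=", 25]
plain = pformat([StrOnly(tokens)])
readable = pformat([WithRepr(tokens)])

print(plain)     # something like [<__main__.StrOnly object at 0x...>]
print(readable)  # [Comparison:('age', '>=', 25)]
```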

"The only thing unknown left is for me to turn 'true' into True and '[20,30]' into [20, 30]"

Try:

CK = CaselessKeyword  # 'cause I'm lazy
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK,RBRACK = map(Suppress, "[]")
# parse numbers using pyparsing_common.number, which includes the str->int conversion parse action
num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)

Then add these to your VALUE expression:

VALUE = bool_literal | num_list | Word(unicode_printables)
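Put together as a self-contained sketch, the value conversion could be tried like this. The fallback Word uses pp.printables with parentheses excluded instead of unicode_printables, and the comparison is wrapped in a Group; both are assumptions made to keep the example short.

```python
import pyparsing as pp

CK = pp.CaselessKeyword
# CaselessKeyword returns the keyword as it was defined ('true'), whatever
# the case of the input, so the comparison below yields a real bool.
bool_literal = (CK("true") | CK("false")).setParseAction(lambda t: t[0] == "true")

LBRACK, RBRACK = map(pp.Suppress, "[]")
# pyparsing_common.number converts each matched number to int or float.
num_list = pp.Group(LBRACK + pp.delimitedList(pp.pyparsing_common.number) + RBRACK)

FIELD = pp.Word(pp.alphanums)
OPERATOR = pp.oneOf("< <= > >= = != like regexp between")
# Order matters: try the typed alternatives before the catch-all Word.
VALUE = bool_literal | num_list | pp.Word(pp.printables, excludeChars="()")
COMPARISON = pp.Group(FIELD + OPERATOR + VALUE)

as_bool = COMPARISON.parseString("vote = True").asList()
as_list = COMPARISON.parseString("age between [20,30]").asList()
as_text = COMPARISON.parseString("eyes != blue").asList()

print(as_bool)  # [['vote', '=', True]]
print(as_list)  # [['age', 'between', [20, 30]]]
print(as_text)  # [['eyes', '!=', 'blue']]
```

A bare number such as the 25 in "age >= 25" still comes back as the string '25' with this VALUE, since only booleans and bracketed lists are converted.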

Lastly:

from pprint import pprint
pprint(RESULT)

I got so tired of importing pprint all the time to do just this, I just added it to the API for ParseResults. Try:

RESULT.pprint()  # no import required on your part

or

print(RESULT.dump()) # will also show indented list of named fields

EDIT2

Lastly, results names are worth learning. If you make this change to COMPARISON, everything still works as you have it:

COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')

But now you can write:

def asDict(self):
    return self.tokens.asDict()

And you can access the parsed values by name instead of index position (either using result['field'] notation or result.field notation).
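A minimal end-to-end sketch of results names with the ComparisonExpr class follows. The simplified VALUE expression and the sample input are assumptions for illustration.

```python
import pyparsing as pp

FIELD = pp.Word(pp.alphanums)
OPERATOR = pp.oneOf("< <= > >= = != like regexp between")
VALUE = pp.Word(pp.printables, excludeChars="()")  # simplified stand-in

# Calling an expression with a string attaches a results name to its match.
COMPARISON = FIELD("field") + OPERATOR("operator") + VALUE("value")


class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def asDict(self):
        # The names set above become the dictionary keys.
        return self.tokens.asDict()


COMPARISON.addParseAction(ComparisonExpr)

expr = COMPARISON.parseString("eyes != blue")[0]

print(expr.asDict())            # {'field': 'eyes', 'operator': '!=', 'value': 'blue'}
print(expr.tokens.field)        # eyes
print(expr.tokens["operator"])  # !=
```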
