pyparsing nestedExpr and nested parentheses

Problem description

I am working on a very simple "querying syntax" usable by people with reasonable technical skills (i.e., not coders per se, but familiar enough with the subject).

A typical example of what they would enter on a form is:

address like street
AND
vote =  True
AND
(
  (
    age>=25
    AND
    gender = M
  )
  OR
  (
    age between [20,30]
    AND
    gender = F
  )
  OR
  (
    age >= 70
    AND
    eyes != blue
  )
)

With

  1. no quote required
  2. potentially infinite nesting of parentheses
  3. simple AND|OR linking

I am using pyparsing (well, trying to anyway) and getting somewhere:

from pyparsing import *

OPERATORS = [
    '<',
    '<=',
    '>',
    '>=',
    '=',
    '!=',
    'like',
    'regexp',
    'between'
]

unicode_printables = u''.join(unichr(c) for c in xrange(65536)
                              if not unichr(c).isspace())

# user_input is the text sent by the client form
user_input = ' '.join(user_input.split())
user_input = '(' + user_input + ')'

AND = Keyword("AND").setName('AND')
OR = Keyword("OR").setName('OR')

FIELD = Word(alphanums).setName('FIELD')
OPERATOR = oneOf(OPERATORS).setName('OPERATOR')
VALUE = Word(unicode_printables).setName('VALUE')
CRITERION = FIELD + OPERATOR + VALUE

QUERY = Forward()
NESTED_PARENTHESES = nestedExpr('(', ')')
QUERY << ( CRITERION | AND | OR | NESTED_PARENTHESES )

RESULT = QUERY.parseString(user_input)
RESULT.pprint()

The output is:

[['address',
  'like',
  'street',
  'AND',
  'vote',
  '=',
  'True',
  'AND',
  [['age>=25', 'AND', 'gender', '=', 'M'],
   'OR',
   ['age', 'between', '[20,30]', 'AND', 'gender', '=', 'F'],
   'OR',
   ['age', '>=', '70', 'AND', 'eyes', '!=', 'blue']]]]

Which I am only partially happy with - the main reason being that the desired final output would look like this:

[
  {
    "field" : "address",
    "operator" : "like",
    "value" : "street",
  },
  'AND',
  {
    "field" : "vote",
    "operator" : "=",
    "value" : True,
  },
  'AND',
  [
    [
      {
        "field" : "age",
        "operator" : ">=",
        "value" : 25,
      },
      'AND',
      {
        "field" : "gender",
        "operator" : "=",
        "value" : "M",
      }
    ],
    'OR',
    [
      {
        "field" : "age",
        "operator" : "between",
        "value" : [20,30],
      },
      'AND',
      {
        "field" : "gender",
        "operator" : "=",
        "value" : "F",
      }
    ],
    'OR',
    [
      {
        "field" : "age",
        "operator" : ">=",
        "value" : 70,
      },
      'AND',
      {
        "field" : "eyes",
        "operator" : "!=",
        "value" : "blue",
      }
    ],
  ]
]

Many thanks!

EDIT

After Paul's answer, this is what the code looks like. Obviously it works much more nicely :-)

unicode_printables = u''.join(unichr(c) for c in xrange(65536)
                              if not unichr(c).isspace())

user_input = ' '.join(user_input.split())

AND = oneOf(['AND', '&'])
OR = oneOf(['OR', '|'])
FIELD = Word(alphanums)
OPERATOR = oneOf(OPERATORS)
VALUE = Word(unicode_printables)
COMPARISON = FIELD + OPERATOR + VALUE

QUERY = infixNotation(
    COMPARISON,
    [
        (AND, 2, opAssoc.LEFT,),
        (OR, 2, opAssoc.LEFT,),
    ]
)

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())

COMPARISON.addParseAction(ComparisonExpr)

RESULT = QUERY.parseString(user_input).asList()
print type(RESULT)
from pprint import pprint
pprint(RESULT)

The output is:

[
  [
    <[snip]ComparisonExpr instance at 0x043D0918>,
    'AND',
    <[snip]ComparisonExpr instance at 0x043D0F08>,
    'AND',
    [
      [
        <[snip]ComparisonExpr instance at 0x043D3878>,
        'AND',
        <[snip]ComparisonExpr instance at 0x043D3170>
      ],
      'OR',
      [
        [
          <[snip]ComparisonExpr instance at 0x043D3030>,
          'AND',
          <[snip]ComparisonExpr instance at 0x043D3620>
        ],
        'AND',
        [
          <[snip]ComparisonExpr instance at 0x043D3210>,
          'AND',
          <[snip]ComparisonExpr instance at 0x043D34E0>
        ]
      ]
    ]
  ]
]

Is there a way to return RESULT with dictionaries and not ComparisonExpr instances?

EDIT2

I came up with a naive and very specific solution, but it works for me so far:

[snip]
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())

    def asDict(self):
        return {
            "field": self.tokens.asList()[0],
            "operator": self.tokens.asList()[1],
            "value": self.tokens.asList()[2]
        }

[snip]
RESULT = QUERY.parseString(user_input).asList()[0]
def convert(list):
    final = []
    for item in list:
        if item.__class__.__name__ == 'ComparisonExpr':
            final.append(item.asDict())
        elif item in ['AND', 'OR']:
            final.append(item)
        elif item.__class__.__name__ == 'list':
            final.append(convert(item))
        else:
            print 'ooops forgotten something maybe?'

    return final

FINAL = convert(RESULT)
pprint(FINAL)

Which outputs:

[{'field': 'address', 'operator': 'LIKE', 'value': 'street'},
   'AND',
   {'field': 'vote', 'operator': '=', 'value': 'true'},
   'AND',
   [[{'field': 'age', 'operator': '>=', 'value': '25'},
     'AND',
     {'field': 'gender', 'operator': '=', 'value': 'M'}],
    'OR',
    [[{'field': 'age', 'operator': 'BETWEEN', 'value': '[20,30]'},
      'AND',
      {'field': 'gender', 'operator': '=', 'value': 'F'}],
     'AND',
     [{'field': 'age', 'operator': '>=', 'value': '70'},
      'AND',
      {'field': 'eyes', 'operator': '!=', 'value': 'blue'}]]]]

Again, thanks to Paul for pointing me in the right direction!

The only thing left for me to figure out is how to turn 'true' into True and '[20,30]' into [20, 30].

Solution

nestedExpr is a convenience expression in pyparsing that makes it easy to define text with matched opening and closing characters. When you want to parse the nested contents themselves, nestedExpr is usually not structured enough.
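To see why, here is a minimal sketch of what nestedExpr returns with its default content expression. The sample string is made up for illustration. Note that the parentheses are matched correctly, but the contents are only split on whitespace, so "age>=25" comes back as a single token and the comparisons are not grouped:

```python
import pyparsing as pp

# nestedExpr with default content: whitespace-delimited words inside matched parens
nested = pp.nestedExpr('(', ')')

sample = "(age>=25 AND (gender = M OR eyes != blue))"  # illustrative input
result = nested.parseString(sample).asList()
print(result)
# [['age>=25', 'AND', ['gender', '=', 'M', 'OR', 'eyes', '!=', 'blue']]]
```

The nesting is preserved, but nothing in the result says where one comparison ends and the next begins.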

The query syntax you are trying to parse is better served using pyparsing's infixNotation method. You can see several examples at the pyparsing wiki's Examples page - SimpleBool is very similar to what you are parsing.

"Infix notation" is a general parsing term for expressions where the operator is in between its related operands (vs. "postfix notation" where the operator follows the operands, as in "2 3 +" instead of "2 + 3"; or "prefix notation" which looks like "+ 2 3"). Operators can have an order of precedence in evaluation that can override left-to-right order - for instance, in "2 + 3 * 4", precedence of operations dictates that multiplication gets evaluated before addition. Infix notation also supports using parentheses or other grouping characters to override that precedence, as in "(2 + 3) * 4" to force the addition operation to be done first.

pyparsing's infixNotation method takes a base operand expression, and then a list of operator definition tuples, in order of precedence. For instance, 4-function integer arithmetic would look like:

parser = infixNotation(integer,
             [
             (oneOf('* /'), 2, opAssoc.LEFT),
             (oneOf('+ -'), 2, opAssoc.LEFT),
             ])

This means that we will parse integer operands, with '*' and '/' as binary left-associative operations and '+' and '-' as binary left-associative operations, in that order of precedence. Support for parentheses to override the order is built into infixNotation.
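As a runnable sketch of the arithmetic parser, the following fills in an `integer` operand, which the snippet leaves undefined. The definition used here (a Word of digits converted to int) is an assumption. It shows how precedence and parentheses end up as nested groups:

```python
import pyparsing as pp

# Assumed operand definition: a run of digits, converted to int by a parse action
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

parser = pp.infixNotation(
    integer,
    [
        (pp.oneOf('* /'), 2, pp.opAssoc.LEFT),  # listed first, so binds tighter
        (pp.oneOf('+ -'), 2, pp.opAssoc.LEFT),
    ],
)

plain = parser.parseString("2 + 3 * 4").asList()
grouped = parser.parseString("(2 + 3) * 4").asList()
print(plain)    # [[2, '+', [3, '*', 4]]]
print(grouped)  # [[[2, '+', 3], '*', 4]]
```

Each operator level that actually matches produces one nested list, which is the same structure you want for AND/OR groups in a query.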

Query strings are often some combination of the boolean operations NOT, AND, and OR, typically evaluated in that order of precedence. In your case, the operands for these operators are comparison expressions, like "address = street" or "age between [20,30]". So if you define an expression for a comparison, of the form fieldname operator value, then you can use infixNotation to do the right grouping of ANDs and ORs:

import pyparsing as pp
query_expr = pp.infixNotation(comparison_expr,
                [
                    (NOT, 1, pp.opAssoc.RIGHT,),
                    (AND, 2, pp.opAssoc.LEFT,),
                    (OR, 2, pp.opAssoc.LEFT,),
                ])
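The snippet above does not define NOT, AND, OR or comparison_expr. A self-contained sketch might look like the following. The definitions of `field`, `comparison_op` and `value`, the use of CaselessKeyword, and the sample query are all assumptions made for illustration, not part of the original answer:

```python
import pyparsing as pp

# Assumed keyword definitions; CaselessKeyword returns the spelling given here
NOT, AND, OR = map(pp.CaselessKeyword, ["NOT", "AND", "OR"])

# Assumed building blocks for "fieldname operator value"
field = pp.Word(pp.alphas, pp.alphanums + "_")
comparison_op = pp.oneOf("<= >= != < > = like regexp between")
value = pp.Word(pp.printables, excludeChars="()")  # stop at parentheses

# Group keeps each comparison together as one operand
comparison_expr = pp.Group(field + comparison_op + value)

query_expr = pp.infixNotation(
    comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT),
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
)

sample = "address like street AND (age >= 25 OR NOT gender = M)"
result = query_expr.parseString(sample).asList()
print(result)
# [[['address', 'like', 'street'], 'AND',
#   [['age', '>=', '25'], 'OR', ['NOT', ['gender', '=', 'M']]]]]
```

Every comparison is now its own three-element list, and the AND/OR/NOT structure follows the parentheses and the precedence order.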

Finally, I suggest you define a class to take the comparison tokens as class init args, then you can attach behavior to that class to evaluate the comparisons and output debug strings, something like:

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(
                            *self.tokens.asList())

# attach the class to the comparison expression
comparison_expr.addParseAction(ComparisonExpr)

Then you can get output like:

query_expr.parseString(sample).pprint()

[[Comparison:({'field': 'address', 'operator': 'like', 'value': 'street'}),
  'AND',
  Comparison:({'field': 'vote', 'operator': '=', 'value': True}),
  'AND',
  [[Comparison:({'field': 'age', 'operator': '>=', 'value': 25}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'M'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': 'between', 'value': [20, 30]}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'F'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': '>=', 'value': 70}),
    'AND',
    Comparison:({'field': 'eyes', 'operator': '!=', 'value': 'blue'})]]]]

The SimpleBool.py example has more details to show you how to create this class, and related classes for NOT, AND, and OR operators.

EDIT:

"Is there a way to return RESULT with dictionaries and not ComparisonExpr instances?" The __repr__ method on your ComparisonExpr class is being called instead of __str__. Easiest solution is to add to your class:

__repr__ = __str__

Or just rename __str__ to __repr__.
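A small stand-alone illustration of the difference, using a plain list of tokens instead of a pyparsing ParseResults so that it runs without the parser (that substitution is an assumption made to keep the example short):

```python
class ComparisonExpr:
    def __init__(self, tokens):
        # A plain list stands in for the ParseResults used in the real class
        self.tokens = list(tokens)

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(
            *self.tokens)

    # Containers such as lists display their items with repr(), not str()
    __repr__ = __str__


items = [ComparisonExpr(['age', '>=', 25]), 'AND', ComparisonExpr(['gender', '=', 'M'])]
print(items)
# [Comparison:('field': 'age', 'operator': '>=', 'value': 25), 'AND', Comparison:('field': 'gender', 'operator': '=', 'value': 'M')]
```

Without the `__repr__` line, printing the list would show the default `<... instance at 0x...>` form for each object.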

"The only thing unknown left is for me to turn 'true' into True and '[20,30]' into [20, 30]"

Try:

CK = CaselessKeyword  # 'cause I'm lazy
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK,RBRACK = map(Suppress, "[]")
# parse numbers using pyparsing_common.number, which includes the str->int conversion parse action
num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)

Then add these to your VALUE expression:

VALUE = bool_literal | num_list | Word(unicode_printables)
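Put together, a sketch of the value conversion could look like this. Two things here go beyond the answer and are assumptions: `pyparsing_common.number` is also added as a stand-alone alternative so that scalars such as 25 become integers, and the catch-all word excludes parentheses rather than using the `unicode_printables` string from the question:

```python
import pyparsing as pp
from pyparsing import pyparsing_common as ppc

CK = pp.CaselessKeyword
# CaselessKeyword returns the spelling given here, so the comparison is safe
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK, RBRACK = map(pp.Suppress, "[]")
num_list = pp.Group(LBRACK + pp.delimitedList(ppc.number) + RBRACK)

FIELD = pp.Word(pp.alphanums)
OPERATOR = pp.oneOf("<= >= != < > = like regexp between")
# Assumption: ppc.number added for scalar numbers. A value such as "1st"
# would be read as the number 1, so tighten this if such values can occur.
VALUE = bool_literal | num_list | ppc.number | pp.Word(pp.printables, excludeChars="()")
COMPARISON = FIELD + OPERATOR + VALUE

parsed = [
    COMPARISON.parseString(text).asList()
    for text in ("vote = True", "age between [20,30]", "age >= 70", "eyes != blue")
]
for tokens in parsed:
    print(tokens)
# ['vote', '=', True]
# ['age', 'between', [20, 30]]
# ['age', '>=', 70]
# ['eyes', '!=', 'blue']
```

The order of the alternatives matters: the more specific expressions have to come before the catch-all word, which would otherwise match everything.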

Lastly:

from pprint import pprint
pprint(RESULT)

I got so tired of importing pprint all the time to do just this that I added it to the API for ParseResults. Try:

RESULT.pprint()  # no import required on your part

or

print(RESULT.dump()) # will also show indented list of named fields

EDIT2

Lastly, results names are worth learning. If you make this change to COMPARISON, everything still works as you have it:

COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')

But now you can write:

def asDict(self):
    return self.tokens.asDict()

And you can access the parsed values by name instead of index position (either using result['field'] notation or result.field notation).
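As one possible shortcut built on results names, the sketch below skips the ComparisonExpr class and has the parse action return `tokens.asDict()` directly, so the parsed result already contains dictionaries. This is a variation on the approach above rather than the answer's own code. The operator list and sample query are illustrative:

```python
import pyparsing as pp

AND = pp.Keyword("AND")
OR = pp.Keyword("OR")
FIELD = pp.Word(pp.alphanums)
OPERATOR = pp.oneOf("<= >= != < > = like between")
VALUE = pp.Word(pp.printables, excludeChars="()")

# Results names let the parse action build the dict without index positions
COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')
COMPARISON.addParseAction(lambda tokens: tokens.asDict())

QUERY = pp.infixNotation(
    COMPARISON,
    [
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
)

sample = "age >= 25 AND (gender = M OR eyes != blue)"
result = QUERY.parseString(sample).asList()
print(result)
# [[{'field': 'age', 'operator': '>=', 'value': '25'}, 'AND',
#   [{'field': 'gender', 'operator': '=', 'value': 'M'}, 'OR',
#    {'field': 'eyes', 'operator': '!=', 'value': 'blue'}]]]
```

With this in place, a separate convert() pass over the result is no longer needed, at the cost of losing the class as a place to attach evaluation behavior.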
