PyParsing: parse nested loop with brace and specific header


Question

I found several topics about pyparsing. They deal with almost the same problem of parsing nested loops, but even with those I can't find a solution to my errors.

I have the following format:

key value;
header_name "optional_metadata"
{
     key value;
     sub_header_name
     {
        key value;
     };
};
key value;

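  • Keys are alphanumeric
  • Values may be of type Int or String, made of alphanumeric characters plus "@._"
  • A key/value pair may come after a brace block
  • A key/value pair may come before the first brace block in the file
  • Key/value pairs before or after a brace block are optional
  • A header may have a name
  • A closing brace is followed by a semicolon

I used the following parser: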

    VALID_KEY_CHARACTERS = alphanums
    VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")
    
    lbr = Literal( '{' ).suppress()
    rbr = Literal( '}' ).suppress() + Literal(";").suppress()
    
    expr = Forward()
    atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
    pair = atom | lbr + OneOrMore( expr ) + rbr
    expr << Group( atom + pair )
    

When I use it, I get only the "header_name" and "header_metadata". After modifying it, I get only the key/value pairs inside a brace, and a Python exception reports a parsing error: it expects '}' on reaching sub_header_name.

Can anyone help me understand why? Thank you.

Recommended answer

I think the main problem is that your grammar does not fully describe the input, which leads to several mismatches. The two main problems I saw were that you forgot that each key/value pair must end in a semicolon, and that you did not specify that a key/value pair can come after a closing curly brace. It also looks like the lines:

    pair = atom | lbr + OneOrMore( expr ) + rbr
    expr << Group( atom + pair )
    

...would require each set of curly braces to contain, at minimum, two key/value pairs, or one key/value pair and a set of curly braces. I believe this would cause an error once you encounter the lines:

    {
        key value;
    };
    

...within your input, though I'm not entirely certain.
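To check the two points above, here is a minimal sketch that runs the grammar from the question against three small inputs. The helper `parses` and the test strings are hypothetical, added only for illustration, and the character class is written with doubled backslashes so that Python 3 does not warn about invalid escapes (the resulting string is the same).

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Optional,
                       ParseException, Word, alphanums, srange)

# Grammar copied from the question.
VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")

lbr = Literal("{").suppress()
rbr = Literal("}").suppress() + Literal(";").suppress()

expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore(expr) + rbr
expr << Group(atom + pair)


def parses(text):
    """Hypothetical helper: True if the question's grammar accepts `text`."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


print(parses("key value;"))        # False: nothing in the grammar matches ';' here
print(parses("h m { c d };"))      # False: one pair inside the braces is not enough
print(parses("h m { c d e f };"))  # True: two pairs inside the braces are accepted
```

The second and third calls differ only in the number of pairs inside the braces, which is consistent with the minimum-content requirement described above.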

In any case, after playing around with your grammar, I ended up with this:

    from pyparsing import Forward, Group, Literal, Optional, Word, srange
    
    data = """key1 value1; 
    header_name "optional_metadata"
    {
         key2 value2;
         sub_header_name
         {
            key value;
         };
    };
    key3 value3;"""
    
    # I'm reusing the key characters for the header names; the set does not include ';'
    VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
    # Backslashes are doubled so Python 3 does not warn about invalid escapes
    VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")
    
    semicolon = Literal(';').suppress()
    lbr = Literal('{').suppress()
    rbr = Literal('}').suppress()
    
    key = Word(VALID_KEY_CHARACTERS)
    value = Word(VALID_VALUE_CHARACTERS)
    
    key_pair = Group(key + value + semicolon)("key_pair")
    metadata = Group(key + Optional(value))("metadata")
    
    header = key_pair + Optional(metadata)
    
    expr = Forward()
    contents = Group(lbr + expr + rbr + semicolon)("contents")
    expr << header + Optional(contents) + Optional(key_pair)
    
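    # asXML() is available in pyparsing 2.x; it was removed in pyparsing 3.0,
    # where dump() is the usual replacement for inspecting results.
    print(expr.parseString(data).asXML())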
    

This gives the following result:

    <key_pair>
      <key_pair>
        <ITEM>key1</ITEM>
        <ITEM>value1</ITEM>
      </key_pair>
      <metadata>
        <ITEM>header_name</ITEM>
        <ITEM>&quot;optional_metadata&quot;</ITEM>
      </metadata>
      <contents>
        <key_pair>
          <ITEM>key2</ITEM>
          <ITEM>value2</ITEM>
        </key_pair>
        <metadata>
          <ITEM>sub_header_name</ITEM>
        </metadata>
        <contents>
          <key_pair>
            <ITEM>key</ITEM>
            <ITEM>value</ITEM>
          </key_pair>
        </contents>
      </contents>
      <key_pair>
        <ITEM>key3</ITEM>
        <ITEM>value3</ITEM>
      </key_pair>
    </key_pair>
    

I'm not entirely sure this is exactly what you were trying to accomplish, but hopefully it is close enough that you can tweak it to suit your particular task.
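As one possible tweak, the grammar above accepts exactly one key/value pair before a block and at most one after it. The sketch below is an assumption about what a more general version could look like, not part of the original answer: it allows any number of key/value pairs and brace blocks in any order, at any nesting depth. The names `item`, `block` and `document` are introduced here for illustration.

```python
from pyparsing import (Forward, Group, Literal, Optional, Word, ZeroOrMore,
                       srange)

data = """key1 value1;
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")

semicolon = Literal(";").suppress()
lbr = Literal("{").suppress()
rbr = Literal("}").suppress()

key = Word(VALID_KEY_CHARACTERS)
value = Word(VALID_VALUE_CHARACTERS)

# "key value;"
key_pair = Group(key + value + semicolon)

# "name [metadata] { ...items... };" where the items may themselves be blocks
block = Forward()
item = key_pair | block
block <<= Group(key + Optional(value) + lbr + Group(ZeroOrMore(item)) + rbr + semicolon)

document = ZeroOrMore(item)

result = document.parseString(data, parseAll=True).asList()
print(result)
```

Trying `key_pair` before `block` works because a header line never ends in a semicolon, so `key_pair` fails on it and the parser falls through to `block`. Passing `parseAll=True` makes the parser raise an exception on trailing input it cannot match instead of stopping silently.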
