Parsing a file with curly brackets

Problem description

I need to parse a file in which information is grouped by curly brackets, for example:

Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}

Here is what I have tried in Python:

from io import open
from pyparsing import *
import pprint

def parse(s):
    return nestedExpr('{','}').parseString(s).asList()

def test(strng):
    print strng
    try:
        cfgFile = file(strng)
        cfgData = "".join( cfgFile.readlines() )
        list = parse( cfgData )
        pp = pprint.PrettyPrinter(2)
        pp.pprint(list)

    except ParseException, err:
        print err.line
        print " "*(err.column-1) + "^"
        print err

    cfgFile.close()
    print
    return list

if __name__ == '__main__':
    test('testfile')

But this fails with an error:

testfile
Continent
^
Expected "{" (at char 0), (line:1, col:1)

Traceback (most recent call last):
  File "xxx.py", line 55, in <module>
    test('testfile')
  File "xxx.py", line 40, in test
    return list
UnboundLocalError: local variable 'list' referenced before assignment  

What do I need to do to make this work? Would a parser other than pyparsing be a better fit?

Solution

Recursion is the key here. Try something along these lines:

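def parse(it):
    # `it` is an iterator over whitespace-separated tokens.
    result = []
    while True:
        try:
            tk = next(it)
        except StopIteration:
            # End of input: return what has been collected so far.
            break

        if tk == '}':
            # End of the current block: hand the result back to the caller.
            break
        val = next(it)
        if val == '{':
            # The key opens a nested block: parse it recursively.
            result.append((tk, parse(it)))
        else:
            # Plain "key value" pair.
            result.append((tk, val))

    return result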

Example usage:

import pprint       

data = """
Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}
"""

r = parse(iter(data.split()))
pprint.pprint(r)

... which produces (Python 2.6):

[('Continent',
  [('Name', 'Europe'),
   ('Country',
    [('Name', 'UK'),
     ('Dog', [('Name', 'Fiffi'), ('Colour', 'Gray')]),
     ('Dog', [('Name', 'Smut'), ('Colour', 'Black')])])])]
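
Since the question reads its input from a file rather than from a string, here is a minimal sketch of feeding the same recursive parser from a file. The helper name `tokens_from_file` and the temporary sample file are assumptions made for illustration; `parse` is repeated only so that the snippet runs on its own.

```python
import os
import tempfile

def tokens_from_file(path):
    """Yield whitespace-separated tokens from the file, one line at a time."""
    with open(path) as handle:
        for line in handle:
            for token in line.split():
                yield token

def parse(it):
    # Same recursive parser as above, repeated so this sketch is self-contained.
    result = []
    while True:
        try:
            tk = next(it)
        except StopIteration:
            break
        if tk == '}':
            break
        val = next(it)
        if val == '{':
            result.append((tk, parse(it)))
        else:
            result.append((tk, val))
    return result

# Write a small sample file (hypothetical content) and parse it.
sample = "Continent\n{\nName    Europe\nCountry\n{\nName    UK\n}\n}\n"
with tempfile.NamedTemporaryFile('w', suffix='.cfg', delete=False) as tmp:
    tmp.write(sample)
try:
    tree = parse(tokens_from_file(tmp.name))
finally:
    os.remove(tmp.name)

print(tree)
```

Because the generator yields tokens lazily, the file never has to be loaded into memory as a whole, which may matter for large inputs.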

Please treat this only as a starting point, and feel free to improve the code as needed (depending on your data, a dictionary might be a better choice). Also, the sample code does not properly handle ill-formed data (notably an extra or missing }). I urge you to aim for full test coverage ;)
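
To illustrate both caveats, here is one possible variant (a sketch, not the only way to do it) that builds dictionaries and rejects unbalanced braces. The names `parse_block` and `parse_config`, and the choice to collect repeated keys such as the two Dog blocks into a list, are assumptions made for this example.

```python
def parse_block(tokens, depth=0):
    """Parse 'key value' and 'key { ... }' entries into a dict.

    Repeated keys are collected into a list. Raises ValueError on an
    extra or missing '}' and on a key that has no value.
    """
    result = {}
    for key in tokens:
        if key == '}':
            if depth == 0:
                raise ValueError("unexpected '}' at top level")
            return result
        if key == '{':
            raise ValueError("'{' without a preceding key")
        try:
            value = next(tokens)
        except StopIteration:
            raise ValueError("key %r has no value" % key)
        if value == '{':
            value = parse_block(tokens, depth + 1)
        elif value == '}':
            raise ValueError("key %r has no value" % key)
        if key in result:
            # Second occurrence of a key: promote the entry to a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    if depth > 0:
        raise ValueError("missing '}' at end of input")
    return result

def parse_config(text):
    # Tokenize on whitespace, as in the original sample code.
    return parse_block(iter(text.split()))

data = """
Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}
"""

tree = parse_config(data)
print(tree['Continent']['Country']['Dog'][1]['Name'])  # prints: Smut
```

Like the original, this splits on whitespace, so values that contain spaces would still need a different tokenizer.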


EDIT: After discovering pyparsing, I tried the following, which appears to work (much) better and can be tailored (more) easily to special needs:

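import pprint
from pyparsing import Word, Literal, Forward, Group, ZeroOrMore, alphas

def syntax():
    # Braces delimit blocks but are dropped from the parsed output.
    lbr = Literal( '{' ).suppress()
    rbr = Literal( '}' ).suppress()
    # Keys and plain values are runs of letters only.
    key = Word( alphas )
    atom = Word( alphas )
    # Forward declaration, because the grammar refers to itself.
    expr = Forward()
    # A key is followed either by a plain value or by a braced block.
    pair = atom | (lbr + ZeroOrMore( expr ) + rbr)
    expr <<= Group( key + pair )

    return expr

# `data` is the sample string defined earlier.
expr = syntax()
result = expr.parseString(data).asList()
pprint.pprint(result)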

Producing:

[['Continent',
  ['Name', 'Europe'],
  ['Country',
   ['Name', 'UK'],
   ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
   ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]
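
The nested lists shown above can be post-processed with a small recursive walk. The following sketch collects every Dog group and extracts its name; the helper name `find_groups` is an assumption, and the `result` literal is copied from the output above so that the snippet runs without pyparsing installed.

```python
def find_groups(node, key):
    """Yield every group (a list whose first item equals `key`) in the tree."""
    if isinstance(node, list):
        if node and node[0] == key:
            yield node
        for child in node:
            for found in find_groups(child, key):
                yield found

# Copied from the pyparsing output shown above.
result = [['Continent',
  ['Name', 'Europe'],
  ['Country',
   ['Name', 'UK'],
   ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
   ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]

# Each Dog group is ['Dog', [key, value], ...]; turn the pairs into a dict.
dog_names = [dict(group[1:])['Name'] for group in find_groups(result, 'Dog')]
print(dog_names)  # prints: ['Fiffi', 'Smut']
```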
