Parsing a file with curly brackets
Problem description
I need to parse a file with information separated by curly brackets, for example:
Continent
{
    Name Europe
    Country
    {
        Name UK
        Dog
        {
            Name Fiffi
            Colour Gray
        }
        Dog
        {
            Name Smut
            Colour Black
        }
    }
}
Here is what I have tried in Python:

from io import open
from pyparsing import *
import pprint

def parse(s):
    return nestedExpr('{','}').parseString(s).asList()

def test(strng):
    print strng
    try:
        cfgFile = file(strng)
        cfgData = "".join( cfgFile.readlines() )
        list = parse( cfgData )
        pp = pprint.PrettyPrinter(2)
        pp.pprint(list)

    except ParseException, err:
        print err.line
        print " "*(err.column-1) + "^"
        print err

    cfgFile.close()
    print
    return list

if __name__ == '__main__':
    test('testfile')
But this fails with an error:
testfile
Continent
^
Expected "{" (at char 0), (line:1, col:1)
Traceback (most recent call last):
  File "xxx.py", line 55, in <module>
    test('testfile')
  File "xxx.py", line 40, in test
    return list
UnboundLocalError: local variable 'list' referenced before assignment
What do I need to do to make this work? Would a parser other than pyparsing be a better fit?
Recursion is the key here. Try something along these lines:
def parse(it):
    result = []
    while True:
        try:
            tk = next(it)
        except StopIteration:
            break

        if tk == '}':
            break
        val = next(it)
        if val == '{':
            result.append((tk,parse(it)))
        else:
            result.append((tk, val))

    return result
Example usage:
import pprint
data = """
Continent
{
    Name Europe
    Country
    {
        Name UK
        Dog
        {
            Name Fiffi
            Colour Gray
        }
        Dog
        {
            Name Smut
            Colour Black
        }
    }
}
"""
r = parse(iter(data.split()))
pprint.pprint(r)
... which produces (Python 2.6):
[('Continent',
  [('Name', 'Europe'),
   ('Country',
    [('Name', 'UK'),
     ('Dog', [('Name', 'Fiffi'), ('Colour', 'Gray')]),
     ('Dog', [('Name', 'Smut'), ('Colour', 'Black')])])])]
Please take this only as a starting point, and feel free to improve the code as needed (depending on your data, a dictionary might be a better choice). In addition, the sample code does not properly handle ill-formed data (notably an extra or missing }); I urge you to aim for full test coverage ;)
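The two caveats above (ill-formed input and the dictionary alternative) could be addressed roughly as follows. This is only a sketch, not part of the original answer: the names parse_strict and to_dict are hypothetical, raising ValueError is an assumed error policy, and collecting repeated keys (such as the two Dog blocks) into a list is an assumption about the desired result.

```python
def parse_strict(tokens, depth=0):
    """Parse an iterator of whitespace-split tokens into nested (key, value) pairs.

    Hypothetical stricter variant of parse(): raises ValueError on an
    extra or missing '}' and on a key that has no value.
    """
    result = []
    for tk in tokens:
        if tk == '}':
            if depth == 0:
                raise ValueError("unexpected '}' at top level")
            return result
        if tk == '{':
            raise ValueError("'{' found where a key was expected")
        # None marks "ran out of tokens"; split() never yields None itself.
        val = next(tokens, None)
        if val is None or val == '}':
            raise ValueError("key %r has no value" % tk)
        if val == '{':
            result.append((tk, parse_strict(tokens, depth + 1)))
        else:
            result.append((tk, val))
    if depth > 0:
        raise ValueError("missing '}' at end of input")
    return result


def to_dict(pairs):
    """Convert nested (key, value) pairs into dictionaries.

    Assumption: a key that occurs more than once maps to a list of its values.
    """
    out = {}
    for key, val in pairs:
        if isinstance(val, list):
            val = to_dict(val)
        if key in out:
            if not isinstance(out[key], list):
                out[key] = [out[key]]
            out[key].append(val)
        else:
            out[key] = val
    return out
```

With the sample data above, to_dict(parse_strict(iter(data.split()))) would give a dictionary whose 'Continent' entry contains 'Name' and 'Country', with the 'Dog' key under 'Country' holding a list of two dictionaries. Note that parse_strict expects an iterator, not a list, because the recursive calls must consume the same token stream.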
EDIT: Having discovered pyparsing, I tried the following, which appears to work (much) better and could be (more) easily tailored to special needs:
import pprint
from pyparsing import Word, Literal, Forward, Group, ZeroOrMore, alphas

def syntax():
    lbr = Literal( '{' ).suppress()
    rbr = Literal( '}' ).suppress()
    key = Word( alphas )
    atom = Word ( alphas )
    expr = Forward()
    pair = atom | (lbr + ZeroOrMore( expr ) + rbr)
    expr << Group ( key + pair )

    return expr

expr = syntax()
result = expr.parseString(data).asList()
pprint.pprint(result)
Producing:
[['Continent',
  ['Name', 'Europe'],
  ['Country',
   ['Name', 'UK'],
   ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
   ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]
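Independently of the parser chosen, the UnboundLocalError in the question comes from the test function itself: when parsing raises, list is never assigned, yet the function still reaches return list. A minimal Python 3 sketch of a driver that avoids this is shown below. The name load_config is hypothetical, returning None on failure is an assumption, and the recursive parser is repeated (with a small added check for a dangling key) only so that the block is self-contained.

```python
def parse(it):
    """Recursive parser in the style of the answer; `it` must be an iterator of tokens."""
    result = []
    for tk in it:
        if tk == '}':
            break
        val = next(it, None)
        if val is None:
            # Added check (not in the original answer): a key at end of input.
            raise ValueError("key %r has no value" % tk)
        if val == '{':
            result.append((tk, parse(it)))
        else:
            result.append((tk, val))
    return result


def load_config(path):
    """Read and parse the file at `path`; return None if that fails."""
    result = None  # assigned up front, so the final return always has a value
    try:
        # The with-block closes the file even when reading or parsing raises.
        with open(path) as cfg_file:
            result = parse(iter(cfg_file.read().split()))
    except (IOError, ValueError) as err:
        print("could not load %s: %s" % (path, err))
    return result
```

Calling load_config('testfile') then either returns the nested list of pairs or prints a message and returns None, rather than failing on an unassigned variable. It also avoids shadowing the built-in name list.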