Parse a custom text file in Python
Problem description
I have some text to parse. This is a condensed form of it:
apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
The output should look like this:
apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
So far the only thing I have come up with is a sort of breadth-first approach in Python, but I can't figure out how to collect all the nested children.
progInput = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""

progInputSplitToLines = progInput.split('\n')
childrenList = []
root = ""

def hasChildren():
    global root
    # The first line names the root block, e.g. "apple {"
    if "{" in progInputSplitToLines[0]:
        root = progInputSplitToLines[0].split(" ")[0]
    # Collect every key=value line as a (key, value) pair; nesting is ignored here
    for e in progInputSplitToLines[1:]:
        if "=" in e:
            key, value = e.split("=", 1)
            childrenList.append((key.replace(" ", ""), value.replace(" ", "")))

hasChildren()
PS: I looked into tree structures in Python and came across anytree (https://anytree.readthedocs.io/en/latest/). Do you think it would help in my case?
Could you please help me out? I'm not very good at parsing text. Thanks a lot in advance. :)
Recommended answer
Since your file is in HOCON format, you can try using the pyhocon HOCON parser module to solve your problem.
Install: either run pip install pyhocon, or download the GitHub repository and perform a manual install with python setup.py install.
Basic usage:
from pyhocon import ConfigFactory
conf = ConfigFactory.parse_file('text.conf')
print(conf)
This gives the following nested structure:
ConfigTree([('apple', ConfigTree([('type', 'fruit'), ('varieties', ConfigTree([('color', 'red'), ('origin', 'usa')]))]))])
ConfigTree is just a collections.OrderedDict (it subclasses it), as can be seen in the source code.
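Because the parsed result behaves like a nested ordered mapping, ordinary dictionary access is enough to navigate it. The sketch below uses plain OrderedDict objects as a stand-in for the parsed tree, so it runs without pyhocon installed. The lookup helper is a hypothetical name introduced only for illustration and is not part of pyhocon.

```python
from collections import OrderedDict
from functools import reduce

# Plain OrderedDicts standing in for the parsed tree (illustrative only).
tree = OrderedDict([
    ("apple", OrderedDict([
        ("type", "fruit"),
        ("varieties", OrderedDict([("color", "red"), ("origin", "usa")])),
    ])),
])


def lookup(mapping, dotted_path):
    """Walk a nested mapping one key at a time (hypothetical helper)."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), mapping)


# Keys come back in insertion order, which is why the output order is stable.
print(list(tree["apple"]))
print(tree["apple"]["type"])
print(lookup(tree, "apple.varieties.color"))
```

A missing key raises KeyError here, just as it would with an ordinary dictionary.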
Update:
To get your desired output, you can write your own recursive function to collect all paths:
from pyhocon import ConfigFactory
from pyhocon.config_tree import ConfigTree

def config_paths(config):
    # Yield (path_tuple, value) for every leaf in the nested tree
    for k, v in config.items():
        if isinstance(v, ConfigTree):
            for k1, v1 in config_paths(v):
                yield (k,) + k1, v1
        else:
            yield (k,), v

config = ConfigFactory.parse_file('text.conf')
for k, v in config_paths(config):
    print('%s=%s' % ('.'.join(k), v))
Output:
apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
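If installing a third-party parser is not an option, the same flattening can be sketched with the standard library alone by keeping a stack of the currently open block names. This is a minimal sketch, not a HOCON parser: it assumes every line is either "name {", "}", or "key=value", and that the braces are balanced. The function name flatten_blocks is hypothetical.

```python
def flatten_blocks(text):
    """Yield 'dotted.path=value' strings from a brace-nested text (hypothetical helper)."""
    path = []  # names of the blocks that are currently open
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            # "name {" opens a nested block
            path.append(line[:-1].strip())
        elif line == "}":
            # "}" closes the innermost block (assumes balanced braces)
            path.pop()
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            yield ".".join(path + [key]) + "=" + value


sample = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""

for entry in flatten_blocks(sample):
    print(entry)
```

The stack makes the traversal depth-first, which is what lets each key pick up the names of all its enclosing blocks; a breadth-first pass over the lines loses that context.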