Python: parse a text file into nested dictionaries

Question

Consider the following data structure:

[HEADER1]
{
   key value
   key value
   ...
   [HEADER2]
   {
      key value
      ...
   }
   key value
   [HEADER3]
   {
      key value
      [HEADER4]
      {
         key value
         ...
      }
   }
   key value
}

There are no indents in the raw data; I added them here for clarity. The number of key-value pairs is unknown: '...' indicates there could be many more within each [HEADER] block. The number of [HEADER] blocks is also unknown.

Note that the structure is nested, so in this example header 2 and 3 are inside header 1 and header 4 is inside header 3.

There can be many more (nested) headers, but I kept the example short.

How do I go about parsing this into a nested dictionary structure? Each [HEADER] should be the key to whatever follows inside the curly brackets.

The end result should be something like:

dict = {'HEADER1': 'contents of 1'}
contents of 1 = {'key': 'value', 'key': 'value', 'HEADER2': 'contents of 2', etc}

I'm guessing I need some sort of recursive function, but I am pretty new to Python and have no idea where to start.
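For reference, the recursive approach guessed at here can be sketched as follows. This is an illustration, not the method used in the answer below, and it assumes that every [HEADER] line is directly followed by a line containing only {, that every block closes with a line containing only }, and that each remaining line is a key, one space, then the value. The function names parse and parse_block are hypothetical. Values stay strings, and a key repeated inside one block keeps only its last value, because a dict cannot hold duplicate keys.

```python
def parse_block(lines, pos):
    """Parse lines from index pos until the matching '}' (or end of input).

    Returns a tuple (block_dict, index_of_next_unread_line).
    """
    block = {}
    while pos < len(lines):
        line = lines[pos]
        if line == '}':
            return block, pos + 1  # end of the current block
        if line.startswith('[') and line.endswith(']'):
            name = line[1:-1]  # "[HEADER2]" -> "HEADER2"
            # Assumption: the header is immediately followed by a '{' line.
            if pos + 1 >= len(lines) or lines[pos + 1] != '{':
                raise ValueError(f"expected '{{' after {line}")
            # Recurse into the nested block; it returns where to resume.
            block[name], pos = parse_block(lines, pos + 2)
        else:
            key, _, value = line.partition(' ')  # split on the first space only
            block[key] = value
            pos += 1
    return block, pos


def parse(text):
    """Parse a whole file; the top level is treated as a block without braces."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    result, _ = parse_block(lines, 0)
    return result
```

Each call handles exactly one {...} level and hands back the index where its caller should continue, which is what lets arbitrarily deep nesting work. To use it on the file from the question, read the whole file and call parse(file.read()).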

For starters, I can pull out all the [HEADER] keys as follows:

path = 'mydatafile.txt'
keys = []

with open(path, 'rt') as file:
    for line in file:
        # Header lines look like "[HEADER1]"; keep them without the newline
        if line.startswith('['):
            keys.append(line.rstrip('\n'))

for key in keys:
    print(key)

But then what? Maybe this isn't even needed?

Any suggestions?
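The key-scanning loop in the question is a reasonable starting point. One way to extend it, sketched here as an alternative and not taken from the answer below, is to keep a stack of the dictionaries that are currently open, so no recursion is needed. It assumes headers sit on their own line and are each followed by a { line, blocks are closed by a } line, and other lines are key value pairs split at the first space. The name parse_lines is hypothetical.

```python
def parse_lines(lines):
    """Build nested dicts from an iterable of lines using an explicit stack."""
    root = {}
    stack = [root]   # stack[-1] is the dict currently being filled
    pending = None   # header name seen, waiting for its '{'
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('[') and line.endswith(']'):
            pending = line[1:-1]
        elif line == '{':
            if pending is None:
                raise ValueError("'{' without a preceding [HEADER] line")
            child = {}
            stack[-1][pending] = child  # attach the new block under its header
            stack.append(child)         # ...and make it the current block
            pending = None
        elif line == '}':
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()  # close the innermost open block
        else:
            key, _, value = line.partition(' ')
            stack[-1][key] = value
    return root
```

Because it accepts any iterable of lines, it can be called directly on the open file object from the question's with block, e.g. data = parse_lines(file).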

Recommended answer

You can do it by pre-formatting your file content with a few regexes and then passing the result to json.loads.

Apply the following regex substitutions one by one ($1, $2 and $3 denote capture groups; in Python's re.sub they are written \1, \2 and \3):

#1 \[(\w*)\]\n -> "$1":

#2 \}\n(\w) -> },$1

#3 (\w*)\s(\w*)\n([^}]) -> $1:$2,$3

#4 (\w*)\s(\w*)\n\} -> $1:$2}

Finally, pass the resulting string to json.loads:

import json
d = json.loads(s)  # s is the string produced by the substitutions above

This parses it into a dictionary.
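To make the goal of the pre-formatting concrete: whatever the substitutions produce has to be valid JSON. The string below is a hand-written illustration (not output from the regexes) of what json.loads accepts for a small file with one nested header.

```python
import json

# Hand-written target format: keys and values in double quotes, commas between
# siblings, and the whole document wrapped in one pair of braces.
s = '{"HEADER1": {"a": "1", "HEADER2": {"c": "3"}, "d": "4"}}'

d = json.loads(s)
print(d['HEADER1']['HEADER2'])  # {'c': '3'}
```

Each quoted key followed by {...} becomes a nested dict, which is exactly the [HEADER] to contents mapping the question asks for.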

Explanation:

1. \[(\w*)\]\n: replace a [HEADER] line with "HEADER":

2. \}\n(\w): replace }\n with }, when another key follows the closing brace

3. (\w*)\s(\w*)\n([^}]): replace key value\n with key:value, for lines followed by another element

4. (\w*)\s(\w*)\n\}: replace key value\n with key:value for the last line of a block (the one followed by })
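The four substitutions translate directly into Python's re.sub, with the answer's $1, $2, $3 written as \1, \2, \3. The sketch below (apply_answer_regexes is a hypothetical name) runs them, in order, on a tiny block. The result keeps the nesting, but it is not yet valid JSON: keys and values are unquoted and there is no enclosing {...}, so json.loads would still reject it.

```python
import re


def apply_answer_regexes(s):
    """Apply the answer's four substitutions in the order given."""
    s = re.sub(r'\[(\w*)\]\n', r'"\1":', s)             # 1: [HEADER]\n -> "HEADER":
    s = re.sub(r'\}\n(\w)', r'},\1', s)                   # 2: }\nkey -> },key
    s = re.sub(r'(\w*)\s(\w*)\n([^}])', r'\1:\2,\3', s)  # 3: key value\n<next> -> key:value,<next>
    s = re.sub(r'(\w*)\s(\w*)\n\}', r'\1:\2}', s)        # 4: key value\n} -> key:value}
    return s


print(repr(apply_answer_regexes("[A]\n{\na 1\nb 2\n}")))
# '"A":{\na:1,b:2}'
```

The newline after { survives because none of the patterns consume it; JSON tolerates that whitespace, so the real gaps are the missing quotes and the missing outer braces.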

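So, with minor modifications to these regexes you will be able to parse the file into a dict; the basic idea is to reformat the file contents into a format that can be parsed easily. In particular, json.loads requires keys and string values in double quotes and the whole document wrapped in a single {...}, which these four substitutions do not yet produce.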
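One possible set of such modifications is sketched below. It is not the answer's exact code: instead of the four substitutions it strips indentation, turns each [HEADER] line into a quoted key, quotes every key value line via json.dumps (so quotes or backslashes inside values are escaped), inserts the commas between sibling elements, and wraps everything in braces before calling json.loads. It assumes header names are word characters (as the answer's \w does) and that every other non-brace line is a key, one space, and a value; like any dict, the result keeps only the last value of a repeated key. The names preformat and parse_with_json are hypothetical.

```python
import json
import re


def preformat(text):
    """Rewrite the [HEADER] / { ... } / key value format into a JSON string."""
    # Strip indentation and drop blank lines so each element sits on its own line.
    s = '\n'.join(ln.strip() for ln in text.splitlines() if ln.strip())
    # [HEADER] -> "HEADER":
    s = re.sub(r'^\[(\w+)\]$', r'"\1":', s, flags=re.M)
    # key value -> "key": "value" (json.dumps adds the quotes and escapes)
    s = re.sub(r'^(\w+) (.*)$',
               lambda m: json.dumps(m.group(1)) + ': ' + json.dumps(m.group(2)),
               s, flags=re.M)
    # A comma goes wherever one element ends ('"' or '}') and the next
    # line starts a sibling key ('"').
    s = re.sub(r'(["}])\n(?=")', r'\1,\n', s)
    # Wrap the top level in braces so the result is a single JSON object.
    return '{' + s + '}'


def parse_with_json(text):
    return json.loads(preformat(text))
```

Doing the quoting with json.dumps rather than a plain "\1" replacement is the main design choice here: it keeps values that contain quotes from producing broken JSON.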
