Creating a tree/deeply nested dict with lists from an indented text file


Problem description

I want to iterate through a file and put the contents of each line into a deeply nested dict whose structure is defined by leading whitespace. This is very much like what is documented here. I've solved that, but I now have the problem that repeated keys are overwritten instead of being collected into a list.

Essentially:

a:
    b:      c
    d:      e
a:
    b:      c2
    d:      e2
    d:      wrench

is cast into {"a":{"b":"c2","d":"wrench"}} when it should be cast into

{"a":[{"b":"c","d":"e"},{"b":"c2","d":["e2","wrench"]}]}

A self-contained example:

import json

def jsonify_indented_tree(tree):
    #convert indented text into json
    parsedJson= {}
    parentStack = [parsedJson]
    for i, line in enumerate(tree):
        data = get_key_value(line)
        if data['key'] in parsedJson.keys(): #if parent key is repeated, then cast value as list entry
            # stuff that doesn't work
#            if isinstance(parsedJson[data['key']],list):
#                parsedJson[data['key']].append(parsedJson[data['key']])
#            else:
#                parsedJson[data['key']]=[parsedJson[data['key']]]
            print('Hey - Make a list now!')
        if data['value']: #process child by adding it to its current parent
            currentParent = parentStack[-1] #.getLastElement()
            currentParent[data['key']] = data['value']
            if i != len(tree)-1: # compare ints by value, not identity
                #determine when to switch to next branch
                level_dif = data['level']-get_key_value(tree[i+1])['level'] #peek next line level
                if (level_dif > 0):
                    del parentStack[-level_dif:] #reached leaf, process next branch
        else:
        #group node, push it as the new parent and keep on processing.
            currentParent = parentStack[-1] #.getLastElement()
            currentParent[data['key']] = {}
            newParent = currentParent[data['key']]
            parentStack.append(newParent)
    return parsedJson

def get_key_value(line):
    key = line.split(":")[0].strip()
    value = line.split(":")[1].strip()
    level = len(line) - len(line.lstrip())
    return {'key':key,'value':value,'level':level}

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

#nested_string=['a:', '\tb:\t\tc', '\td:\t\te', 'a:', '\tb:\t\tc2', '\td:\t\te2']
#nested_string=['w:','\tgeneral:\t\tcase','a:','\tb:\t\tc','\td:\t\te','a:','\tb:\t\tc2','\td:\t\te2']
nested_string=['a:',
 '\tb:\t\tc',
 '\td:\t\te',
 'a:',
 '\tb:\t\tc2',
 '\td:\t\te2',
  '\td:\t\twrench']

pp_json(jsonify_indented_tree(nested_string))

Recommended answer

This approach is (logically) a lot more straightforward, though longer:

  1. Track the level and the key-value pair of each line in your multi-line string.
  2. Store this data in a dict of lists keyed by level: {level1:[dict1,dict2]}
  3. For a key-only line, append only a string representing the key: {level1:[dict1,dict2,"nestKeyA"]}
  4. Since a key-only line means the next line is one level deeper, process that line on the next level: {level1:[dict1,dict2,"nestKeyA"],level2:[...]}. The contents of the deeper level level2 may themselves be just another key-only line (in which case the next loop adds a new level level3, giving {level1:[dict1,dict2,"nestKeyA"],level2:["nestKeyB"],level3:[...]}) or a new dict dict3, giving {level1:[dict1,dict2,"nestKeyA"],level2:[dict3]}
  5. Steps 1-4 continue until the current line is indented less than the previous one (signifying a return to some prior scope). This is what the data structure looks like on my example per line iteration.

0, {0: []}
1, {0: [{'k': 'sds'}]}
2, {0: [{'k': 'sds'}, 'a']}
3, {0: [{'k': 'sds'}, 'a'], 1: [{'b': 'c'}]}
4, {0: [{'k': 'sds'}, 'a'], 1: [{'b': 'c'}, {'d': 'e'}]}
5, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: []}
6, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: [{'b': 'c2'}]}
7, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: [{'b': 'c2'}, {'d': 'e2'}]}
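Steps 1 to 3 can be sketched on their own as follows. This is only an illustration, not the answer's code: `parse_line` is a hypothetical helper, and it uses `str.partition` so that a value containing a colon is kept whole. It covers only lines that stay at the same level or go deeper, and it reproduces the state shown at iteration 4 of the trace.

```python
def parse_line(line):
    """Return (level, key, value) for one 'key: value' line (hypothetical helper)."""
    level = len(line) - len(line.lstrip())         # count of leading whitespace characters
    key, _, value = line.strip().partition(":")    # split on the first colon only
    return level, key.strip(), value.strip()

level_map = {}
for line in ['k:\t\tsds', 'a:', '\tb:\t\tc', '\td:\t\te']:
    level, key, value = parse_line(line)
    # key-value lines are stored as one-item dicts, key-only lines as bare strings
    level_map.setdefault(level, []).append({key: value} if value else key)

print(level_map)
# {0: [{'k': 'sds'}, 'a'], 1: [{'b': 'c'}, {'d': 'e'}]}
```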

Then two things need to happen. First, the list of dicts needs to be inspected for duplicate keys, and the values of any dicts that share a key combined into a list; this is demonstrated in a moment. Second, as can be seen between iterations 4 and 5, the list of dicts from the deepest level (here 1) is combined into one dict. Finally, to demonstrate duplicate handling, observe:

[7b, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: [{'b': 'c2'}, {'d': 'e2'}, {'d': 'wrench'}]}]
[7c, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, {'a': {'d': ['wrench', 'e2'], 'b': 'c2'}}], 1: []}]

where wrench and e2 are placed in a list that itself goes into a dict keyed by their original key.
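The duplicate-merging step can be illustrated in isolation. The sketch below is an assumption-laden simplification rather than the answer's implementation: `merge_siblings` is a hypothetical name, it folds the merge and the list-of-dicts-to-dict conversion into one pass, and it keeps repeated values in file order (the trace above shows them reversed).

```python
def merge_siblings(siblings):
    """Collapse a list of one-item dicts into one dict; repeated keys become lists."""
    merged = {}
    repeated = set()  # keys whose value has already been turned into a list
    for d in siblings:
        for key, value in d.items():
            if key not in merged:
                merged[key] = value
            elif key in repeated:
                merged[key].append(value)
            else:
                merged[key] = [merged[key], value]
                repeated.add(key)
    return merged

print(merge_siblings([{'b': 'c2'}, {'d': 'e2'}, {'d': 'wrench'}]))
# {'b': 'c2', 'd': ['e2', 'wrench']}
```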

Repeat Steps 1-5, hoisting deeper-scoped dicts up and onto their parent keys until the current line's scope (level) is reached.
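A single hoisting step might look like the sketch below. The function name `hoist` is hypothetical, and the sketch assumes that the last entry on the parent level is the pending key-only string and that levels are consecutive integers, as they are with one tab per level.

```python
def hoist(level_map, level):
    """Move the sibling group at `level` under the pending key on `level - 1`."""
    key = level_map[level - 1].pop()  # pending key-only string, e.g. 'a'
    # flatten the child list of one-item dicts into a single dict
    child = {k: v for d in level_map[level] for k, v in d.items()}
    level_map[level - 1].append({key: child})
    level_map[level] = []             # reset the deeper level for the next branch
    return level_map

state = {0: [{'k': 'sds'}, 'a'], 1: [{'b': 'c'}, {'d': 'e'}]}
print(hoist(state, 1))
# {0: [{'k': 'sds'}, {'a': {'b': 'c', 'd': 'e'}}], 1: []}
```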

Here is the code:

import json

def get_kvl(line):
    key = line.split(":")[0].strip()
    value = line.split(":")[1].strip()
    level = len(line) - len(line.lstrip())
    return {'key':key,'value':value,'level':level}

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

def jsonify_indented_tree(tree): #convert indented SGML-style header lines into a nested dict
    level_map= {0:[]}
    tree_length=len(tree)-1
    for i, line in enumerate(tree):
        data = get_kvl(line)
        if data['level'] not in level_map.keys():
            level_map[data['level']]=[] # initialize
        prior_level=get_kvl(tree[i-1])['level']
        level_dif = data['level']-prior_level # +: line is deeper, -: shallower, 0:same
        if data['value']:
            level_map[data['level']].append({data['key']:data['value']})
        if not data['value'] or i==tree_length:
            if i==tree_length: #end condition
                level_dif = -len(list(level_map.keys()))        
            if level_dif < 0:
                for level in reversed(range(prior_level+level_dif+1,prior_level+1)): # (end, start)
                    #check for duplicate keys in current deepest (child) sibling group,
                    # merge them into a list, put that list in a dict 
                    key_freq={} #track repeated keys
                    for n, dictionary in enumerate(level_map[level]):
                        current_key=list(dictionary.keys())[0]
                        if current_key in list(key_freq.keys()):
                            key_freq[current_key][0]+=1
                            key_freq[current_key][1].append(n)
                        else:
                            key_freq[current_key]=[1,[n]]
                    for k,v in key_freq.items():
                        if v[0]>1: #key is repeated
                            duplicates_list=[]
                            for index in reversed(v[1]): #merge value of key-repeated dicts into list
                                duplicates_list.append(list(level_map[level].pop(index).values())[0])
                            level_map[level].append({k:duplicates_list}) #push that list into a dict on the same stack it came from
                    if i==tree_length and level==0: #end condition
                        #convert list-of-dict into dict
                        parsed_nest={k:v for d in level_map[level] for k,v in d.items()}
                    else:
                        #push current deepest (child) sibling group onto parent key
                        key=level_map[level-1].pop() #string
                        #convert child list-of-dict into dict
                        level_map[level-1].append({key:{k:v for d in level_map[level] for k,v in d.items()}})
                        level_map[level]=[] #reset deeper level
            level_map[data['level']].append(data['key'])
    return parsed_nest

nested_string=['k:\t\tsds', #need a starter key,value pair otherwise this won't work... fortunately I always have one
 'a:',
 '\tb:\t\tc',
 '\td:\t\te',
 'a:',
 '\tb:\t\tc2',
 '\td:\t\te2',
 '\td:\t\twrench']

pp_json(jsonify_indented_tree(nested_string))
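For comparison, here is a different technique, not the level-map approach above: a stack of open parent dicts, where a repeated key converts its slot into a list on the spot. It is a minimal sketch under a few assumptions: `parse_indented` is a hypothetical name, every line has the form `key: value` or `key:`, and deeper nesting always means more leading whitespace. It needs no starter key-value pair and does not depend on the indent width.

```python
def parse_indented(lines):
    """Parse indented 'key: value' lines into nested dicts; repeated keys become lists."""
    root = {}
    stack = [(-1, root)]  # (indent of the key that opened this dict, the dict itself)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        level = len(line) - len(line.lstrip())
        key, _, value = line.strip().partition(":")
        key, value = key.strip(), value.strip()
        # close every scope opened at this indentation or deeper
        while stack[-1][0] >= level:
            stack.pop()
        parent = stack[-1][1]
        node = value if value else {}  # a key-only line opens a new dict
        if key in parent:
            # lists only ever arise from repeated keys in this parser
            if not isinstance(parent[key], list):
                parent[key] = [parent[key]]
            parent[key].append(node)
        else:
            parent[key] = node
        if not value:
            stack.append((level, node))
    return root

lines = ['a:', '\tb:\t\tc', '\td:\t\te', 'a:', '\tb:\t\tc2', '\td:\t\te2', '\td:\t\twrench']
print(parse_indented(lines))
# {'a': [{'b': 'c', 'd': 'e'}, {'b': 'c2', 'd': ['e2', 'wrench']}]}
```

Because the key-only line's dict is pushed onto the stack before its children are read, children are written straight into their final location and no hoisting pass is needed.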
