从python中的缩进文本文件创建树/深层嵌套dict [英] Creating a tree/deeply nested dict from an indented text file in python

查看:24
本文介绍了从python中的缩进文本文件创建树/深层嵌套dict的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

基本上,我想遍历一个文件并将每一行的内容放入一个深度嵌套的字典中,其结构由每行开头的空格量定义.

Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the start of each line.

本质上的目的是采取这样的事情:

Essentially the aim is to take something like this:

a
    b
        c
    d
        e

然后把它变成这样:

{"a":{"b":"c","d":"e"}}

或者这个:

apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10

进入这个:

{"apple":{"colours":["red","yellow","green"],"type":"granny smith","price":0.10}

这样我就可以将它发送到 Python 的 JSON 模块并生成一些 JSON.

So that I can send it to Python's JSON module and make some JSON.

目前我正在尝试按如下步骤制作字典和列表:

At the moment I'm trying to make a dict and a list in steps like such:

  1. {"a":""} ["a"]
  2. {"a":"b"} ["a"]
  3. {"a":{"b":"c"}} ["a","b"]
  4. {"a":{"b":{"c":"d"}}}} ["a","b","c"]
  5. {"a":{"b":{"c":"d"},"e":""}} ["a","e"]
  6. {"a":{"b":{"c":"d"},"e":"f"}} ["a","e"]
  7. {"a":{"b":{"c":"d"},"e":{"f":"g"}}} ["a","e","f"]

列表就像面包屑"一样,显示了我上次输入字典的位置.

The list acts like 'breadcrumbs' showing where I last put in a dict.

要做到这一点,我需要一种方法来遍历列表并生成类似于 dict["a"]["e"]["f"] 的内容以获取最后一个 dict.我看过有人制作的 AutoVivification 类,它看起来非常有用,但我真的不确定:

To do this I need a way to iterate through the list and generate something like dict["a"]["e"]["f"] to get at that last dict. I've had a look at the AutoVivification class that someone has made which looks very useful however I'm really unsure of:

  1. 我是否为此使用了正确的数据结构(我计划将其发送到 JSON 库以创建 JSON 对象)
  2. 在这种情况下如何使用 AutoVivification
  3. 是否有更好的方法来解决这个问题.

我想出了以下功能,但它不起作用:

I came up with the following function but it doesn't work:

def get_nested(dict,array,i):
if i != None:
    i += 1
    if array[i] in dict:
        return get_nested(dict[array[i]],array)
    else:
        return dict
else:
    i = 0
    return get_nested(dict[array[i]],array)

希望得到帮助!

(我剩下的极其不完整的代码在这里:)

(The rest of my extremely incomplete code is here:)

#Import relevant libraries
import codecs
import sys

#Functions
def stripped(str):
    if tab_spaced:
        return str.lstrip('	').rstrip('

')
    else:
        return str.lstrip().rstrip('

')

def current_ws():
    if whitespacing == 0 or not tab_spaced:
        return len(line) - len(line.lstrip())
    if tab_spaced:
        return len(line) - len(line.lstrip('	

'))

def get_nested(adict,anarray,i):
    if i != None:
        i += 1
        if anarray[i] in adict:
            return get_nested(adict[anarray[i]],anarray)
        else:
            return adict
    else:
        i = 0
        return get_nested(adict[anarray[i]],anarray)

#initialise variables
jsondict = {}
unclosed_tags = []
debug = []

vividfilename = 'simple.vivid'
# vividfilename = sys.argv[1]
if len(sys.argv)>2:
    jsfilename = sys.argv[2]
else:
    jsfilename = vividfilename.split('.')[0] + '.json'

whitespacing = 0
whitespace_array = [0,0]
tab_spaced = False

#open the file
with codecs.open(vividfilename,'rU', "utf-8-sig") as vividfile:
    for line in vividfile:
        #work out how many whitespaces at start
        whitespace_array.append(current_ws())

        #For first line with whitespace, work out the whitespacing (eg tab vs 4-space)
        if whitespacing == 0 and whitespace_array[-1] > 0:
            whitespacing = whitespace_array[-1]
            if line[0] == '	':
                tab_spaced = True

        #strip out whitespace at start and end
        stripped_line = stripped(line)

        if whitespace_array[-1] == 0:
            jsondict[stripped_line] = ""
            unclosed_tags.append(stripped_line)

        if whitespace_array[-2] < whitespace_array[-1]:
            oldnested = get_nested(jsondict,whitespace_array,None)
            print oldnested
            # jsondict.pop(unclosed_tags[-1])
            # jsondict[unclosed_tags[-1]]={stripped_line:""}
            # unclosed_tags.append(stripped_line)

        print jsondict
        print unclosed_tags

print jsondict
print unclosed_tags

推荐答案

这是一个递归解决方案.首先,按以下方式转换输入.

Here is a recursive solution. First, transform the input in the following way.

输入:

person:
    address:
        street1: 123 Bar St
        street2: 
        city: Madison
        state: WI
        zip: 55555
    web:
        email: boo@baz.com

第一步输出:

[{'name':'person','value':'','level':0},
 {'name':'address','value':'','level':1},
 {'name':'street1','value':'123 Bar St','level':2},
 {'name':'street2','value':'','level':2},
 {'name':'city','value':'Madison','level':2},
 {'name':'state','value':'WI','level':2},
 {'name':'zip','value':55555,'level':2},
 {'name':'web','value':'','level':1},
 {'name':'email','value':'boo@baz.com','level':2}]

使用 split(':') 和计算前导标签的数量很容易实现:

This is easy to accomplish with split(':') and by counting the number of leading tabs:

def tab_level(astr):
    """Count number of leading tabs in a string
    """
    return len(astr)- len(astr.lstrip('	'))

然后将第一步输出输入到以下函数中:

Then feed the first-step output into the following function:

def ttree_to_json(ttree,level=0):
    result = {}
    for i in range(0,len(ttree)):
        cn = ttree[i]
        try:
            nn  = ttree[i+1]
        except:
            nn = {'level':-1}

        # Edge cases
        if cn['level']>level:
            continue
        if cn['level']<level:
            return result

        # Recursion
        if nn['level']==level:
            dict_insert_or_append(result,cn['name'],cn['value'])
        elif nn['level']>level:
            rr = ttree_to_json(ttree[i+1:], level=nn['level'])
            dict_insert_or_append(result,cn['name'],rr)
        else:
            dict_insert_or_append(result,cn['name'],cn['value'])
            return result
    return result

哪里:

def dict_insert_or_append(adict,key,val):
    """Insert a value in dict at key if one does not exist
    Otherwise, convert value to list and append
    """
    if key in adict:
        if type(adict[key]) != list:
            adict[key] = [adict[key]]
        adict[key].append(val)
    else:
        adict[key] = val

这篇关于从python中的缩进文本文件创建树/深层嵌套dict的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆