Building a nested dictionary in Python, reading line by line from a file

Question

The way I build a nested dictionary is this:

dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp

dicty == {"A" : {"a" : 1, "b" : 2}}

The problem starts when I try to do this for a big file, reading it line by line. Printing the content of each line as a list gives:

['proA', 'macbook', '0.666667']
['proA', 'smart', '0.666667']
['proA', 'ssd', '0.666667']
['FrontPage', 'frontpage', '0.710145']
['FrontPage', 'troubleshooting', '0.971014']

I would like to end up with a nested dictionary (ignore decimals):

{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
 'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}

Because I read the file line by line, I have to check whether the first word is still the same as on the previous line (lines with the same first word are grouped together) before adding the completed inner dict to the outer dict.

Here is my implementation:

def doubleDict(filename):
    dicty = dict()
    with open(filename, "r") as f:
        row = 0
        tmp = dict()
        oldword = ""
        for line in f:
            values = line.rstrip().split(" ")
            print(values)
            if oldword == values[0]:
                tmp[values[1]] = values[2]
            else:
                if oldword is not "":
                    dicty[oldword] = tmp
                tmp.clear()
                oldword = values[0]
                tmp[values[1]] = values[2]
            row += 1
            if row % 25 == 0:
                print(dicty)
                break #print(row)
    return(dicty)

I would actually like to have this in pandas, but for now I would be happy to get it working as a dict. For some reason, after reading in just the first 5 lines, I end up with:

{'proA': {'frontpage': '0.710145', 'troubleshooting': '0.971014'}}

which is clearly incorrect. What is wrong?

Answer

Use a collections.defaultdict() object to auto-instantiate nested dictionaries:

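from collections import defaultdict

def doubleDict(filename):
    # defaultdict(dict) creates an empty inner dict the first time an outer key is used
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        # start=1 makes i the 1-based line number, like the question's row counter
        for i, line in enumerate(f, start=1):
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                print(dicty)
                break  # stop after the first 25 lines while testing
    return dicty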
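To try the defaultdict approach on the sample lines from the question without creating a file, here is a small sketch that accepts any iterable of lines. io.StringIO stands in for the open file, and the name nested_from_lines is hypothetical. It drops the debugging print/break and returns a plain dict, whose repr is easier to read than a defaultdict's:

```python
import io
from collections import defaultdict


def nested_from_lines(lines):
    """Build {outer: {inner: value}} from whitespace-separated lines."""
    result = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at EOF
        outer, inner, value = line.split()
        result[outer][inner] = value
    return dict(result)  # convert the outer defaultdict to a plain dict


sample = io.StringIO(
    "proA macbook 0.666667\n"
    "proA smart 0.666667\n"
    "proA ssd 0.666667\n"
    "FrontPage frontpage 0.710145\n"
    "FrontPage troubleshooting 0.971014\n"
)
nested = nested_from_lines(sample)
print(nested)
```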

I used enumerate() to generate the line count here; that is much simpler than keeping a separate counter going.
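One detail worth knowing: enumerate() counts from 0 unless it is given a start value, which changes when a periodic check such as i % 25 == 0 fires. A quick sketch with 60 made-up lines:

```python
lines = [f"line {n}" for n in range(1, 61)]  # 60 dummy lines

# Default start=0: i % 25 == 0 is already true for the very first line.
first_hits = [i for i, _ in enumerate(lines) if i % 25 == 0]

# start=1: i is the 1-based line number, so the check fires after
# every 25th line has been read.
counted_hits = [i for i, _ in enumerate(lines, start=1) if i % 25 == 0]

print(first_hits)    # [0, 25, 50]
print(counted_hits)  # [25, 50]
```

With a break inside such a check, the default start=0 would stop the loop after reading only one line.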

Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary and retrieve it again using values[0]; there is no need to keep the tmp reference around:

>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
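This is also what goes wrong in the question's code. dicty[oldword] = tmp stores a reference to the same dictionary object, not a copy, so the following tmp.clear() empties the dictionary that was just stored, and the next group's entries are then written into it. The last group is never stored at all, because nothing assigns tmp after the loop ends. The sketch below keeps the question's group-tracking structure but binds a fresh dict for each group and stores the final group after the loop. The function name grouped_dict and the list-of-lines input (instead of a filename) are assumptions for illustration:

```python
def grouped_dict(lines):
    """The question's approach, repaired: a fresh inner dict for every group."""
    dicty = {}
    tmp = {}
    oldword = None
    for line in lines:
        values = line.split()
        if not values:
            continue  # skip blank lines
        if values[0] != oldword:
            if oldword is not None:
                dicty[oldword] = tmp  # store the finished group
            tmp = {}  # bind a NEW dict; tmp.clear() would empty the stored one
            oldword = values[0]
        tmp[values[1]] = values[2]
    if oldword is not None:
        dicty[oldword] = tmp  # the last group is never followed by a new word
    return dicty


# The aliasing problem in isolation: two names, one dict object.
shared = {"a": 1}
outer = {"A": shared}
shared.clear()  # also empties outer["A"]
print(outer)  # {'A': {}}

lines = [
    "proA macbook 0.666667",
    "proA smart 0.666667",
    "FrontPage frontpage 0.710145",
]
print(grouped_dict(lines))
```

Like the original, this version relies on the lines being grouped: a first word that reappears later starts a new inner dict that replaces the earlier one. The defaultdict version has no such restriction.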

All the defaultdict then does is save us from having to test whether we already created that nested dictionary. Instead of:

if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value

we simply omit the if test, as the defaultdict creates a new dictionary for us if the key is not yet present.
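If you prefer to keep a plain dict, dict.setdefault() folds the membership test and the insertion into one call: it returns the existing value for the key, or stores the given default and returns it. A small sketch using a few rows from the question, hard-coded in place of a file:

```python
rows = [
    ("proA", "macbook", "0.666667"),
    ("proA", "ssd", "0.666667"),
    ("FrontPage", "frontpage", "0.710145"),
]

dicty = {}
for outer, inner, value in rows:
    # setdefault returns dicty[outer] if it exists; otherwise it stores the
    # new {} under outer and returns that same dict
    dicty.setdefault(outer, {})[inner] = value

print(dicty)
```

The trade-off is that the {} argument is built on every call, even when the key already exists and the new dict is simply discarded; a defaultdict only calls its factory for missing keys.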
