Building a nested dictionary in Python, reading line by line from a file
Question
The way I go about building a nested dictionary is this:
dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp
dicty == {"A" : {"a" : 1, "b" : 2}}
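One detail of this pattern matters later: `dicty["A"] = tmp` stores a reference to the same dictionary object, not a copy. A quick check, using the same names as above:

```python
# Build the inner dict first, then store it under the outer key
tmp = {"a": 1, "b": 2}
dicty = {"A": tmp}

# The outer dict holds a reference to the very same object, not a copy
print(dicty["A"] is tmp)  # True

# So any later change to tmp is visible through dicty as well
tmp["c"] = 3
print(dicty)  # {'A': {'a': 1, 'b': 2, 'c': 3}}
```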
The problem starts when I try to do this on a big file, reading it in line by line. Printing the content of each line as a list gives:
['proA', 'macbook', '0.666667']
['proA', 'smart', '0.666667']
['proA', 'ssd', '0.666667']
['FrontPage', 'frontpage', '0.710145']
['FrontPage', 'troubleshooting', '0.971014']
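Each list above comes from splitting one line on spaces. A small sketch comparing `line.rstrip().split(" ")` with a bare `line.split()`, assuming the columns are separated by spaces (the sample strings are made up for illustration):

```python
line = "proA macbook 0.666667\n"

# rstrip() drops the trailing newline, split(" ") cuts on single spaces
print(line.rstrip().split(" "))  # ['proA', 'macbook', '0.666667']

# split() with no argument splits on runs of any whitespace and ignores
# leading/trailing whitespace, so no rstrip() is needed
print(line.split())  # ['proA', 'macbook', '0.666667']

# The two differ when columns are separated by more than one space
double = "proA  macbook 0.666667\n"
print(double.rstrip().split(" "))  # ['proA', '', 'macbook', '0.666667']
print(double.split())              # ['proA', 'macbook', '0.666667']
```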
I would like to end up with a nested dictionary (ignore decimals):
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
As I read line by line, I have to check whether the first word is still the same as on the previous line (the lines are all grouped by their first word) before I add the finished inner dict to the outer dict.
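Because the lines are grouped by first word, the "is this still the same first word?" check is exactly what `itertools.groupby` does. A minimal sketch, assuming whitespace-separated three-column lines; `group_lines` is a hypothetical name, and this swaps the manual check for `groupby` rather than reproducing the question's own loop:

```python
from itertools import groupby


def group_lines(lines):
    """Hypothetical helper: build {first: {second: third}} from lines that
    are contiguous (grouped) by their first word."""
    rows = (line.split() for line in lines if line.strip())
    result = {}
    # groupby starts a new group each time the key (the first word) changes
    for outer, group in groupby(rows, key=lambda row: row[0]):
        # A fresh inner dict per group, so earlier groups are never touched
        result[outer] = {inner: value for _, inner, value in group}
    return result


sample = [
    "proA macbook 0.666667\n",
    "proA smart 0.666667\n",
    "proA ssd 0.666667\n",
    "FrontPage frontpage 0.710145\n",
    "FrontPage troubleshooting 0.971014\n",
]
print(group_lines(sample))
```

If the same first word could reappear later in a separate run, this version would overwrite the earlier group; `result.setdefault(outer, {}).update(...)` would merge the runs instead.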
Here is my implementation:
def doubleDict(filename):
    dicty = dict()
    with open(filename, "r") as f:
        row = 0
        tmp = dict()
        oldword = ""
        for line in f:
            values = line.rstrip().split(" ")
            print(values)
            if oldword == values[0]:
                tmp[values[1]] = values[2]
            else:
                if oldword != "":
                    dicty[oldword] = tmp
                tmp.clear()
                oldword = values[0]
                tmp[values[1]] = values[2]
            row += 1
            if row % 25 == 0:
                print(dicty)
                break #print(row)
    return(dicty)
I would actually like to have this in pandas, but for now I would be happy to get it working as a dict. For some reason, after reading in just the first 5 lines, I end up with:
{'proA': {'frontpage': '0.710145', 'troubleshooting': '0.971014'}}
which is clearly incorrect. What is wrong?
Answer
Use a collections.defaultdict() object to auto-instantiate nested dictionaries:
from collections import defaultdict

def doubleDict(filename):
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        # start=1 so i counts lines from 1, like the question's row counter
        for i, line in enumerate(f, 1):
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                print(dicty)
                break  # debugging: stop after the first 25 lines
    return dicty
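To try the same approach without a file on disk, here is a minimal sketch of the loop taking any iterable of lines. The name `nested_from_lines` is hypothetical, the debugging break is dropped, and blank lines are skipped because they would break the three-way unpacking:

```python
from collections import defaultdict


def nested_from_lines(lines):
    """Hypothetical helper: same loop as doubleDict, but reads from any
    iterable of lines (a file object, a list, ...)."""
    dicty = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # a blank line cannot be unpacked into three values
        outer, inner, value = line.split()
        dicty[outer][inner] = value
    # Convert the outer level to a plain dict so missing keys raise KeyError again
    return dict(dicty)


sample = [
    "proA macbook 0.666667\n",
    "proA smart 0.666667\n",
    "proA ssd 0.666667\n",
    "FrontPage frontpage 0.710145\n",
    "FrontPage troubleshooting 0.971014\n",
]
print(nested_from_lines(sample))
```

A file object is itself an iterable of lines, so `with open(filename) as f: nested_from_lines(f)` reads a real file the same way.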
I used enumerate() to generate the line count here; it is much simpler than keeping a separate counter going.
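One thing to watch: `enumerate()` counts from 0 by default, and `0 % 25 == 0`, so an "every 25 lines" check fires on the very first line. Passing a start value of 1 makes the count match the question's `row` counter, which is incremented before the check. A small illustration with made-up lines:

```python
lines = ["first", "second", "third"]

# Default start is 0
print([i for i, _ in enumerate(lines)])     # [0, 1, 2]
# start=1 numbers the lines the way people usually count them
print([i for i, _ in enumerate(lines, 1)])  # [1, 2, 3]

# With the default start, an "every 25 lines" check is already true on line one
print(0 % 25 == 0)  # True
```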
Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary and retrieve it again using values[0]; there is no need to keep the tmp reference around:
>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
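This is also where the question's code goes wrong. `dicty[oldword] = tmp` stores the `tmp` object itself, and the very next `tmp.clear()` empties that same object before it is refilled with the following group's entries. A group is only stored when the first word changes, so the last group (FrontPage) never gets its own key; with five lines that leaves `'proA'` pointing at a dict holding FrontPage's entries, which is the output shown. A minimal sketch that keeps the question's structure but fixes both points, assuming whitespace-separated three-column lines (`double_dict_from_lines` is a hypothetical name):

```python
def double_dict_from_lines(lines):
    """Hypothetical rewrite of the question's loop over an iterable of lines."""
    dicty = {}
    tmp = {}
    oldword = None
    for line in lines:
        outer, inner, value = line.split()
        if outer != oldword:
            if oldword is not None:
                dicty[oldword] = tmp  # store the finished group
            tmp = {}  # bind a NEW dict; tmp.clear() would empty the one just stored
            oldword = outer
        tmp[inner] = value
    if oldword is not None:
        dicty[oldword] = tmp  # the last group is never followed by a new first word
    return dicty


sample = [
    "proA macbook 0.666667\n",
    "proA smart 0.666667\n",
    "proA ssd 0.666667\n",
    "FrontPage frontpage 0.710145\n",
    "FrontPage troubleshooting 0.971014\n",
]
print(double_dict_from_lines(sample))
```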
All the defaultdict then does is save us from having to test whether we have already created that nested dictionary. Instead of:
if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value
we simply omit the if test, as defaultdict will create a new dictionary for us if the key is not yet present.
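For comparison, a plain dict can fold the same test into one call with `dict.setdefault`, which returns the existing value for a key or inserts the given default first. A minimal sketch, assuming whitespace-separated three-column lines (`nested_with_setdefault` is a hypothetical name):

```python
def nested_with_setdefault(lines):
    """Hypothetical helper: plain-dict version without defaultdict."""
    dicty = {}
    for line in lines:
        outer, inner, value = line.split()
        # setdefault returns dicty[outer], inserting {} first if the key is missing
        dicty.setdefault(outer, {})[inner] = value
    return dicty


sample = ["proA ssd 0.666667\n", "FrontPage frontpage 0.710145\n"]
print(nested_with_setdefault(sample))
```

The `{}` argument is built on every call even when the key already exists, which is harmless here; defaultdict avoids that, while setdefault leaves you with an ordinary dict from the start.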