What is the best way to implement nested dictionaries?


Question

I have a data structure which essentially amounts to a nested dictionary. Let's say it looks like this:

{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

Now, maintaining and creating this is pretty painful; every time I have a new state/county/profession I have to create the lower-layer dictionaries via obnoxious try/except blocks. Moreover, I have to create annoying nested iterators if I want to go over all the values.

I could also use tuples as keys, like such:

{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

This makes iterating over the values very simple and natural, but it is more syntactically painful to do things like aggregations and looking at subsets of the dictionary (e.g. if I just want to go state-by-state).
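To make the trade-off concrete, here is a rough sketch of what aggregation and subsetting look like with tuple keys. The helper names `total_by_state` and `subset_for_state` are made up for illustration; the data is the dictionary shown above.

```python
from collections import defaultdict

flat = {
    ('new jersey', 'mercer county', 'plumbers'): 3,
    ('new jersey', 'mercer county', 'programmers'): 81,
    ('new jersey', 'middlesex county', 'programmers'): 81,
    ('new jersey', 'middlesex county', 'salesmen'): 62,
    ('new york', 'queens county', 'plumbers'): 9,
    ('new york', 'queens county', 'salesmen'): 36,
}

def total_by_state(data):
    """Sum the counts per state (hypothetical helper)."""
    totals = defaultdict(int)
    for (state, _county, _profession), count in data.items():
        totals[state] += count
    return dict(totals)

def subset_for_state(data, state):
    """Keep only the entries whose first key component is `state`."""
    return {key: count for key, count in data.items() if key[0] == state}

state_totals = total_by_state(flat)
new_york_only = subset_for_state(flat, 'new york')
```

Both helpers have to scan every key, which is the cost of not having a real hierarchy to index into.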

Basically, sometimes I want to think of a nested dictionary as a flat dictionary, and sometimes I want to think of it indeed as a complex hierarchy. I could wrap this all in a class, but it seems like someone might have done this already. Alternatively, it seems like there might be some really elegant syntactical constructions to do this.
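One possible way to get the flat view from the hierarchical one is a small recursive generator. This is only a sketch: `flatten` is a hypothetical helper, it assumes Python 3 (for `yield from`), and it treats every non-dict value as a leaf.

```python
def flatten(nested, prefix=()):
    """Yield (key_tuple, value) pairs for every leaf of a nested dict.

    Empty sub-dictionaries have no leaves, so they produce no pairs.
    """
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value

nested = {
    'new jersey': {
        'mercer county': {'plumbers': 3, 'programmers': 81},
        'middlesex county': {'programmers': 81, 'salesmen': 62},
    },
    'new york': {
        'queens county': {'plumbers': 9, 'salesmen': 36},
    },
}

flat = dict(flatten(nested))
```

With this, the nested dictionary stays the source of truth and the tuple-keyed form is derived on demand.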

How could I do this better?

Addendum: I'm aware of setdefault() but it doesn't really make for clean syntax. Also, each sub-dictionary you create still needs to have setdefault() manually set.

解决方案

What is the best way to implement nested dictionaries in Python?

Implement __missing__ on a dict subclass to set and return a new instance!

Here is a more elegant approach that has been available (and documented) since Python 2.5, and (particularly valuable to me) it pretty prints just like a normal dict, instead of the ugly printing of an autovivified defaultdict:

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)() # retain local pointer to value
        return value                     # faster to return than dict lookup

Note self[key] is on the left-hand side of assignment, so there's no recursion here.

This is half the lines of code of what was the accepted answer until September 23, 2016.

Explanation:

We're just providing another nested instance of our class Vividict whenever a key is accessed but missing. (Keeping the new value in a local variable is useful because it saves an additional call to the getter on the dict; unfortunately, assignment is a statement in Python, so we can't return the value as it is being set.)

Note, these are the same semantics as the most upvoted answer but in half the lines of code - nosklo's implementation:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
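To check the earlier note that assigning to self[key] inside __missing__ does not recurse, here is a small sketch that counts how often the hook runs. `CountingVividict` and its `misses` attribute are invented for this demonstration and are not part of the answer's code.

```python
class CountingVividict(dict):
    misses = 0  # class-level counter shared by all instances (illustration only)

    def __missing__(self, key):
        type(self).misses += 1
        # Assignment goes through __setitem__, which never calls __missing__.
        value = self[key] = type(self)()
        return value

d = CountingVividict()
d['a']['b']  # two misses: 'a' on d, then 'b' on the new child
d['a']['b']  # both keys exist now, so __missing__ is not called again

miss_count = CountingVividict.misses
```

The counter only moves on the first access, since the hook is reached only when a lookup fails.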

Demonstration of Usage

Below is just an example of how this dict could be easily used to create a nested dict structure on the fly. This can quickly create a hierarchical tree structure as deeply as you might want to go.

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)

Which outputs:

{'fizz': {'buzz': {}},
 'foo': {'bar': {}, 'baz': {}},
 'primary': {'secondary': {'tertiary': {'quaternary': {}}}}}

As the output shows, it pretty prints beautifully and in order for manual inspection. So if you want to visually inspect your data, implementing __missing__ to set a new instance of its class to the key and return it is a far better solution than the alternatives below.
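As a further sketch, the same class can be filled with the data from the question by plain assignment. The snippet also shows one thing to keep in mind: indexing a missing key creates it, while `in` and `.get()` do not, because __missing__ is only reached through item lookup. The key 'texas' is just a placeholder for a key that is not in the data.

```python
class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()
d['new jersey']['mercer county']['plumbers'] = 3
d['new jersey']['mercer county']['programmers'] = 81
d['new jersey']['middlesex county']['programmers'] = 81
d['new jersey']['middlesex county']['salesmen'] = 62
d['new york']['queens county']['plumbers'] = 9
d['new york']['queens county']['salesmen'] = 36

plain = {
    'new jersey': {
        'mercer county': {'plumbers': 3, 'programmers': 81},
        'middlesex county': {'programmers': 81, 'salesmen': 62},
    },
    'new york': {
        'queens county': {'plumbers': 9, 'salesmen': 36},
    },
}
same_as_plain = (d == plain)  # a dict subclass compares equal to a plain dict

has_texas = 'texas' in d      # membership test: no key is created
texas_value = d.get('texas')  # .get() bypasses __missing__: no key is created
keys_before = len(d)
d['texas']                    # plain indexing: creates an empty entry
keys_after = len(d)
```

If stray empty entries from read access would be a problem, `in` or `.get()` are the safer ways to probe the structure.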

Other alternatives, for contrast:

dict.setdefault

setdefault works great when used in loops and you don't know what you're going to get for keys, but repetitive usage becomes quite burdensome, and I don't think anyone would want to keep up the following:

d = dict()

d.setdefault('foo', {}).setdefault('bar', {})
d.setdefault('foo', {}).setdefault('baz', {})
d.setdefault('fizz', {}).setdefault('buzz', {})
d.setdefault('primary', {}).setdefault('secondary', {}).setdefault('tertiary', {}).setdefault('quaternary', {})

Another criticism is that setdefault requires a new instance whether it is used or not. However, Python (CPython, at least) is rather smart about handling unused and unreferenced new instances; for example, it reuses the location in memory:

>>> id({}), id({}), id({})
(523575344, 523575344, 523575344)
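Coming back to the point that setdefault works well in loops, here is a minimal sketch of that usage. The helper `set_path` and the `records` list are assumptions made for the example (Python 3 syntax), not something from the standard library.

```python
def set_path(root, path, value):
    """Walk `path` from `root`, creating plain dicts as needed, then set the leaf."""
    node = root
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value

# Hypothetical input rows: state, county, profession, count.
records = [
    ('new jersey', 'mercer county', 'plumbers', 3),
    ('new jersey', 'mercer county', 'programmers', 81),
    ('new jersey', 'middlesex county', 'programmers', 81),
    ('new jersey', 'middlesex county', 'salesmen', 62),
    ('new york', 'queens county', 'plumbers', 9),
    ('new york', 'queens county', 'salesmen', 36),
]

d = {}
for *path, count in records:
    set_path(d, path, count)
```

The result is made of ordinary dicts all the way down, so it prints and serializes like any other dict.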

An auto-vivified defaultdict

This is a clean-looking implementation, and in a script where you aren't inspecting the data it would be just as useful as implementing __missing__:

from collections import defaultdict

def vivdict():
    return defaultdict(vivdict)

But if you need to inspect your data, the results of an auto-vivified defaultdict populated with data in the same way look like this:

>>> d = vivdict(); d['foo']['bar']; d['foo']['baz']; d['fizz']['buzz']; d['primary']['secondary']['tertiary']['quaternary']; import pprint
>>> pprint.pprint(d)
defaultdict(<function vivdict at 0x17B01870>, {'foo': defaultdict(<function vivdict 
at 0x17B01870>, {'baz': defaultdict(<function vivdict at 0x17B01870>, {}), 'bar': 
defaultdict(<function vivdict at 0x17B01870>, {})}), 'primary': defaultdict(<function 
vivdict at 0x17B01870>, {'secondary': defaultdict(<function vivdict at 0x17B01870>, 
{'tertiary': defaultdict(<function vivdict at 0x17B01870>, {'quaternary': defaultdict(
<function vivdict at 0x17B01870>, {})})})}), 'fizz': defaultdict(<function vivdict at 
0x17B01870>, {'buzz': defaultdict(<function vivdict at 0x17B01870>, {})})})

This output is quite inelegant, and the results are quite unreadable. The solution typically given is to recursively convert back to a dict for manual inspection. This non-trivial solution is left as an exercise for the reader.
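For what it's worth, one possible take on that recursive conversion is sketched below. `to_dict` is a hypothetical helper name, and the sketch assumes that only dict-like containers need converting (lists or other containers holding defaultdicts would be left as they are).

```python
from collections import defaultdict
import pprint

def vivdict():
    return defaultdict(vivdict)

def to_dict(node):
    """Recursively copy nested mappings into plain dicts (hypothetical helper)."""
    if isinstance(node, dict):  # defaultdict is a dict subclass
        return {key: to_dict(value) for key, value in node.items()}
    return node

d = vivdict()
d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']

plain = to_dict(d)
pprint.pprint(plain)  # prints nested plain dicts, without the defaultdict wrappers
```

The conversion makes a copy, so later changes to `d` will not show up in `plain`.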

Performance

Finally, let's look at performance. I'm subtracting the costs of instantiation.

>>> import timeit
>>> min(timeit.repeat(lambda: {}.setdefault('foo', {}))) - min(timeit.repeat(lambda: {}))
0.13612580299377441
>>> min(timeit.repeat(lambda: vivdict()['foo'])) - min(timeit.repeat(lambda: vivdict()))
0.2936999797821045
>>> min(timeit.repeat(lambda: Vividict()['foo'])) - min(timeit.repeat(lambda: Vividict()))
0.5354437828063965
>>> min(timeit.repeat(lambda: AutoVivification()['foo'])) - min(timeit.repeat(lambda: AutoVivification()))
2.138362169265747

Based on performance, dict.setdefault works the best. I'd highly recommend it for production code, in cases where you care about execution speed.

If you need this for interactive use (in an IPython notebook, perhaps), then performance doesn't really matter, in which case I'd go with Vividict for the readability of its output. It is far superior to the AutoVivification object, which overrides __getitem__ instead of __missing__, the hook that was made for this purpose.

Conclusion

Implementing __missing__ on a subclassed dict to set and return a new instance is slightly more difficult than alternatives but has the benefits of

  • easy instantiation
  • easy data population
  • easy data viewing

and because it is less complicated and more performant than modifying __getitem__, it should be preferred to that method.
