元组中的任意嵌套字典 [英] Arbitrarily nested dictionary from tuples

查看:165
本文介绍了元组中的任意嵌套字典的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个以元组为键(而数字/标量为值)的字典,用Python方式转换为嵌套字典的方式是什么?问题在于从输入到输入,元组的长度是任意的.

Given a dictionary with tuples as keys (and numbers/scalars as values), what is a Pythonic way to convert to a nested dictionary? The hitch is that from input-to-input, the tuples are of arbitrary length.

下面的d1d2d3证明了嵌套的增加:

Below, d1, d2, and d3 demonstrate increasing nestedness:

from itertools import product

d1 = dict(zip(product('AB', [0, 1]), range(2*2)))
d2 = dict(zip(product('AB', [0, 1], [True, False]), range(2*2*2)))
d3 = dict(zip(product('CD', [0, 1], [True, False], 'AB'), range(2*2*2*2)))

它们的嵌套版本将是:

# For d1
{'A': {0: 0, 1: 1}, 'B': {0: 2, 1: 3}}

# For d2
{'A': {0: {True: 0, False: 1}, 1: {True: 2, False: 3}},
 'B': {0: {True: 4, False: 5}, 1: {True: 6, False: 7}}}

# Beginning of result for d3
{
'C': {
    0: {
        True: {
            'A': 0
            'B': 1
        },
        False: {
            'A': 2,
            'B': 3
        },
    1: # ...


我的尝试:我喜欢这种用于初始化空数据结构的技巧,该方法在许多其他的SO答案中给出:


My attempts: I like the this trick for initializing an empty data structure, which is given in a number of other SO answers:

from collections import defaultdict

def nested_dict():
    return defaultdict(nested_dict)

但是由于级别数不确定,因此难以实现.我可以使用类似的东西:

But am having trouble implementing this because the number of levels is uncertain. I could use something like:

def nest(d: dict) -> dict:
    res = nested_dict()
    for (i, j, k), v in d.items():
        res[i][j][k] = v
    return res

但这对d2 起作用,因为它的键在上面有3个层(i,j,k).

But this would only work for d2 because its keys have 3 levels (i, j, k) above.

这是我尝试将其推广的解决方案,但我想这里有一条更简单的路线.

Here's my attempt at a solution to generalizing this, but I'm guessing there is a simpler route.

def set_arbitrary_nest(keys, value):
    """
    >>> keys = 1, 2, 3
    >>> value = 5
    result --> {1: {2: {3: 5}}}
    """

    it = iter(keys)
    last = next(it)
    res = {last: {}}
    lvl = res
    while True:
        try:
            k = next(it)
            lvl = lvl[last]
            lvl[k] = {}
            last = k
        except StopIteration:
            lvl[k] = value
            return res

>>> set_arbitrary_nest([1, 2, 3], 5)
{1: {2: {3: 5}}}

推荐答案

只需循环遍历每个键,并使用键的最后一个元素(除最后一个元素外)来添加字典.保留对如此设置的最后一个字典的引用,然后使用键元组中的最后一个元素在输出字典中实际设置一个键值对:

Just loop over each key, and use all but the last element of the key to add dictionaries. Keep a reference to the last dictionary so set, then use the last element in the key tuple to actually set a key-value pair in the output dictionary:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

您可以使用 functools.reduce() 来处理遍历字典的工作:

You could use functools.reduce() to handle the traversing-down-the-dictionaries work:

from functools import reduce

def nest(d: dict) -> dict:
    result = {}
    traverse = lambda r, k: r.setdefault(k, {})
    for key, value in d.items():
        reduce(traverse, key[:-1], result)[key[-1]] = value
    return result

我使用dict.setdefault()而不是自动执行defaultdict(nested_dict)选项,因为这会生成一个常规字典,当字典不存在时,它不会进一步隐式添加键.

I used dict.setdefault() rather than the auto-vivication defaultdict(nested_dict) option, as this produces a regular dictionary that won't further implicitly add keys when they don't yet exist.

演示:

>>> from pprint import pprint
>>> pprint(nest(d1))
{'A': {0: 0, 1: 1}, 'B': {0: 2, 1: 3}}
>>> pprint(nest(d2))
{'A': {0: {False: 1, True: 0}, 1: {False: 3, True: 2}},
 'B': {0: {False: 5, True: 4}, 1: {False: 7, True: 6}}}
>>> pprint(nest(d3))
{'C': {0: {False: {'A': 2, 'B': 3}, True: {'A': 0, 'B': 1}},
       1: {False: {'A': 6, 'B': 7}, True: {'A': 4, 'B': 5}}},
 'D': {0: {False: {'A': 10, 'B': 11}, True: {'A': 8, 'B': 9}},
       1: {False: {'A': 14, 'B': 15}, True: {'A': 12, 'B': 13}}}}

请注意,上述解决方案是一个干净的O(N)循环(N是输入字典的长度),而Ajax1234提出的groupby解决方案必须对输入进行 sort 才能工作,使之成为O(NlogN)解决方案.这意味着对于具有1000个元素的字典,groupby()将需要10.000步才能生成输出,而O(N)线性循环仅需1000步.对于一百万个键,这将增加到2000万步,等等.

Note that the above solution is a clean O(N) loop (N being the length of the input dictionary), whereas a groupby solution as proposed by Ajax1234 has to sort the input to work, making that a O(NlogN) solution. That means that for a dictionary with 1000 elements, a groupby() would need 10.000 steps to produce the output, whereas an O(N) linear loop only takes 1000 steps. For a million keys, this increases to 20 million steps, etc.

此外,Python中的递归速度很慢,因为Python无法针对迭代方法优化此类解决方案.函数调用相对昂贵,因此使用递归会带来可观的性能成本,因为大大增加了函数调用的数量和扩展框架堆栈操作.

Moreover, recursion in Python is.. slow, as Python can't optimise such solutions to an iterative approach. Function calls are relatively expensive, so using recursion can carry significant performance costs as you greatly increase the number of function calls and by extension frame stack operations.

时间试用显示这有多重要;使用示例d3并运行100k,我们很容易看到5倍的速度差:

A time trial shows by how much this matters; using your sample d3 and 100k runs, we easily see a 5x speed difference:

>>> from timeit import timeit
>>> timeit('n(d)', 'from __main__ import create_nested_dict as n, d3; d=d3.items()', number=100_000)
8.210276518017054
>>> timeit('n(d)', 'from __main__ import nest as n, d3 as d', number=100_000)
1.6089267160277814

这篇关于元组中的任意嵌套字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆