Flatten a dictionary of dictionaries (2 levels deep) of lists

Views: 72  Published: 2020/5/5 13:31:03  Tags: python data-structures mapreduce dictionary


Problem description


I'm trying to wrap my brain around this but it's not flexible enough.

In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.

Thus I want to transform

{1: {'a': [1, 2, 3], 'b': [0]},
 2: {'c': [4, 5, 1], 'd': [3, 8]}}

into

[1, 2, 3, 0, 4, 5, 1, 3, 8]

I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.
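For reference, the map-reduce idea described above might look like the sketch below: map each subdictionary to one concatenated sublist, then reduce the sublists with list concatenation. The names `nested` and `map_reduce_flatten` are hypothetical. Note that `operator.add` on lists builds a new list at every step, which is exactly the intermediate-structure cost in question.

```python
import functools
import operator

def map_reduce_flatten(nested):
    """Two-level flatten via map + reduce (builds intermediate lists)."""
    # Map: turn each inner dict into one sublist by concatenating its lists.
    sublists = map(
        lambda inner: functools.reduce(operator.add, inner.values(), []),
        nested.values(),
    )
    # Reduce: concatenate all the sublists into a single list.
    return functools.reduce(operator.add, sublists, [])

nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(map_reduce_flatten(nested))  # [1, 2, 3, 0, 4, 5, 1, 3, 8] (dict order, Python 3.7+)
```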

But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?

Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
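One way to get a single pass, sketched here under the question's two-level assumption, is to chain the inner lists lazily. The name `flatten_two_levels` is hypothetical; `itertools.chain.from_iterable` walks each inner list in turn without copying it, so nothing is materialized until the caller asks for it.

```python
from itertools import chain

def flatten_two_levels(nested):
    """Lazily yield every list item from a dict of dicts of lists."""
    # The generator expression visits each inner list once;
    # chain.from_iterable concatenates them without building sublists.
    return chain.from_iterable(
        lst for inner in nested.values() for lst in inner.values()
    )

nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(list(flatten_two_levels(nested)))  # [1, 2, 3, 0, 4, 5, 1, 3, 8] on Python 3.7+
```

If a list is wanted directly, the nested comprehension `[x for inner in nested.values() for lst in inner.values() for x in lst]` does the same work in one expression.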

Update: For those who are interested, below is the code I ended up using.

Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.

import operator

def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    # values() works on Python 2 and 3; itervalues() exists only in Python 2.
    for uaDict in d.values():
        for sessions in uaDict.values():
            for session in sessions:
                yield tuple(session)

...

# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
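To show what the two `sorted` calls produce, here is a self-contained run on made-up data. The data layout is an assumption: `ipDict` maps an IP to a dict keyed by user agent, whose values are lists of `[start, end]` sessions; the IPs, agent names and times are invented for illustration, and `genSessions` is restated with `values()` so it runs on Python 3.

```python
import operator

def genSessions(d):
    """Yield every session in a dict of dicts of lists, as a tuple."""
    for uaDict in d.values():
        for sessions in uaDict.values():
            for session in sessions:
                yield tuple(session)

# Hypothetical data: {ip: {user_agent: [[start, end], ...]}}
ipDict = {
    '10.0.0.1': {'ua-x': [[5, 9], [1, 4]], 'ua-y': [[3, 12]]},
    '10.0.0.2': {'ua-x': [[2, 6]]},
}

sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
print(sessionsByStartTime)  # [(1, 4), (2, 6), (3, 12), (5, 9)]
print(sessionsByEndTime)    # [(1, 4), (2, 6), (5, 9), (3, 12)]
```

`sorted` accepts any iterable, so the generator never has to be turned into a list first. Because Python's sort is stable, sorting the start-ordered list by end time also leaves sessions with equal end times in start-time order.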

Thanks again to all who helped.

[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]

Solution

Edit: re-read the original question and reworked the answer to assume that all non-dictionaries are lists to be flattened.

In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.
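@Arrieta's function is not reproduced here. As a rough illustration of the list-building approach, the sketch below (hypothetical name `flatten_to_list`) recurses into dict values and extends one shared accumulator list with everything else, assuming, as this answer does, that every non-dictionary is a list.

```python
def flatten_to_list(d, out=None):
    """Recursively collect the items of all non-dict values into one list."""
    if out is None:
        out = []  # one accumulator shared by every recursive call
    if isinstance(d, dict):
        for v in d.values():
            flatten_to_list(v, out)
    else:
        # Assumed to be a list: append its items in order.
        out.extend(d)
    return out

nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(flatten_to_list(nested))  # [1, 2, 3, 0, 4, 5, 1, 3, 8]
```

Passing a single accumulator down the recursion avoids building and concatenating a sublist at each level; `out=None` is used instead of `out=[]` so the default list is not shared between separate calls.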

This one is a generator that yields successive non-dictionary values in the dictionary tree:

def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values, key=lambda x: (isinstance(x, str), x))
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        values = d.values()
    except AttributeError:
        # Not a dictionary: assume it is a list and yield its items.
        for list_v in d:
            yield list_v
    else:
        # A dictionary: recurse into each of its values.
        for v in values:
            for nested_v in flatten(v):
                yield nested_v
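On Python 3.3 and later the same generator can be written more compactly with `yield from`. This sketch (hypothetical name `flatten3`) deliberately swaps the `try`/`except AttributeError` test for an explicit `isinstance` check against `collections.abc.Mapping`, so an unrelated `AttributeError` can never be mistaken for "this value is a list".

```python
from collections.abc import Mapping

def flatten3(d):
    """Yield the items of every list found anywhere in a nested dict."""
    if isinstance(d, Mapping):
        for v in d.values():
            yield from flatten3(v)  # recurse into sub-dictionaries
    else:
        yield from d  # assumed to be a list (or other iterable) of values

nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(list(flatten3(nested)))  # [1, 2, 3, 0, 4, 5, 1, 3, 8]
```

Being a generator, it can be fed straight into `set()`, `sorted()` or `collections.Counter` without building a list first.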

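The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found. (That held for the dicts of the time; since Python 3.7, dicts are guaranteed to preserve insertion order, so keep a list rather than a set if that order matters to you.)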

You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7 (and Python 3.1+), that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency, since sorting costs O(n log n) where a counting pass is linear) combine sorted with itertools.groupby.
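A sketch of both counting options, applied to the question's data rather than the doctest's `hat`. The compact `flatten` below restates the generator with an `isinstance` check so the block is self-contained; it is not the answer's exact code. The `groupby` variant builds its result with `dict(...)` over a generator rather than a dict comprehension, so that part also works on Pythons older than 2.7.

```python
import collections
import itertools

def flatten(d):
    """Yield list items from a nested dict (restated for self-containment)."""
    if isinstance(d, dict):
        for v in d.values():
            for nested_v in flatten(v):
                yield nested_v
    else:
        for list_v in d:
            yield list_v

nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}

# Option 1: Counter tallies occurrences in a single pass.
counts = collections.Counter(flatten(nested))
print(counts[1], counts[3])  # 2 2

# Option 2: sort so equal values are adjacent, then count each group.
grouped = dict((value, len(list(group)))
               for value, group in itertools.groupby(sorted(flatten(nested))))
print(grouped)  # {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 1, 8: 1}
```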

