Flatten a dictionary of dictionaries (2 levels deep) of lists in Python


Question




I'm trying to wrap my brain around this but it's not flexible enough.

In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.

Thus I want to transform

{1: {'a': [1, 2, 3], 'b': [0]},
 2: {'c': [4, 5, 1], 'd': [3, 8]}}

to

[1, 2, 3, 0, 4, 5, 1, 3, 8]

I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.

But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?

Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
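For the two-level case described above, one possible sketch (assuming Python 3; `flatten_two_levels` is a name invented here for illustration) chains the innermost lists lazily, so no intermediate sublists are built before the final list:

```python
from itertools import chain


def flatten_two_levels(d):
    """Lazily yield every element of a dict of dicts of lists.

    Hypothetical helper for illustration; not part of the original post.
    """
    # The generator expression walks both dictionary levels and hands each
    # innermost list to chain.from_iterable, which yields its items one by one.
    return chain.from_iterable(
        lst for inner in d.values() for lst in inner.values()
    )


data = {1: {'a': [1, 2, 3], 'b': [0]},
        2: {'c': [4, 5, 1], 'd': [3, 8]}}

flat = list(flatten_two_levels(data))
print(flat)
```

On Python 3.7 and later, where dictionaries preserve insertion order, the elements come out in the order shown in the question. On older interpreters the order of the groups is not guaranteed.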

Update: For those who are interested, below is the code I ended up using.

Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.

def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)

...

# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
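The code above targets Python 2 (`itervalues()`). A rough Python 3 adaptation might look like the following; the snake_case names and the sample `ip_dict` are made up for illustration, and each session is assumed to be a `[start, end]` pair, which is what the two `itemgetter` keys imply:

```python
import operator


def gen_sessions(ip_dict):
    """Yield every session in a dict of dicts of lists, as a tuple."""
    for ua_dict in ip_dict.values():
        for sessions in ua_dict.values():
            for session in sessions:
                yield tuple(session)


# Made-up sample data: ip -> user agent -> list of [start, end] sessions.
ip_dict = {
    '10.0.0.1': {'agent-a': [[5, 9], [1, 4]]},
    '10.0.0.2': {'agent-b': [[3, 12]], 'agent-c': [[2, 3]]},
}

# sorted() consumes the generator directly; no flattened list is built first.
sessions_by_start = sorted(gen_sessions(ip_dict), key=operator.itemgetter(0))
# Sorting the already-sorted list again produces a second, independent list.
sessions_by_end = sorted(sessions_by_start, key=operator.itemgetter(1))

print(sessions_by_start)
print(sessions_by_end)
```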

Thanks again to all who helped.

[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]

Solution

edit: re-read the original question and reworked answer to assume that all non-dictionaries are lists to be flattened.

In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.

This one is a generator that yields successive non-dictionary values in the dictionary tree:

def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v

The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.
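The generator above relies on Python 2's `itervalues()`. A minimal Python 3 sketch of the same idea is shown below. It swaps the `try`/`except AttributeError` for an explicit `isinstance` check and uses `yield from`; both are choices made for this illustration, not something the original answer prescribes.

```python
def flatten(d):
    """Recursively yield the list items found under nested dictionaries."""
    if isinstance(d, dict):
        for v in d.values():
            # Recurse into each value, whether it is a dict or a list.
            yield from flatten(v)
    else:
        # Assumption carried over from the answer: every non-dict is a
        # list (or other iterable) of leaf values to be yielded as-is.
        yield from d


hat = {'cat': ['images/cat-in-the-hat.png'],
       'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
                'numbers': {'one': [1], 'two': [2]}},
       'food': {'eggs': {'green': [0x00FF00]},
                'ham': ['lean', 'medium', 'fat']}}

set_of_values = set(flatten(hat))
print(len(set_of_values))
```

Note that the doctest's `sorted(set_of_values)` would raise `TypeError` on Python 3, because integers and strings cannot be ordered against each other there; comparing against a set literal sidesteps that.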

You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7, that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency) combine sorted with itertools.groupby.
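The two counting approaches mentioned above could be sketched as follows. The `values` list is a stand-in for whatever the flattening generator yields, and the variable names are illustrative only:

```python
from collections import Counter
from itertools import groupby

# Stand-in for the stream of values produced by a flatten generator.
values = [1, 2, 3, 0, 4, 5, 1, 3, 8]

# Counter tallies occurrences in a single pass over the iterable.
counts = Counter(values)

# The fallback: sort first so equal values are adjacent, then count the
# length of each run. Sorting makes this O(n log n) rather than O(n).
grouped_counts = {
    key: sum(1 for _ in group) for key, group in groupby(sorted(values))
}

print(counts[1], grouped_counts[1])
```

The `groupby` variant only works when the values can be sorted against each other, so it would not handle a mix of integers and strings on Python 3.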
