Python: flatten nested lists with indices


Problem description

Given a list of arbitrarily deep nested lists of arbitrary size, I would like a flat, depth-first iterator over all elements in the tree, but with path indices as well, such that:

for x, y in flatten(L), x == L[y[0]][y[1]]...[y[-1]]. 

So, for example:

L = [[[1, 2, 3], [4, 5]], [6], [7,[8,9]], 10]
flatten(L)

should yield:

(1, (0, 0, 0)),
(2, (0, 0, 1)),
(3, (0, 0, 2)),
(4, (0, 1, 0)),
(5, (0, 1, 1)),
(6, (1, 0)),
(7, (2, 0)),
(8, (2, 1, 0)),
(9, (2, 1, 1)),
(10, (3,))
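
As a quick illustration (a minimal sketch, assuming only the L above), any such index path can be followed back into L with functools.reduce and operator.getitem, which is exactly the x == L[y[0]][y[1]]...[y[-1]] invariant:

>>> from functools import reduce
>>> from operator import getitem
>>> L = [[[1, 2, 3], [4, 5]], [6], [7, [8, 9]], 10]
>>> reduce(getitem, (2, 1, 0), L)   # follows the path, i.e. L[2][1][0]
8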

I made a recursive implementation for this using generators with yield statements:

def flatten(l):
    for i, e in enumerate(l):
        try:
            # e is itself a list: recurse and prepend the current index
            for x, y in flatten(e):
                yield x, (i,) + y
        except:
            # e is not iterable (a leaf): yield it with its single-index path
            yield e, (i,)

But I don't think it's a good or responsible implementation. Is there any recipe for doing this more generally, just using builtins or std lib stuff like itertools?

Recommended answer

I think your own solution is alright, that there's nothing really simpler, and that Python's standard library stuff wouldn't help. But here's another way anyway, which works iteratively instead of recursively so it can handle very deeply nested lists.

def flatten(l):
    stack = [enumerate(l)]   # enumerate iterators of the currently "active" lists
    path = [None]            # index path to the current position (parallel stack)
    while stack:
        for path[-1], x in stack[-1]:
            if isinstance(x, list):
                # descend into the sublist: push its iterator and a new index slot
                stack.append(enumerate(x))
                path.append(None)
            else:
                yield x, tuple(path)
            break
        else:
            # the current list is exhausted: drop its iterator and its index slot
            stack.pop()
            path.pop()

I keep the currently "active" lists on a stack of enumerate iterators, and the current index path as another stack. Then in a while-loop I always try to take the next element from the current list and handle it appropriately:


  • If that next element is a list, then I push its enumerate iterator on the stack and make room for the deeper index on the index path stack.
  • If that next element is a number, then I yield it along with its path.
  • If there was no next element in the current list, then I remove it (or rather its iterator) and its index spot from the stacks.

Demo:

>>> L = [[[1, 2, 3], [4, 5]], [6], [7,[8,9]], 10]
>>> for entry in flatten(L):
        print(entry)

(1, (0, 0, 0))
(2, (0, 0, 1))
(3, (0, 0, 2))
(4, (0, 1, 0))
(5, (0, 1, 1))
(6, (1, 0))
(7, (2, 0))
(8, (2, 1, 0))
(9, (2, 1, 1))
(10, (3,))
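
To make the "very deeply nested" claim above concrete, here is a minimal sketch (the depth of 5000 is an arbitrary choice) with a list nested far beyond CPython's default recursion limit of 1000, which this non-recursive generator handles fine:

>>> deep = 42
>>> for _ in range(5000):
        deep = [deep]          # wrap the value in 5000 levels of nesting
>>> value, path = next(flatten(deep))
>>> value, len(path)
(42, 5000)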

Note that if you process the entries on the fly, like printing does, then you could just yield the path as the list it is, i.e., use yield x, path. Demo:

>>> for entry in flatten(L):
        print(entry)

(1, [0, 0, 0])
(2, [0, 0, 1])
(3, [0, 0, 2])
(4, [0, 1, 0])
(5, [0, 1, 1])
(6, [1, 0])
(7, [2, 0])
(8, [2, 1, 0])
(9, [2, 1, 1])
(10, [3])

This way, the iterator only takes O(n) time for the whole iteration, where n is the total number of objects in the structure (both lists and numbers). Of course the printing increases the complexity, just like creating the tuples does. But that's then outside of the generator, and the "fault" of the printing or of whatever else you're doing with each path. If you for example only look at each path's length instead of its contents, which takes O(1), then the whole thing really is O(n).
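
For instance (a hypothetical use, with the L from the demo and either path representation), computing the maximum nesting depth only ever looks at each path's length:

>>> max(len(path) for _, path in flatten(L))
3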

All that said, again, I think your own solution is alright. And clearly simpler than this. And like I commented under @naomik's answer, I think your solution not being able to handle lists of depth around 1000 or more is rather irrelevant. One should not even have such a list in the first place. If one does, that's a mistake that should be fixed instead. If the list can also go wide, as in your case, and is balanced, then even with a branch factor of just 2 you'd run out of memory at a depth well under 100, so you wouldn't get anywhere near depth 1000. If the list can't go wide, then nested lists are the wrong data structure choice, plus you wouldn't be interested in the index path in the first place. If it can go wide but doesn't, then I'd say the creation algorithm should be improved (for example, if it represents a sorted tree, add balancing).

About my solution again: Besides its ability to handle arbitrarily deep lists and its efficiency, I find some of its details interesting to note:


  • You rarely ever see enumerate objects being stored somewhere. Usually they're just used in loops&Co directly, like for i, x in enumerate(l):.
  • Having the path[-1] spot ready and writing into it with for path[-1], x in ....
  • Using a for-loop with an immediate break and an else branch, to iterate over the next single value and handle ends gracefully without try/except and without next and some default.
  • If you do yield x, path, i.e., don't turn each path into a tuple, then you really need to process it directly during the iteration (see the sketch right after this list). For example, if you do list(flatten(L)), then you get [(1, []), (2, []), (3, []), (4, []), (5, []), (6, []), (7, []), (8, []), (9, []), (10, [])]. That is, "all" index paths will be empty. Of course that's because there really only is one path object, which I update and yield over and over again, and in the end it's empty. This is very similar to itertools.groupby, where for example [list(g) for _, g in list(groupby('aaabbbb'))] gives you [[], ['b']]. And it's not a bad thing. I recently wrote about that extensively.
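
If you do want to materialize results from that yield x, path variant anyway, a minimal workaround (a sketch, under that assumption) is to snapshot each path as you consume it:

>>> [(x, tuple(path)) for x, path in flatten(L)][:3]
[(1, (0, 0, 0)), (2, (0, 0, 1)), (3, (0, 0, 2))]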

A shorter version, with one stack holding both the indexes and the enumerate objects, alternating:

def flatten(l):
    stack = [None, enumerate(l)]
    while stack:
        for stack[-2], x in stack[-1]:
            if isinstance(x, list):
                stack += None, enumerate(x)
            else:
                yield x, stack[::2]
            break
        else:
            del stack[-2:]
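
As a quick check (a sketch using the L from the demo above), this version yields each path as a list, and since stack[::2] builds a fresh copy on every yield, materializing the results is safe here:

>>> list(flatten(L))[:3]
[(1, [0, 0, 0]), (2, [0, 0, 1]), (3, [0, 0, 2])]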
