如何以组长度和组内元素的所有可能组合将列表拆分为 n 组? [英] How to split a list into n groups in all possible combinations of group length and elements within group?

查看:40
本文介绍了如何以组长度和组内元素的所有可能组合将列表拆分为 n 组?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想以所有可能的组合将一个列表分成 n 个组(允许可变组长度).

I want to split a list into n groups in all possible combinations (allowing for variable group length).

比如说,我有以下清单:

Say, I have the following list:

lst=[1,2,3,4]

如果我指定 n=2,则列表可以分为 1 个元素 - 3 个元素或 2 个元素 - 2 个元素的组.在这两种拆分列表的方法中,每个列表中的元素都有独特的组合.

If I specify n=2, the list could be divided either into groups of 1 element-3 elements or 2 elements-2 elements. Within those two ways of splitting the list, there are unique combinations of which elements go in each list.

当 n=2 时,这些将是:

With n=2, these would be:

(1),(2,3,4)
(2),(1,3,4)
(3),(1,2,4)
(4),(1,2,3)
(1,2),(3,4)
(1,3),(2,4)
(1,4),(2,3)

当 n=1 时,这些将是:

With n=1 these would be:

(1,2,3,4)

当 n=3 时,这些将是:

And with n=3 these would be:

(1),(2),(3,4)
(1),(3),(2,4)
(1),(4),(2,3)
(2),(3),(1,4)
(2),(4),(1,3)
(3),(4),(1,2)

我对长度为 0 的组不感兴趣,组内的顺序无关紧要.

I am not interested in groups of length 0, and order within a group does not matter.

我发现了两个类似的问题,但它们并没有完全回答我的问题.

I found two similar questions, but they don't answer my question exactly.

这个问题拆分一个包含所有组合的列表,其中每个组的长度为 n(我发现@tokland 的答案特别有用).但是,我并不希望所有组的长度都相同.

This question splits a list into all combinations where each group is of length n (I found the answer by @tokland) to be particularly useful). However, I am not looking for all groups to be of the same length.

然后是this 问题获取拆分位置的唯一组合,以将列表拆分为 n 个组.但是,此处保留了列表顺序,并且未确定这些组中元素的唯一组合.

And then the first step of this question gets unique combinations of split locations to split a list into n groups. However, list order is preserved here and unique combinations of elements within these groups is not determined.

我正在寻找这两个问题的组合 - 一个列表被分成 n 个组,所有可能的组长度组合以及组内元素的组合.

I am looking for a combination of these two questions - a list is split into n groups in all possible combinations of group length as well as combinations of elements within a group.

推荐答案

我们可以使用 this answer 中的基本递归算法和修改它以生成特定长度的分区,而不必生成和过滤掉不需要的分区.

We can use the basic recursive algorithm from this answer and modify it to produce partitions of a particular length without having to generate and filter out unwanted partitions.

def sorted_k_partitions(seq, k):
    """Returns a list of all unique k-partitions of `seq`.

    Each partition is a list of parts, and each part is a tuple.

    The parts in each individual partition will be sorted in shortlex
    order (i.e., by length first, then lexicographically).

    The overall list of partitions will then be sorted by the length
    of their first part, the length of their second part, ...,
    the length of their last part, and then lexicographically.
    """
    n = len(seq)
    groups = []  # a list of lists, currently empty

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            if n - i > k - len(groups):
                for group in groups:
                    group.append(seq[i])
                    yield from generate_partitions(i + 1)
                    group.pop()

            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

    result = generate_partitions(0)

    # Sort the parts in each partition in shortlex order
    result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result]
    # Sort partitions by the length of each part, then lexicographically.
    result = sorted(result, key = lambda ps: (*map(len, ps), ps))

    return result

这里发生了很多事情,让我解释一下.

There's quite a lot going on here, so let me explain.

首先,我们从相同上述递归算法的程序性、自下而上(术语?)实现开始:>

First, we start with a procedural, bottom-up (teminology?) implementation of the same aforementioned recursive algorithm:

def partitions(seq):
    """-> a list of all unique partitions of `seq` in no particular order.

    Each partition is a list of parts, and each part is a tuple.
    """
    n = len(seq)
    groups = []  # a list of lists, currently empty

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            for group in groups
                group.append(seq[i])
                yield from generate_partitions(i + 1)
                group.pop()

            groups.append([seq[i]])
            yield from generate_partitions(i + 1)
            groups.pop()

    if n > 0:
        return list(generate_partitions(0))
    else:
        return [[()]]

主要算法在嵌套的 generate_partitions 函数中.基本上,它遍历序列,对于每个项目,它: 1)将项目放入工作集中的每个当前组(也称为部分)并递归;2) 将项目放在它自己的新组中.

The main algorithm is in the nested generate_partitions function. Basically, it walks through the sequence, and for each item, it: 1) puts the item into each of current groups (a.k.a parts) in the working set and recurses; 2) puts the item in its own, new group.

当我们到达序列的末尾 (i == n) 时,我们会生成我们一直在构建的工作集的(深层)副本.

When we reach the end of the sequence (i == n), we yield a (deep) copy of the working set that we've been building up.

现在,为了获得特定长度的分区,我们可以简单地过滤或分组我们正在寻找的结果并完成它,但这种方法执行了很多不必要的如果我们只想要某个长度 k 的分区,则工作(即递归调用).

Now, to get partitions of a particular length, we could simply filter or group the results for the ones we're looking for and be done with it, but this approach performs a lot of unnecessary work (i.e. recursive calls) if we just wanted partitions of some length k.

请注意,在上面的函数中,分区的长度(即组数)在以下情况下会增加:

Note that in the function above, the length of a partition (i.e. the # of groups) is increased whenever:

            # this adds a new group (or part) to the partition
            groups.append([seq[i]])
            yield from generate_partitions(i + 1)
            groups.pop()

...被执行.因此,我们通过简单地在该块上放置一个保护来限制分区的大小,如下所示:

...is executed. Thus, we limit the size of a partition by simply putting a guard on that block, like so:

def partitions(seq, k):
    ...

    def generate_partitions(i):
        ...

            # only add a new group if the total number would not exceed k
            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

将新参数和该行添加到 partitions 函数现在将使其仅生成长度高达 k的分区.这几乎就是我们想要的.问题是for 循环有时仍然会生成长度小于k 的分区.

Adding the new parameter and just that line to the partitions function will now cause it to only generate partitions of length up to k. This is almost what we want. The problem is that the for loop still sometimes generates partitions of length less than k.

为了修剪那些递归分支,我们只需要在可以确定序列中有足够的剩余元素将工作集扩展到总共 for 循环时,才需要执行 for 循环code>k 组.剩余元素的数量 - 或尚未放入组的元素 - 是 n - i (或 len(seq) - i).k - len(groups) 是我们需要添加的新组的数量,以生成有效的 k 分区.如果 n - i <= k - len(groups),那么我们不能通过将其添加到当前组之一来浪费项目——我们必须 创建一个新组.

In order to prune those recursive branches, we need to only execute the for loop when we can be sure that we have enough remaining elements in our sequence to expand the working set to a total of k groups. The number of remaining elements--or elements that haven't yet been placed into a group--is n - i (or len(seq) - i). And k - len(groups) is the number of new groups that we need to add to produce a valid k-partition. If n - i <= k - len(groups), then we cannot waste an item by adding it one of the current groups--we must create a new group.

所以我们只需添加另一个守卫,这次是到另一个递归分支:

So we simply add another guard, this time to the other recursive branch:

    def generate_partitions(i):
        ...

            # only add to current groups if the number of remaining items
            # exceeds the number of required new groups.
            if n - i > k - len(groups):
                for group in groups:
                    group.append(seq[i])
                    yield from generate_partitions(i + 1)
                    group.pop()

            # only add a new group if the total number would not exceed k
            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

有了它,你就有了一个可用的 k 分区生成器.您可能会进一步折叠一些递归调用(例如,如果还有 3 个剩余的项目,而我们还需要 3 个组,那么您已经知道必须将每个项目分成自己的组),但我想展示功能是对生成所有分区的基本算法的轻微修改.

And with that, you have a working k-partition generator. You could probably collapse some of the recursive calls even further (for example, if there are 3 remaining items and we need 3 more groups, then you already know that you must split each item into their own group), but I wanted to show the function as a slight modification of the basic algorithm which generates all partitions.

剩下要做的就是对结果进行排序.不幸的是,我没有弄清楚如何以所需的顺序直接生成分区(一个更聪明的狗的练习),而是作弊并在生成后进行排序.

The only thing left to do is sort the results. Unfortunately, rather than figuring out how to directly generate the partitions in the desired order (an exercise for a smarter dog), I cheated and just sorted post-generation.

def sorted_k_partitions(seq, k):
    ...
    result = generate_partitions(0)

    # Sort the parts in each partition in shortlex order
    result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result]
    # Sort partitions by the length of each part, then lexicographically.
    result = sorted(result, key = lambda ps: (*map(len, ps), ps))

    return result

有点不言自明,除了关键功能.第一个:

Somewhat self-explanatory, except for the key functions. The first one:

key = lambda p: (len(p), p) 

表示先按长度排序,然后按序列本身排序(在 Python 中,默认情况下按字典顺序排序).p 代表部分".这用于对分区内的部件/组进行排序.这个键意味着,例如,(4,)(1, 2, 3) 之前,所以 [(1, 2, 3), (4,)] 排序为 [(4,), (1, 2, 3)].

says to sort a sequence by length, then by the sequence itself (which, in Python, are ordered lexicographically by default). The p stands for "part". This is used to sort the parts/groups within a partition. This key means that, for example, (4,) precedes (1, 2, 3), so that [(1, 2, 3), (4,)] is sorted as [(4,), (1, 2, 3)].

key = lambda ps: (*map(len, ps), ps) 
# or for Python versions <3.5: lambda ps: tuple(map(len, ps)) + (ps,)

ps 在这里代表部分",复数.这个是说按每个元素的长度(必须是序列本身)对序列进行排序,然后(按字典顺序)按序列本身排序.这用于相对于彼此对分区进行排序,例如,[(4,), (1, 2, 3)][(1, 2) 之前, (3, 4)].

The ps here stands for "parts", plural. This one says to sort a sequence by the lengths of each of its elements (which must be sequence themselves), then (lexicographically) by the sequence itself. This is used to sort the partitions with respect to each other, so that, for example, [(4,), (1, 2, 3)] precedes [(1, 2), (3, 4)].

以下内容:

seq = [1, 2, 3, 4]

for k in 1, 2, 3, 4:
    for groups in sorted_k_partitions(seq, k):
        print(k, groups)

产生:

1 [(1, 2, 3, 4)]
2 [(1,), (2, 3, 4)]
2 [(2,), (1, 3, 4)]
2 [(3,), (1, 2, 4)]
2 [(4,), (1, 2, 3)]
2 [(1, 2), (3, 4)]
2 [(1, 3), (2, 4)]
2 [(1, 4), (2, 3)]
3 [(1,), (2,), (3, 4)]
3 [(1,), (3,), (2, 4)]
3 [(1,), (4,), (2, 3)]
3 [(2,), (3,), (1, 4)]
3 [(2,), (4,), (1, 3)]
3 [(3,), (4,), (1, 2)]
4 [(1,), (2,), (3,), (4,)]

这篇关于如何以组长度和组内元素的所有可能组合将列表拆分为 n 组?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆