如何按照它们的总和顺序遍历大量的整数元组? [英] How do I iterate over a large number of tuples of integers in the order of their sum?

查看:129
本文介绍了如何按照它们的总和顺序遍历大量的整数元组?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 itertools.combinations( ) 迭代整数元组。

I'm using itertools.combinations() to iterate over tuples of integers.

我对最低和的元组感兴趣>满足一些条件:

I am interested in the tuple with the lowest sum that satisfies some conditions:

def findLowestNiceTuple:
    for tup in itertools.combinations(range(1, 6), 2):
        if niceTuple(tup):
            return tup

生成器的默认顺序不是元素总和的顺序。例如:

The generator's default order is not in the order of the elements' sum. For example:

>>> itertools.combinations(range(1, 6), 2)

给出一个生成器,它将产生以下内容要素:

gives a generator which will yield the following elements:

[(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

如您所见,(1,5)的总和大于(2,3)的总和。对于提前终止,我需要 ...,(1,4),(2,3),(1,5),... 的顺序中的元组。

As you can see, the sum of (1, 5) is larger than that of (2,3). For early termination, I need the tuples in the order ..., (1, 4), (2, 3), (1, 5), ....

对于适度数量的组合,您可以使用 sorted()来解决这个问题:

For a modest number of combinations, you can get around this by using sorted():

>>> sorted(itertools.combinations(range(1, 6), 2), key=sum)
[(1, 2), (1, 3), (1, 4), (2, 3), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

然而, sorted()将生成器转换为完全保存在内存中的列表。这意味着它不再能很好地扩展。像 itertools.combinations(范围(1,600),400)之类的东西将不可避免地产生 MemoryError

However, sorted() converts the generator to a list which is kept in memory entirely. This means that it no longer scales very well. Something like itertools.combinations(range(1, 600), 400) will inevitably produce a MemoryError.

是否有更友好的方式来达到预期效果?

PS:我确实意识到完全迭代我提到的最后一个序列需要很长时间,但我想要的元组应该非常接近开始。如果我可以指望订单,我可以像在第一个片段中那样提前终止。

PS: I do realize that it would take ages to fully iterate over the last sequence I mentioned, but the tuple I am looking for should be very close to the start. And if I can count on the order, I can terminate early as in the first snippet.

推荐答案

以下是我如何解决它,使用递归函数找到总和给定值的所有组合:

Here's how I'd solve it, with a recursive function that finds all combinations that sum to a given value:

def ordered_combinations(pop, n):
    pop = sorted(pop)

    for s in range(sum(pop[:n]), sum(pop[-n:])+1):
        yield from get_sums(pop, s, n)

def get_sums(pop, s, n):
    if n == 1:
        if s in pop:
            yield [s]
        return

    for i, v in enumerate(pop):
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums(pop[i+1:], s-v, n-1):
            rest.append(v)
            yield rest

这是一个输出示例:

>>> for c in ordered_combinations(range(1, 8), 4):
    print(c, sum(c))


[4, 3, 2, 1] 10
[5, 3, 2, 1] 11
[6, 3, 2, 1] 12
[5, 4, 2, 1] 12
[7, 3, 2, 1] 13
[6, 4, 2, 1] 13
[5, 4, 3, 1] 13
[7, 4, 2, 1] 14
[6, 5, 2, 1] 14
[6, 4, 3, 1] 14
[5, 4, 3, 2] 14
[7, 5, 2, 1] 15
[7, 4, 3, 1] 15
[6, 5, 3, 1] 15
[6, 4, 3, 2] 15
[7, 6, 2, 1] 16
[7, 5, 3, 1] 16
[6, 5, 4, 1] 16
[7, 4, 3, 2] 16
[6, 5, 3, 2] 16
[7, 6, 3, 1] 17
[7, 5, 4, 1] 17
[7, 5, 3, 2] 17
[6, 5, 4, 2] 17
[7, 6, 4, 1] 18
[7, 6, 3, 2] 18
[7, 5, 4, 2] 18
[6, 5, 4, 3] 18
[7, 6, 5, 1] 19
[7, 6, 4, 2] 19
[7, 5, 4, 3] 19
[7, 6, 5, 2] 20
[7, 6, 4, 3] 20
[7, 6, 5, 3] 21
[7, 6, 5, 4] 22

这些组合总是首先产生最大值,作为我如何将它们构建为列表的工件(通过附加小值o最后,而不是通过连接到前面)。如果您希望它们从最小到最大排序,您可以更改 rest.append(v);产量休息行到产量[v] +休息

The combinations are always yielded with the biggest values first, as an artifact of how I'm building them as lists (by appending small values on the end, rather than by concatenating to the front). If you want them ordered from smallest to largest, you can change the rest.append(v); yield rest lines to yield [v]+rest.

代码使用从Python 3.3中引入的获得的语法。如果您使用的是不支持该版本的早期版本,则可以使用此等效代码:

The code uses the yield from syntax that was introduced with Python 3.3. If you're using an earlier version that doesn't support that, you can use this equivalent code:

for v in get_sums(pop, s, n):
    yield v

代码甚至可以处理极端你描述了从800个成员范围取得的400个组合的情况。这是该计算的前20个结果(仅显示其最大的10个值,因为其余的都是相同的390到1),以及它们的总和:

The code can even handle the extreme case you described of 400-combinations taken from an 800 member range. Here's the first twenty results of that computation (shown only with their largest 10 values, since the rest are all identically 390 down to 1), and their sums:

>>> for i, v in enumerate(ordered_combinations(range(1, 800), 400)):
    if i >= 20:
        break
    print(v[:10], sum(v))


[400, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80200
[401, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80201
[402, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80202
[401, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80202
[403, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80203
[402, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80203
[401, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80203
[404, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[403, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[402, 401, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[402, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80204
[401, 400, 399, 398, 396, 395, 394, 393, 392, 391] 80204
[405, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[404, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[403, 401, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[403, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80205
[402, 401, 399, 397, 396, 395, 394, 393, 392, 391] 80205
[402, 400, 399, 398, 396, 395, 394, 393, 392, 391] 80205
[401, 400, 399, 398, 397, 395, 394, 393, 392, 391] 80205
[406, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80206

因为它是递归的,所以此代码可能会失败如果你请求一个1000组合(这是由于Python的默认递归限制)。如有必要,您可以使用 sys.setrecursionlimit 修改限制。

Because it's recursive, this code may fail if you request an 1000-combination (this is due to Python's default recursion limit). You can modify the limit it with sys.setrecursionlimit if necessary.

如果你去,它也可能有内存问题由于 get_sums 切片(并因此复制)递归步骤中的总体,因此人口极其庞大。如果您对此代码的使用仅使用 range s,则可以通过删除 pop = sorted(pop)来解决内存问题来自 ordered_combinations 的行,因为Python 3的范围对象可以有效切片(即范围(1,100)[10:] 范围(11,100))。

It may also have memory issues if you go exceedingly deep with an extremely large population, since get_sums slices (and so copies) the population in the recursive step. If your use for this code will only be using ranges, you can probably fix the memory issue by removing the pop = sorted(pop) line from ordered_combinations, since Python 3's range objects can be sliced efficiently (that is, range(1,100)[10:] is range(11,100)).

这篇关于如何按照它们的总和顺序遍历大量的整数元组?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆