Fast Python algorithm for random partitioning with subset sums equal or close to given ratios


Problem description

This question is an extension of my previous question: Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to a ratio. I want to divide a list of numbers so that the ratios of the subset sums equal given values. The difference is that I now have a long list of 200 numbers, so enumeration is infeasible. Note that although the list of course contains repeated values, every number is distinguishable.

import random
lst = [random.randrange(10) for _ in range(200)]

In this case, I want a function that stochastically samples a certain number of partitions with subset sums equal or close to the given ratios. This means the solution can be sub-optimal, but I need the algorithm to be fast enough. I guess a greedy algorithm will do. That said, it would of course be even better if there is a relatively fast algorithm that can give the optimal solution.

For example, I want to sample 100 partitions, all with subset sum ratios of 4 : 3 : 3. Duplicate partitions are allowed but should be very unlikely for such a long list. The function should be used like this:

partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)

To test the solution, you can do something like:

from math import isclose
eps = 0.05
assert all([isclose(ratios[i] / sum(ratios), sum(x) / sum(lst), abs_tol=eps) 
            for part in partitions for i, x in enumerate(part)])

Any suggestions?

Answer

You can use a greedy heuristic where you generate each partition from one of num_gen random permutations of the list. Each random permutation is partitioned into len(ratios) contiguous sublists. The fact that the partition subsets are sublists of a permutation makes enforcing the ratio condition very easy during sublist generation: as soon as the sum of the sublist we are currently building reaches one of the ratios, we "complete" the sublist, add it to the partition, and start a new sublist. We can do this in one pass through each permutation, giving the following algorithm with time complexity O(num_gen * len(lst)).

import random

# setup from the question
lst = [random.randrange(10) for _ in range(200)]
ratios = [4, 3, 3]

M = 100          # number of partitions to generate (num_gen)
N = len(lst)
P = len(ratios)
R = sum(ratios)
S = sum(lst)

partitions = []
for _ in range(M):
    # get a new random permutation
    random.shuffle(lst)

    partition = []

    # starting index (in the permutation) of the current sublist
    lo = 0
    # permutation partial sum
    s = 0
    # index of the sublist we are currently generating (i.e. which ratio we are on)
    j = 0
    # ratio partial sum
    rs = ratios[j]

    for i in range(N):
        s += lst[i]

        # if the ratio of the permutation partial sum reaches the ratio of the
        # ratio partial sum, the current sublist is "complete"
        if s / S >= rs / R:
            partition.append(lst[lo:i + 1])
            # start creating the new sublist from the next element
            lo = i + 1
            j += 1
            if j == P:
                # done with this partition; the remaining elements will always
                # all be zeroes (i.e. the assert should never fail), so tack
                # them onto the last sublist
                assert all(x == 0 for x in lst[i + 1:])
                partition[-1].extend(lst[i + 1:])
                break
            rs += ratios[j]

    partitions.append(partition)

Note that the outer loop can be redesigned to loop indefinitely until num_gen good partitions are generated (rather than just looping num_gen times) for more robustness. This algorithm is expected to produce M good partitions in O(M) iterations (provided random.shuffle is sufficiently random) as long as the number of good partitions is not too small compared to the total number of partitions of the same sizes, so it should perform well for most inputs. For an (almost) uniformly random list like [random.randrange(10) for _ in range(200)], every iteration produces a good partition with eps = 0.05, as is evident from running the example. Of course, how well the algorithm performs also depends on the definition of "good": the stricter the closeness requirement (in other words, the smaller the epsilon), the more iterations it will take to find a good partition. This implementation can be found here, and will work for any input (assuming random.shuffle can eventually produce every permutation of the input list).
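The "loop until num_gen good partitions" variant mentioned above could look like the following sketch; the helper names (`one_partition`, `is_good`, `sample_until`) and the `max_tries` cap are assumptions added here, not part of the original answer:

```python
import random
from math import isclose

def one_partition(lst, ratios):
    # single greedy pass over a fresh shuffle (same logic as the answer's code)
    lst = lst[:]
    random.shuffle(lst)
    S, R, P = sum(lst), sum(ratios), len(ratios)
    partition, lo, s, j, rs = [], 0, 0, 0, ratios[0]
    for i, x in enumerate(lst):
        s += x
        if s / S >= rs / R:
            partition.append(lst[lo:i + 1])
            lo = i + 1
            j += 1
            if j == P:
                partition[-1].extend(lst[i + 1:])
                break
            rs += ratios[j]
    return partition

def is_good(partition, ratios, S, eps):
    # a partition is "good" if every subset's share is within eps of its ratio
    R = sum(ratios)
    return len(partition) == len(ratios) and all(
        isclose(r / R, sum(part) / S, abs_tol=eps)
        for r, part in zip(ratios, partition))

def sample_until(lst, ratios, num_gen, eps=0.05, max_tries=10_000):
    # keep drawing until num_gen good partitions are found; the cap on
    # attempts keeps the loop from spinning forever on pathological inputs
    S = sum(lst)
    good = []
    for _ in range(max_tries):
        p = one_partition(lst, ratios)
        if is_good(p, ratios, S, eps):
            good.append(p)
            if len(good) == num_gen:
                break
    return good
```

For the uniformly random 200-element lists discussed above, essentially every draw passes the eps = 0.05 check, so this behaves like the fixed-count loop; the difference only shows up for stricter epsilons or more skewed inputs.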

You can find a runnable version of the code (with asserts to test how "good" the partitions are) here.

