将数组拆分为 P 个平衡和的子数组的算法 [英] Algorithm to split an array into P subarrays of balanced sum

查看:26
本文介绍了将数组拆分为 P 个平衡和的子数组的算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个长度为 N 的大数组,比如说:

2 4 6 7 6 3 3 3 4 3 4 4 4 3 3 1

我需要将这个数组拆分成 P 个子数组(在这个例子中,P=4 是合理的),这样每个子数组中元素的总和尽可能接近 sigma,正在:

sigma=(原始数组中所有元素的总和)/P

在本例中,sigma=15.

为了清楚起见,一种可能的结果是:

2 4 6 7 6 3 3 3 4 3 4 4 4 3 3 1(总和:12、19、14、15)

我写了一个非常幼稚的算法,基于我如何手动进行除法,但我不知道如何强加一个条件,即总和为 (14,14,14,14,19) 的除法是比 (15,14,16,14,16) 差.

提前致谢.

解决方案

首先,让我们通过为每个可能的解决方案指定输入、输出和度量来形式化您的优化问题(我希望这对您感兴趣):<块引用>

给定一个正整数数组A和一个正整数P,将数组A分成P非重叠子数组,使得每个子数组的总和与子数组的完美总和 (sum(A)/P) 之间的差异最小.

输入:正整数数组AP 是一个正整数.
输出:P个非负整数的数组SA,表示A的每个子数组的长度,其中这些子数组的长度等于A的长度.
测量:abs(sum(sa)-sum(A)/P) 对每个sa ∈ {sa |sa = (Ai, ..., Ai+‍SAj) for i = (Σ SAj), j 从 0 到 P-1}.

inputoutput 定义了一组有效的解决方案.measure 定义了一个度量来比较多个有效的解决方案.由于我们正在寻找与完美解决方案(最小化问题)差异最小的解决方案,测量也应该是最小的.

有了这些信息,很容易实现measure函数(这里是Python):

def measure(a, sa):sigma = sum(a)/len(sa)差异 = 0我 = 0对于 xrange(0, len(sa)) 中的 j:diff += abs(sum(a[i:i+sa[j]])-sigma)i += sa[j]返回差异打印测量([2,4,6,7,6,3,3,3,4,3,4,4,4,3,3,1], [3,4,4,5]) # 打印 8

现在找到最佳解决方案有点困难.

我们可以使用回溯算法来寻找有效的解决方案并使用度量 函数来评价它们.我们基本上尝试了 P 非负整数的所有可能组合,总和为 length(A) 来表示所有可能的有效解决方案.虽然这确保不会错过有效的解决方案,但它基本上是一种蛮力方法,其好处是我们可以省略一些不能比我们最好的解决方案更好的分支.例如.在上面的例子中,如果我们已经有一个 measure ≤ 38 的解决方案,我们就不需要用 [9,…] (measure > 38) 测试解决方案.

按照维基百科的伪代码模式,我们的 bt 函数如下所示:

def bt(c):全局 P,最优,最优差异如果拒绝(P,c):返回如果接受(P,c):打印 "%r with %d" % (c, measure(P,c))如果测量(P,c)<最佳差异:最优 = c最佳差异=测量(P,c)返回s = 第一(P,c)而 s 不是 None:bt(列表)s = 下一个(P,s)

全局变量Poptimumoptimum_diff 代表保存A 值的问题实例,Psigma,以及最优解及其测度:

 类 MinimalSumOfSubArraySumsProblem:def __init__(self, a, p):self.a = a自我.p = pself.sigma = sum(a)/p

接下来我们指定非常简单的 rejectaccept 函数:

def 拒绝(P,c):返回optimal_diff <测量(P,c)定义接受(P,c):返回 None 不在 c

这只是拒绝任何度量已经超过我们的最佳解决方案的候选者.我们接受任何有效的解决方案.

measure 函数也略有变化,因为 c 现在可以包含 None 值:

def measure(P, c):差异 = 0我 = 0对于 xrange(0, P.p) 中的 j:如果 c[j] 是 None:休息;diff += abs(sum(P.a[i:i+c[j]])-P.sigma)i += c[j]返回差异

剩下的两个函数firstnext稍微复杂一点:

def first(P,c):t = 0is_complete = 真对于 xrange(0, len(c)) 中的 i:如果 c[i] 是 None:如果 i+1 <连(c):c[i] = 0别的:c[i] = len(P.a) - tis_complete = 假休息;别的:t += c[i]如果 is_complete:返回无返回 c定义下一个(P,s):t = 0对于 xrange(0, len(s)) 中的 i:t += s[i]如果 i+1 >= len(s) 或 s[i+1] 为 None:如果 t+1 >len(P.a):返回无别的:s[i] += 1返回

基本上,first 要么用 0 替换列表中的下一个 None 值,如果它不是列表中的最后一个值,或者如果它是列表中的最后一个值,则表示有效解决方案的余数(此处略作优化),如果列表中没有 None 值,则返回 None.next 简单地将最右边的整数加 1,如果增量超过总限制,则返回 None.

现在你只需要创建一个问题实例,初始化全局变量并使用根调用bt:

P = MinimalSumOfSubArraySumsProblem([2,4,6,7,6,3,3,3,4,3,4,4,4,3,3,1], 4)最佳 = 无best_diff = float("inf")bt([无]*P.p)

I have an big array of length N, let's say something like:

2 4 6 7 6 3 3 3 4 3 4 4 4 3 3 1

I need to split this array into P subarrays (in this example, P=4 would be reasonable), such that the sum of the elements in each subarray is as close as possible to sigma, being:

sigma=(sum of all elements in original array)/P

In this example, sigma=15.

For the sake of clarity, one possible result would be:

2 4 6    7 6 3 3   3 4 3 4    4 4 3 3 1
(sums: 12,19,14,15)

I have written a very naive algorithm based in how I would do the divisions by hand, but I don't know how to impose the condition that a division whose sums are (14,14,14,14,19) is worse than one that is (15,14,16,14,16).

Thank you in advance.

解决方案

First, let’s formalize your optimization problem by specifying the input, output, and the measure for each possible solution (I hope this is in your interest):

Given an array A of positive integers and a positive integer P, separate the array A into P non-overlapping subarrays such that the difference between the sum of each subarray and the perfect sum of the subarrays (sum(A)/P) is minimal.

Input: Array A of positive integers; P is a positive integer.
Output: Array SA of P non-negative integers representing the length of each subarray of A where the sum of these subarray lengths is equal to the length of A.
Measure: abs(sum(sa)-sum(A)/P) is minimal for each sa ∈ {sa | sa = (Ai, …, Ai+‍SAj) for i = (Σ SAj), j from 0 to P-1}.

The input and output define the set of valid solutions. The measure defines a measure to compare multiple valid solutions. And since we’re looking for a solution with the least difference to the perfect solution (minimization problem), measure should also be minimal.

With this information, it is quite easy to implement the measure function (here in Python):

def measure(a, sa):
    sigma = sum(a)/len(sa)
    diff = 0
    i = 0
    for j in xrange(0, len(sa)):
        diff += abs(sum(a[i:i+sa[j]])-sigma)
        i += sa[j]
    return diff

print measure([2,4,6,7,6,3,3,3,4,3,4,4,4,3,3,1], [3,4,4,5]) # prints 8

Now finding an optimal solution is a little harder.

We can use the Backtracking algorithm for finding valid solutions and use the measure function to rate them. We basically try all possible combinations of P non-negative integer numbers that sum up to length(A) to represent all possible valid solutions. Although this ensures not to miss a valid solution, it is basically a brute-force approach with the benefit that we can omit some branches that cannot be any better than our yet best solution. E.g. in the example above, we wouldn’t need to test solutions with [9,…] (measure > 38) if we already have a solution with measure ≤ 38.

Following the pseudocode pattern from Wikipedia, our bt function looks as follows:

def bt(c):
    global P, optimum, optimum_diff
    if reject(P,c):
        return
    if accept(P,c):
        print "%r with %d" % (c, measure(P,c))
        if measure(P,c) < optimum_diff:
            optimum = c
            optimum_diff = measure(P,c)
        return
    s = first(P,c)
    while s is not None:
        bt(list(s))
        s = next(P,s)

The global variables P, optimum, and optimum_diff represent the problem instance holding the values for A, P, and sigma, as well as the optimal solution and its measure:

class MinimalSumOfSubArraySumsProblem:
    def __init__(self, a, p):
        self.a = a
        self.p = p
        self.sigma = sum(a)/p

Next we specify the reject and accept functions that are quite straight forward:

def reject(P,c):
    return optimum_diff < measure(P,c)
def accept(P,c):
    return None not in c

This simply rejects any candidate whose measure is already more than our yet optimal solution. And we’re accepting any valid solution.

The measure function is also slightly changed due to the fact that c can now contain None values:

def measure(P, c):
    diff = 0
    i = 0
    for j in xrange(0, P.p):
        if c[j] is None:
            break;
        diff += abs(sum(P.a[i:i+c[j]])-P.sigma)
        i += c[j]
    return diff

The remaining two function first and next are a little more complicated:

def first(P,c):
    t = 0
    is_complete = True
    for i in xrange(0, len(c)):
        if c[i] is None:
            if i+1 < len(c):
                c[i] = 0
            else:
                c[i] = len(P.a) - t
            is_complete = False
            break;
        else:
            t += c[i]
    if is_complete:
        return None
    return c

def next(P,s):
    t = 0
    for i in xrange(0, len(s)):
        t += s[i]
        if i+1 >= len(s) or s[i+1] is None:
            if t+1 > len(P.a):
                return None
            else:
                s[i] += 1
            return s

Basically, first either replaces the next None value in the list with either 0 if it’s not the last value in the list or with the remainder to represent a valid solution (little optimization here) if it’s the last value in the list, or it return None if there is no None value in the list. next simply increments the rightmost integer by one or returns None if an increment would breach the total limit.

Now all you need is to create a problem instance, initialize the global variables and call bt with the root:

P = MinimalSumOfSubArraySumsProblem([2,4,6,7,6,3,3,3,4,3,4,4,4,3,3,1], 4)
optimum = None
optimum_diff = float("inf")
bt([None]*P.p)

这篇关于将数组拆分为 P 个平衡和的子数组的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆