将集合 S 公平划分为 k 个分区 [英] fair partitioning of set S into k partitions

查看:36
本文介绍了将集合 S 公平划分为 k 个分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有一个包含 N 个整数的集合 S,每个整数的值 1<=X<=10^6.问题是将集合 S 划分为 k 个分区.分区的值是其中存在的元素的总和.分区是以这样的方式完成的,集合 S 的总值在 k 个分区中公平分布.fair 的数学含义也需要定义(例如,目标可以是最小化分区值与集合 S 的平均值的标准偏差(即 sum(S)/k))

There is a set S containing N integers each with value 1<=X<=10^6. The problem is to partition the set S into k partitions. The value of a partition is the sum of the elements present in it. Partition is to be done in such a way the total value of the set S is fairly distributed amongst the k partitions. The mathematical meaning of fair also needs to be defined (e.g. the objective could be to minimize the standard deviation of the values of the partitions from the average value of the set S (which is, sum(S)/k))

例如S = {10, 15, 12, 13, 30, 5}, k=3

e.g. S = {10, 15, 12, 13, 30, 5}, k=3

一个好的分区应该是 {30}, {10, 15}, {12, 13, 5}

A good partitioning would be {30}, {10, 15}, {12, 13, 5}

一个坏的分区是 {30, 5}, {10, 15}, {12, 13}

A bad partitioning would be {30, 5}, {10, 15}, {12, 13}

第一个问题是用数学方法表达一个分区优于另一个分区的条件.第二个问题是如何解决问题.问题是NP-Hard.有什么启发式方法吗?

First question is to mathematically express condition for one partition to be better than the other. Second question is to how to solve the problem. The problem is NP-Hard. Are there any heuristics?

在我试图解决的问题中,N <= (k*logX)^2 和 K 在 2 到 7 之间变化.

In the problem that I am trying to solve N <= (k*logX)^2 and K varies from 2 to 7.

====================================================================================

==================================================================================

基于其他相关的 SO 问题,评估分布有两个合理的函数:

Based on other related SO questions, there are two reasonable functions to evaluate a distribution:

a) 用最大值最小化分区的值.

a) Minimize the value of the partition with the maximum value.

再想一想,这不是一个好的指标.考虑,一组 {100, 40, 40} 被划分为三个子集.该指标不区分以下两种分布,即使其中一种明显优于另一种.

On a second thought, this is not a good metric. Consider, a set {100, 40, 40} to be partitioned into three subsets. This metric does not distinguish between the following two distributions even though one is clearly better than the other.

分布 1:{100}、{40}、{40} 和分布 2:{100}、{40、40}、{}

Distribution 1: {100}, {40}, {40} and Distribution 2: {100}, {40, 40}, {}

b) 最小化给定分区中任意两个值之差的最大值,即最小化 max|A-B|对于任何 A,B

b) Minimize the maximum of the difference of any two values in a given partition i.e minimize max|A-B| for any A, B

推荐答案

我认为一个好的指标是:

I think a good metric will be:

let the result set be s1,s2,...,sk
let MAX be max{sum(si) for each i}
f({s1,...,sk}) = Sigma(MAX-sum(si)) for each i)

好处:完美的分布将始终产生 0!
缺点:如果没有完美的解决方案,最好的结果不会产生 0.

the upside: a perfect distribution will yield always 0!
the downside: if there is none perferct solution, the best result will not yield 0.

这个问题的贪婪启发式将是:

a greedy heuristic for this problem will be:

sorted<-sort(S) (let's say sorted[0] is the highest)
s1=s2=...=sk= {}
for each x in sorted:
   s <- find_min() (*)
   s.add(x)

其中 find_min() 产生 s,使得每个 si 的 sum(s) <= sum(si).

where find_min() yields s such that sum(s) <= sum(si) for each si.

这个解决方案将产生 f(上面定义的度量)使得 f(sol) <= (k-1)*max{S} (从这里它是这个界限的证明):

this solution's will yield f (metrics defined above) such that f(sol) <= (k-1)*max{S} (from here it is a proof for this bound):


声明:对于每个子集 s,MAX- sum(s) <= max{S}
证明 - 通过归纳:在每个步骤中,临时解决方案的声明都是正确的.
在每一步中,让 MAX 在迭代开始时(加法之前)为 max{sum(si)}!


claim: for each subset s, MAX- sum(s) <= max{S}
proof - by induction: at each step, the claim is true for the temporary solution.
in each step, let MAX be max{sum(si)} at the start of the iteration (before the add)!

base: the set of subsets at start is {},{},.. MAX=sum(si)=0 for each si. 
step: assume the assumption is true for iteration i, we'll show it is also true for iteration i+1:
let s be the set that x was added to, then MAX-sum(s) <= max{S} (induction assumption).
if sum(s) + x <= MAX: we are done, MAX was not changed.
else: we sorted the elements at start, so x <= max{S}, and thus if s was chosen
   (sum(si) >= sum(s) for each si) and sum(s) + x > MAX then: for each si, sum(si) + x >=
   sum(s) + x, so sum(s)+x - sum(si) <= x <= max{S}. since sum(s)+x will be the MAX next 
   iteration, we are done.

因为对于每个集合 MAX-sum(si) <= max{S} (显然,对于最大集合,MAX-sum(si)=0),总体 Sigma(MAX-sum(si)) <= (k-1)*max{S},如承诺的那样.

because for each set MAX-sum(si) <= max{S} (and obviously, for the max set, MAX-sum(si)=0), at overall Sigma(MAX-sum(si)) <= (k-1)*max{S}, as promised.


我有一些空闲时间,所以我编写了我和@Akhil 建议的启发式算法,这两个指标首先,两个结果都是决定性的(根据 Wilcoxon 的 pair-t 测试),但更好的是由你选择的度量来定义,令人惊讶的是,试图最小化 f() (@Akhil`s) 在相同的 f 中得分较低,但在第二个指标中得分较高.

EDIT :
I had some spare time, so I programmed both heuristics suggested by me and by @Akhil, and both metrics, first of all, both results are conclusive (according to Wilcoxon's pair-t test), but which is better is defined by which metric you choose, surprisingly, the algorithm which tried to minimize f() (@Akhil`s) scored lower for this same f, but higher for the second metric.

这篇关于将集合 S 公平划分为 k 个分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆