集合S的公平分割成k分区 [英] fair partitioning of set S into k partitions

查看:324
本文介绍了集合S的公平分割成k分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有包含各N个整数值为1℃的集合S = X&其中; = 10 ^ 6。问题是要集合S划分成k分区。一个分区的值是的元件present在它的总和。分区被以这样的方式组S的合计值相当分布在第k分区完成的。的数学含义的公平的也需要被确定(如目标可以是从集合S的平均值(即,总和最小化的分区的值的标准偏差(S) / K))

There is a set S containing N integers each with value 1<=X<=10^6. The problem is to partition the set S into k partitions. The value of a partition is the sum of the elements present in it. Partition is to be done in such a way the total value of the set S is fairly distributed amongst the k partitions. The mathematical meaning of fair also needs to be defined (e.g. the objective could be to minimize the standard deviation of the values of the partitions from the average value of the set S (which is, sum(S)/k))

例如。 S = {10,15,12,13,30,5},k = 3的

e.g. S = {10, 15, 12, 13, 30, 5}, k=3

一个很好的分区将是{30},{10,15},{12,13,5}

A good partitioning would be {30}, {10, 15}, {12, 13, 5}

一个坏的分区将是{30,5},{10,15},{12,13}

A bad partitioning would be {30, 5}, {10, 15}, {12, 13}

第一个问题是在数学上的前preSS条件对一个分区为比其它更好。 第二个问题是如何解决这个问题。这个问题是NP难。有什么启发?

First question is to mathematically express condition for one partition to be better than the other. Second question is to how to solve the problem. The problem is NP-Hard. Are there any heuristics?

在我试图解决N'其中的问题。=(K * LOGX)^ 2和K 2变化到7

In the problem that I am trying to solve N <= (k*logX)^2 and K varies from 2 to 7.

=============================================== ===================================

==================================================================================

,有两个合理的函数来评价一个分布:

Based on other related SO questions, there are two reasonable functions to evaluate a distribution:

一)最小化的最大值的分区的值

a) Minimize the value of the partition with the maximum value.

在第二个想法,这不是一个很好的指标。考虑,一组{100,40,40}以被划分成三个子集。这个指标并不以下两个分布区分,即使一个是显然优于其他。

On a second thought, this is not a good metric. Consider, a set {100, 40, 40} to be partitioned into three subsets. This metric does not distinguish between the following two distributions even though one is clearly better than the other.

分布1:{100},{40},{40}和分布2:{100},{40,40},{}

Distribution 1: {100}, {40}, {40} and Distribution 2: {100}, {40, 40}, {}

b)使任何两个值的给定分区中的差的最大值,即最小化最大| AB |任何A,B

b) Minimize the maximum of the difference of any two values in a given partition i.e minimize max|A-B| for any A, B

推荐答案

我觉得一个好的衡量标准将是:

I think a good metric will be:

let the result set be s1,s2,...,sk
let MAX be max{sum(si) for each i}
f({s1,...,sk}) = Sigma(MAX-sum(si)) for each i)

上攻:一个完善的销售将产生始终为0! 维基,缺点:如果没有可圈可点的解决方案,最好的结果也不会产生出0

the upside: a perfect distribution will yield always 0!
the downside: if there is none perferct solution, the best result will not yield 0.

一个贪婪的启发式对于这个问题将是:

a greedy heuristic for this problem will be:

sorted<-sort(S) (let's say sorted[0] is the highest)
s1=s2=...=sk= {}
for each x in sorted:
   s <- find_min() (*)
   s.add(x)

在这里find_min()产量S,从而使之和(S)LT; = SUM(SI)的每个SI

where find_min() yields s such that sum(s) <= sum(si) for each si.

这个解决方案的将产生F(衡量标准所定义的),使得 F(SOL)LT =(K-1)*最大{S} (从这里它是证明这必然):

this solution's will yield f (metrics defined above) such that f(sol) <= (k-1)*max{S} (from here it is a proof for this bound):


要求:对于每个子集S, MAX-SUM(S)&LT; =最大值{S}
证明 - 感应:在每一步的说法是真实的临时解决方案。 酒店在每一步,让MAX是最大{总和(SI)}在迭代开始(前加)!


claim: for each subset s, MAX- sum(s) <= max{S}
proof - by induction: at each step, the claim is true for the temporary solution.
in each step, let MAX be max{sum(si)} at the start of the iteration (before the add)!

base: the set of subsets at start is {},{},.. MAX=sum(si)=0 for each si. 
step: assume the assumption is true for iteration i, we'll show it is also true for iteration i+1:
let s be the set that x was added to, then MAX-sum(s) <= max{S} (induction assumption).
if sum(s) + x <= MAX: we are done, MAX was not changed.
else: we sorted the elements at start, so x <= max{S}, and thus if s was chosen
   (sum(si) >= sum(s) for each si) and sum(s) + x > MAX then: for each si, sum(si) + x >=
   sum(s) + x, so sum(s)+x - sum(si) <= x <= max{S}. since sum(s)+x will be the MAX next 
   iteration, we are done.

由于每一组 MAX-SUM(SI)&LT; =最大值{S} (显然,对于最大集, MAX-和(SI)= 0 ),在整体西格玛(MAX-SUM(SI))&LT; =(K-1)*最大{S} ,按照承诺。

because for each set MAX-sum(si) <= max{S} (and obviously, for the max set, MAX-sum(si)=0), at overall Sigma(MAX-sum(si)) <= (k-1)*max{S}, as promised.

编辑:
我有一些空闲时间,所以我编程由我和@Akhil提出两种启发式,和这两个指标,首先,两者的结果是决定性的(根据的魏氏的成对t检验),但是这是更好的是,你选择哪一个指标确定,奇怪的是,它试图最小化f显示算法() (@ Akhil`s)得分较低此同一F,但对于第二指标高。

EDIT :
I had some spare time, so I programmed both heuristics suggested by me and by @Akhil, and both metrics, first of all, both results are conclusive (according to Wilcoxon's pair-t test), but which is better is defined by which metric you choose, surprisingly, the algorithm which tried to minimize f() (@Akhil`s) scored lower for this same f, but higher for the second metric.

这篇关于集合S的公平分割成k分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆