最大限度地减少重新presentative整数错误的总和 [英] Minimize the sum of errors of representative integers

查看:187
本文介绍了最大限度地减少重新presentative整数错误的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定n之间的整数[0,10000]为D <子> 1 ,D 2 ...,D <子> N ,其中可能有重复,n可以是巨大的:

Given n integers between [0,10000] as D1,D2...,Dn, where there may be duplicates, and n can be huge:

我想找到k个不同的重presentative整数(例如K = 5)之间[0,10000]的R <子> 1 ,R 2 ,. ..,R <子> K ,因此,所有的再presentative整数误差之和最小化。

I want to find k distinct representative integers (for example k=5) between [0,10000] as R1,R2,...,Rk, so the sum of errors of all the representative integers is minimized.

再presentative整数的误差定义如下:

The error of a representative integer is defined below:

假设我们的K重presentative整数按升序排列为{红<子> 1 ,R 2 ...,R <子> K },R <子>的错误我 是:

Assuming we have k representative integers in ascending order as {R1,R2...,Rk}, the error of Ri is :

和我希望尽量减少k个重presentative整数误差之和:

and i want to minimize the sum of errors of the k representative integers:

这可怎么有效地完成?

EDIT1:第k重presentative整数的最小的必须是最小的数 {D <子> 1 ,D 2 ...,D <子> N }

The smallest of the k representative integers must be the smallest number in {D1,D2...,Dn}

EDIT2:第k重presentative整数的最大的一定是最大的数量{D <子> 1 ,D 2 。 ..,D <子> N }加1。例如,当数量最多{D <子> 1 ,D 2 ...,D <子> N }是9787那么R <子> K 是9788。

The largest of the k representative integers must be the largest number in {D1,D2...,Dn} plus 1. For example when the largest number in {D1,D2...,Dn} is 9787 then Rk is 9788.

EDIT3:的一个具体的例子是在这里:

A concrete example is here:

(D)= {} 1,3,3,7,8,14,14,14,30和如果k = 5和R被选为{1,6,10,17,31}然后误差的和是:

D={1,3,3,7,8,14,14,14,30} and if k=5 and R is picked as {1,6,10,17,31} then the sum of errors is :

总和=(1-1)+(3-1)* 2 +(7-6)+(8-6)+(14-10)* 3 +(30-17)= 32

sum of errors=(1-1)+(3-1)*2+(7-6)+(8-6)+(14-10)*3+(30-17)=32

这是因为1所述; = 1,3,3&10 6,第6版; = 7,8小于10,10其中; = 14,14,14&其中; 17,17其中; = 30℃,31

this is because 1<=1,3,3<6 , 6<=7,8<10, 10<=14,14,14<17, 17<=30<31

推荐答案

虽然来自社会的帮助下,你已经成功地说明你的 在一个形式问题,这是可以理解的数学,你还是做 不能提供足够的信息,使我(或其他人)给 你一个明确的答案(我会张贴这作为注释,但 由于某种原因,我没有看到添加注释选项可用于 我)。为了给一个很好的回答这个问题,我们需要知道 n和k与彼此和10000的相对尺寸(或 对D <子>我如果不是10000),并确定是否预期最大 你达到确切最低(即使这是关键 需要的时间进行计算的高昂量),或者如果一个 近似​​将是美好的,也(如果是的话,如何接近你 需要得到)。此外,为了知道在算法运行什么 最短的时间,我们需要有什么样的了解 硬件将运行该算法(即我们是否有M处理器 内核并行运行上和什么是相对米的大小为k)

Although with help from the community you have managed to state your problem in a form which is understandable mathematically, you still do not supply enough information to enable me (or anyone else) to give you a definitive answer (I would have posted this as a comment, but for some reason I don't see the "add comment" option is available to me). In order to give a good answer to this problem, we need to know the relative sizes of n and k versus each other and 10000 (or the expected maximum of the Di if it isn't 10000), and whether it is critical that you attain the exact minimum (even if this requires an exorbitant amount of time for the calculation) or if a close approximation would be OK, also (and if so, how close do you need to get). In addition, in order to know what algorithm runs in minimum time, we need to have an understanding about what kind of hardware is going to run the algorithm (i.e., do we have m CPU cores to run on in parallel and what is the size of m relative to k).

的另一个重要信息是这个问题是否会 解决只有一次,或者它必须解决很多次,但存在着 整数Ð<子>的分布之间有着某种联系我 从一个问题到下一个(例如,整数 ð<子>我都是随机样本从一个特定的,不变 概率分布,或者每个连续问题有作为 其输入不断增加的集合是集合中从previous 问题加上一个额外的取值的整数)。

Another important piece of information is whether this problem will be solved only once, or it must be solved many times but there exists some connection between the distributions of the integers Di from one problem to the next (e.g., the integers Di are all random samples from a particular, unchanging probability distribution, or perhaps each successive problem has as its input an ever-increasing set which is the set from the previous problem plus an extra s integers).

没有合理的算法您的问题应该及时运行哪个 取决于在某种程度上大于n的线性比,由于建立的直方图 n个整数的D- <子>我需要O(n)的时间,并回答 优化问题本身只依赖于的直方图 的整数,而不是他们的排序。没有算法可以在时间运行 小于为O(n)时,由于这是问题的输入的大小。

No reasonable algorithm for your problem should run in time which depends in a way greater than linear in n, since building a histogram of the n integers Di requires O(n) time, and the answer to the optimization problem itself is only dependent on the histogram of the integers and not on their ordering. No algorithm can run in time less than O(n), since that is the size of the input of the problem.

一个强力搜索所有的可能性需要,(假设 对D <子>的至少一个我是0和另一个是10000),用于 小K,说K&LT; 10,大约O(10000 K-2 )的时间,所以,如果 登录 10 (N)>> 4(K-2),这是优化算法(因为在 这种情况下,时间为蛮力搜索微不足道相比 时间读取输入)。这也是有趣地注意到,如果k是 非常接近10000,那么蛮力搜索只需要 O(10000 10002-K )(因为我们可以搜索,而不是在 整数这是的没有的作为再presentative整数)。

A brute force search over all of the possibilities requires, (assuming that at least one of the Di is 0 and another is 10000), for small k, say k < 10, approximately O(10000k-2) time, so if log10(n) >> 4(k-2), this is the optimal algorithm (since in this case the time for the brute force search is insignificant compared to the time to read the input). It is also interesting to note that if k is very close to 10000, then a brute force search only requires O(1000010002-k) (because we can search instead over the integers which are not used as representative integers).

如果你更多的信息更新该问题的定义,我 将试图修改我又回答。

If you update the definition of the problem with more information, I will attempt to edit my answer in turn.

这篇关于最大限度地减少重新presentative整数错误的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆