Generate N random numbers within a range with a constant sum

Problem description

I want to generate N random numbers drawn from a specific distribution (e.g. uniform) in [a,b] that sum to a constant C. I have tried a couple of solutions I could think of myself, and some proposed on similar threads, but most of them either work only for a limited form of the problem or I can't prove that the outcome still follows the desired distribution.

What I have tried: generate N random numbers, divide each of them by their sum, and multiply by the desired constant. This seems to work, but the result does not follow the rule that the numbers should be within [a,b].

Generate N-1 random numbers, add 0 and the desired constant C, and sort them. Then calculate the difference between each two consecutive numbers; these differences are the result. This again sums to C but has the same problem as the previous method: the values can fall outside [a,b].
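The sorted-differences attempt could be sketched as follows. The helper `gaps_from_cuts` is a hypothetical name, and it takes the N-1 random cut points as an argument (assumed to lie in [0, C]) so that the behavior is easy to inspect.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: `cuts` holds the N-1 interior cut points, each assumed
// to lie in [0, C]. Returns the N gaps between consecutive sorted points.
std::vector<double> gaps_from_cuts(std::vector<double> cuts, double C) {
    cuts.push_back(0.0);
    cuts.push_back(C);
    std::sort(cuts.begin(), cuts.end());
    std::vector<double> gaps;
    for (std::size_t i = 1; i < cuts.size(); ++i) {
        gaps.push_back(cuts[i] - cuts[i - 1]);  // difference of neighbours
    }
    return gaps;
}
```

With C = 50 and cut points {30, 5}, the gaps are {5, 25, 20}: they sum to 50, but 25 and 20 would violate a range such as [1,10].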

I also tried generating the numbers one at a time while keeping track of the minimum and maximum that still allow the desired sum and range, and came up with this code:

#include <functional>
#include <vector>

// Appends len integers in [min, max] that add up to sum to output.
// Returns false when no such sequence exists.
bool generate(std::function<int(int,int)> randomGenerator, int min, int max, int len, int sum, std::vector<int> &output) {
    // Not possible to produce such a sequence
    if (min * len > sum)
        return false;
    if (max * len < sum)
        return false;

    int curSum = 0;
    int left = sum - curSum;
    int leftIndexes = len - 1;
    // Bounds for the next number that keep the remaining sum reachable
    int curMax = left - leftIndexes * min;
    int curMin = left - leftIndexes * max;

    for (int i = 0; i < len; i++) {
        int num = randomGenerator((curMin < min) ? min : curMin, (curMax > max) ? max : curMax);
        output.push_back(num);
        curSum += num;
        left = sum - curSum;
        leftIndexes--;
        curMax = left - leftIndexes * min;
        curMin = left - leftIndexes * max;
    }

    return true;
}

This seems to work, but the results are sometimes very skewed and I don't think they follow the original distribution (e.g. uniform). For example:

//10 numbers within [1:10] which sum to 50:
generate(uniform,1,10,10,50,output);
//result:
2,7,2,5,2,10,5,8,4,5 => sum=50
//This looks reasonable for uniform, but let's change to 
//10 numbers within [1:25] which sum to 50:
generate(uniform,1,25,10,50,output);
//result:
24,12,6,2,1,1,1,1,1,1 => sum= 50

Notice how many ones there are in the output. This might sound reasonable because the range is larger, but the numbers really don't look uniformly distributed. I am not even sure whether it is possible to achieve what I want; maybe the constraints make the problem unsolvable.

Recommended answer

If you want the sample to follow a uniform distribution, the problem reduces to generating N random numbers with sum = 1. This, in turn, is a special case of the Dirichlet distribution, but it can also be computed more easily using the exponential distribution. Here is how:

  1. Take a uniform sample v_1, ..., v_N with all v_i between 0 and 1.
  2. For all i, 1 <= i <= N, define u_i := -ln v_i (notice that u_i > 0).
  3. Normalize the u_i as p_i := u_i/s, where s is the sum u_1 + ... + u_N.

The p_1, ..., p_N are uniformly distributed (on the simplex of dimension N-1) and their sum is 1.
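The three steps can be sketched in C++ as below. The function name `sample_simplex` and the choice of `std::mt19937` are assumptions made for illustration, as is the shift `1.0 - unif(rng)`, which moves the nominal [0, 1) output of the generator to (0, 1] so that the logarithm stays finite.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Sketch of steps 1-3. Returns n non-negative numbers that sum to 1.
// The name sample_simplex is hypothetical, not part of any library.
std::vector<double> sample_simplex(int n, std::mt19937 &rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);  // nominally [0, 1)
    std::vector<double> p(n);
    double s = 0.0;
    for (double &u : p) {
        const double v = 1.0 - unif(rng);  // assumption: shift to (0, 1]
        u = -std::log(v);                  // step 2: u_i = -ln v_i
        s += u;
    }
    for (double &u : p) {
        u /= s;                            // step 3: p_i = u_i / s
    }
    return p;
}
```

Calling `sample_simplex(5, rng)` with a seeded `std::mt19937 rng(42)` should give five values in [0, 1] whose sum is 1 up to rounding error.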

You can now multiply these p_i by the constant C you want and translate them by adding some other constant A, like this:

q_i := A + p_i*C.

Edit 3

In order to address some issues raised in the comments, let me add the following:

  • To ensure that the final random sequence falls in the interval [a,b], choose the constants A and C above as A := a and C := b-a, i.e., take q_i = a + p_i*(b-a). Since p_i is in the range (0,1), all q_i will be in the range [a,b]. Note that with this choice the q_i sum to N*a + (b-a).
  • One cannot take the (negative) logarithm -ln(v_i) if v_i happens to be 0, because ln() is not defined at 0. The probability of such an event is extremely low. However, to ensure that no error is signaled, the generation of v_1 ... v_N in item 1 above must treat any occurrence of 0 in a special way: consider -ln(0) as +infinity (remember: ln(x) -> -infinity when x -> 0). Thus the sum s = +infinity, which means that p_i = 1 and all other p_j = 0. Without this convention the sequence (0...1...0) would never be generated (many thanks to @Severin Pappadeux for this interesting remark).
  • As explained by @Neil Slater in the 4th comment attached to the question, it is logically impossible to fulfill all the requirements of the original framing. Therefore any solution must relax the constraints to a proper subset of the original ones. Other comments by @Behrooz seem to confirm that this would suffice in this case.
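The first two points above can be sketched together. The helper `to_range` is a hypothetical name; it takes the uniform draws v_i as input so that the zero convention can be exercised directly, and it assumes that at most one v_i equals 0 and that not every v_i equals 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps uniform draws v_i in [0, 1] to q_i in [a, b]
// via q_i = a + p_i * (b - a). Assumes at most one v_i is exactly 0 and
// that the v_i are not all equal to 1.
std::vector<double> to_range(const std::vector<double> &v, double a, double b) {
    bool has_zero = false;
    double s = 0.0;
    for (double x : v) {
        if (x == 0.0) {
            has_zero = true;       // -ln(0) is treated as +infinity
        } else {
            s += -std::log(x);
        }
    }
    std::vector<double> q(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        double p;
        if (has_zero) {
            p = (v[i] == 0.0) ? 1.0 : 0.0;  // convention: p_i = 1, other p_j = 0
        } else {
            p = -std::log(v[i]) / s;
        }
        q[i] = a + p * (b - a);
    }
    return q;
}
```

Under these assumptions, `to_range({0.5, 0.0, 0.25}, 1.0, 10.0)` yields {1, 10, 1}, and `to_range({0.5, 0.5}, 1.0, 10.0)` yields two values of 5.5, which sum to N*a + (b-a) = 11.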

Edit 2

One more issue has been raised in the comments:

Why is rescaling a uniform sample not enough?

In other words, why should I bother taking negative logarithms?

The reason is that if we just rescale, the resulting sample won't be distributed uniformly across the segment (0,1) (or [a,b] for the final sample).

To visualize this, let's think in 2D, i.e., let's consider the case N=2. A uniform sample (v_1, v_2) corresponds to a random point in the square with origin (0,0) and corner (1,1). Now, when we normalize such a point by dividing it by the sum s = v_1 + v_2, what we are doing is projecting the point onto the diagonal as shown in the picture (keep in mind that the diagonal is the line x + y = 1):

But given that the green lines, which are closer to the principal diagonal from (0,0) to (1,1), are longer than the orange ones, which are closer to the x and y axes, the projections tend to accumulate more around the center of the projection line (in blue), where the scaled sample lives. This shows that a simple scaling won't produce a uniform sample on the depicted diagonal. On the other hand, it can be proven mathematically that the negative logarithms do produce the desired uniformity. So, instead of copy-pasting a mathematical proof, I would invite everyone to implement both algorithms and check that the resulting plots behave as this answer describes.
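One possible way to run that check without plotting, for N=2, is to count how often the first normalized coordinate lands in the central half [0.25, 0.75] of the segment. The names `CentralFractions` and `compare_methods` are hypothetical, and the shift `1.0 - unif(rng)` is an assumption used to keep the logarithm finite. A uniform result should put about 1/2 of the samples there, while plain rescaling should put about 2/3 there.

```cpp
#include <cmath>
#include <random>

// Fraction of samples whose first coordinate falls in [0.25, 0.75].
struct CentralFractions {
    double rescaled;  // p_1 = v_1 / (v_1 + v_2)
    double neg_log;   // p_1 = u_1 / (u_1 + u_2) with u_i = -ln v_i
};

// Hypothetical experiment for N = 2; `trials` is assumed to be positive.
CentralFractions compare_methods(int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    int rescaled_hits = 0;
    int neg_log_hits = 0;
    for (int t = 0; t < trials; ++t) {
        const double v1 = 1.0 - unif(rng);  // assumption: shift to (0, 1]
        const double v2 = 1.0 - unif(rng);
        const double p_rescaled = v1 / (v1 + v2);
        const double u1 = -std::log(v1);
        const double u2 = -std::log(v2);
        const double p_neg_log = u1 / (u1 + u2);
        if (p_rescaled >= 0.25 && p_rescaled <= 0.75) ++rescaled_hits;
        if (p_neg_log >= 0.25 && p_neg_log <= 0.75) ++neg_log_hits;
    }
    return {static_cast<double>(rescaled_hits) / trials,
            static_cast<double>(neg_log_hits) / trials};
}
```

With a few hundred thousand trials, the `rescaled` fraction should sit visibly above the `neg_log` fraction, which matches the accumulation around the center described above.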

(Note: here is a blog post on this interesting subject with an application to the Oil & Gas industry)
