Randomly sample an array with a fixed number of zeros

Question

I have optimized code to randomly sample an array containing -1s, 0s and 1s with probabilities 1/4, 1/2 and 1/4. It looks like this:

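#include <stdint.h>
#include <time.h>

#define n (12)
uint32_t x, y = 34353, z = 57768, w = 1564; /* PRNG state */

/* xorshift PRNG
 * Taken from https://en.wikipedia.org/wiki/Xorshift#Example_implementation
 * Used under CC-By-SA */
uint32_t myRand(void) {
    uint32_t t;
    t = x ^ (x << 11);
    x = y; y = z; z = w;
    return w = w ^ (w >> 19) ^ t ^ (t >> 8);
}

/* the following runs inside the sampling function */
x = (uint32_t)time(NULL); /* seed PRNG */
unsigned int k;
int F[n];
for (k = 0; k < n; k++) {
    /* cast to int so that 1 - 2 gives -1 instead of wrapping around;
     * the four values 0..3 map to 1, 0, -1, 0 */
    F[k] = (1 - (int)(myRand() & 3)) % 2;
}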

How can I modify this so that it only returns arrays that have exactly n/3 zeros in them, while keeping it fast?

Recommended answer

Do it in two steps:


  1. Distribute exactly n/3 zeros randomly over your array and set the rest to 1.
  2. Assign a random sign to the remaining ones to get the desired -1/+1.

Example code:

int F[n];
// fill with 1
for(k=0; k<n; k++) {
    F[k] = 1;
}
// distribute n/3 zeros
for(k=0; k<n/3; k++) {
    // find a location which does not have a 0 yet
    int i;
    do {
        i = myRand() % n;
    } while(F[i] == 0);
    F[i] = 0;
}
// change remaining (non zero) to -1 with 50% probability
for(k=0; k<n; k++) {
    if(F[k] && myRand()%2) F[k] = -1;
}

This takes around 2.4*n loop iterations on average (n to fill the array, about 0.4*n to place the zeros, and n to assign the signs), and I do not think you can get much faster than that.

For the case of n/3 zeros, the while loop inside the second for loop is executed about 1.2 times per zero on average.
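The figure of about 1.2 draws per zero can be checked empirically. The following is a minimal sketch, assuming a fixed-seed copy of the xorshift generator so that results are repeatable; the names `xorshift`, `place_zeros` and `avg_draws_per_zero`, the seed values, and the 1200-element scratch buffer are illustrative choices, not part of the original post.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift128 state; the seed values are arbitrary placeholders */
static uint32_t sx = 123456789u, sy = 34353u, sz = 57768u, sw = 1564u;

static uint32_t xorshift(void) {
    uint32_t t = sx ^ (sx << 11);
    sx = sy; sy = sz; sz = sw;
    return sw = sw ^ (sw >> 19) ^ t ^ (t >> 8);
}

/* Place `zeros` zeros into F[0..len-1] by trial and error and return
 * the number of random draws that were needed. */
static unsigned long place_zeros(int *F, unsigned len, unsigned zeros) {
    unsigned long draws = 0;
    unsigned k, i;
    assert(zeros <= len); /* otherwise the retry loop would never end */
    for (k = 0; k < len; k++) F[k] = 1;
    for (k = 0; k < zeros; k++) {
        do {
            i = xorshift() % len;
            draws++;
        } while (F[i] == 0);
        F[i] = 0;
    }
    return draws;
}

/* Average number of draws per zero over `runs` repetitions.
 * len is limited to 1200 by the fixed-size scratch array. */
static double avg_draws_per_zero(unsigned len, unsigned zeros, unsigned runs) {
    int F[1200];
    unsigned long total = 0;
    unsigned r;
    assert(len <= 1200 && zeros > 0 && runs > 0);
    for (r = 0; r < runs; r++) total += place_zeros(F, len, zeros);
    return (double)total / ((double)zeros * runs);
}
```

Calling `avg_draws_per_zero(1200, 400, 2000)` should return a value close to 1.2, and raising the share of zeros makes the average climb.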

Remarks:

The trial and error used in the second for loop works quite well if the success probability is high enough. With a success probability of p, you need 1/p trials on average.

In our case (n/3 zeros) the lowest probability of finding a free location (i.e. for the last zero) is 2/3, so that zero needs 1.5 iterations on average. Placing all n/3 zeros takes around 0.4*n iterations on average.

The average number of iterations, as a multiple of n, can be computed as -log(1-a) (natural logarithm), where a is the fraction of zeros you want to distribute (in your case a = 1/3).
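A short derivation shows where the logarithm comes from. It assumes that each draw is uniform over the n positions and that a*n is an integer; T denotes the total number of draws and H_m the m-th harmonic number, both notation introduced here for illustration.

```latex
\begin{align*}
p_j &= \frac{n-j}{n}, \qquad
  \mathbb{E}[\text{draws for zero } j+1] = \frac{1}{p_j} = \frac{n}{n-j},\\
\mathbb{E}[T] &= \sum_{j=0}^{an-1} \frac{n}{n-j}
  = n\,\bigl(H_n - H_{n-an}\bigr)
  \approx n \ln\frac{n}{n-an} = -n\ln(1-a),\\
a = \tfrac{1}{3}:\quad \mathbb{E}[T] &\approx n\ln\tfrac{3}{2} \approx 0.405\,n .
\end{align*}
```

Dividing 0.405*n by the n/3 zeros gives roughly 1.2 draws per zero.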

Some more examples: distributing 2/3*n zeros would take about 1.1*n iterations. For 0.99*n zeros it is already about 4.6*n iterations.

All of these figures are averages. In the worst case the loop never terminates.

If you need a runtime guarantee, you are perhaps better off implementing real sampling without replacement, i.e. filling a container with all possible indices, sampling a random element as the index, and removing it from the container. But this probably has a runtime of around O(n*log(n)). Thus it works well for small n or a large percentage of zeros.
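One possible way to sketch sampling without replacement is a partial Fisher-Yates shuffle over an array of indices. This is a substituted technique rather than the container approach described above: "removing" an index is done by swapping it to the front of the array, so each zero costs exactly one draw and one swap. The function name `sample_fixed_zeros`, the fixed seeds and the use of `malloc` for the scratch array are assumptions made for illustration, and `% (len - k)` carries a slight modulo bias that is ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* xorshift128 state; the seed values are arbitrary placeholders */
static uint32_t sx = 123456789u, sy = 34353u, sz = 57768u, sw = 1564u;

static uint32_t xorshift(void) {
    uint32_t t = sx ^ (sx << 11);
    sx = sy; sy = sz; sz = sw;
    return sw = sw ^ (sw >> 19) ^ t ^ (t >> 8);
}

/* Fill F[0..len-1] with exactly `zeros` zeros and random -1/+1 elsewhere.
 * Returns 0 on success, -1 if the scratch index array cannot be allocated. */
static int sample_fixed_zeros(int *F, unsigned len, unsigned zeros) {
    unsigned k, j, tmp;
    unsigned *idx;
    assert(zeros <= len);
    if (len == 0) return 0;
    idx = malloc(len * sizeof *idx);
    if (idx == NULL) return -1;
    /* start with the identity permutation and a random sign everywhere */
    for (k = 0; k < len; k++) {
        idx[k] = k;
        F[k] = (xorshift() & 1) ? 1 : -1;
    }
    /* partial Fisher-Yates: idx[0..k] always holds distinct chosen indices */
    for (k = 0; k < zeros; k++) {
        j = k + xorshift() % (len - k); /* pick among the unused indices */
        tmp = idx[k]; idx[k] = idx[j]; idx[j] = tmp;
        F[idx[k]] = 0;
    }
    free(idx);
    return 0;
}
```

Under these assumptions the number of random draws is always len + zeros, regardless of how large the share of zeros is, at the cost of one extra index array.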
