Randomly sample an array with a fixed number of zeros
Question
I have optimized code to randomly sample an array containing -1s, 0s and 1s with probabilities 1/4, 1/2 and 1/4. It looks like this:
#define n (12)

unsigned int x, y = 34353, z = 57768, w = 1564; // PRNG seeds

/* xorshift PRNG
 * Taken from https://en.wikipedia.org/wiki/Xorshift#Example_implementation
 * Used under CC-By-SA */
u_int32_t myRand() {
    unsigned int t;
    t = x ^ (x << 11);
    x = y; y = z; z = w;
    return w = w ^ (w >> 19) ^ t ^ (t >> 8);
}

// the following runs inside a function, e.g. main()
x = (int)time(NULL); // seed PRNG
unsigned int k;
int F[n];
for (k = 0; k < n; k++) {
    // cast to int first: in unsigned arithmetic 1 - 2 wraps around and % 2 never yields -1
    F[k] = (1 - (int)(myRand() & 3)) % 2;
}
How can I modify this so that it returns only arrays with exactly n/3 zeros in them, and is still fast?
Recommended answer
Do it in two steps:
- Distribute exactly n/3 zeros randomly over your array and set the rest to 1.
- Assign a random sign to the remaining ones to get the desired -1/+1.
Example code:
int F[n];
// fill with 1
for (k = 0; k < n; k++) {
    F[k] = 1;
}
// distribute n/3 zeros
for (k = 0; k < n/3; k++) {
    // find a location which does not have a 0 yet
    int i;
    do {
        i = myRand() % n;
    } while (F[i] == 0);
    F[i] = 0;
}
// change remaining (non-zero) entries to -1 with 50% probability
for (k = 0; k < n; k++) {
    if (F[k] && myRand() % 2) F[k] = -1;
}
This takes around 2.4 n loop iterations on average (n to fill the array, n to assign the signs, and the rest to place the zeros), and I do not think you can get much faster than that.
For the case of n/3 zeros, the do-while loop inside the second for loop executes around 1.2 times per zero on average.
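For reference, the two steps can be packaged as a self-contained function. This is only a sketch under a few assumptions: the names `my_rand` and `sample_fixed_zeros` are made up for illustration, the seed constants are arbitrary, and the zero count is passed as a parameter instead of being fixed at n/3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift128 state; seed values are arbitrary (they must not all be zero). */
static uint32_t sx = 123456789u, sy = 34353u, sz = 57768u, sw = 1564u;

/* Same xorshift step as myRand() in the question (hypothetical rename). */
static uint32_t my_rand(void) {
    uint32_t t = sx ^ (sx << 11);
    sx = sy; sy = sz; sz = sw;
    return sw = sw ^ (sw >> 19) ^ t ^ (t >> 8);
}

/* Fill F[0..n-1] with exactly `zeros` zeros; every other entry is +1 or -1. */
static void sample_fixed_zeros(int *F, size_t n, size_t zeros) {
    assert(zeros <= n);

    /* Step 0: start with all ones. */
    for (size_t k = 0; k < n; k++) {
        F[k] = 1;
    }

    /* Step 1: place the zeros by trial and error (retry on an occupied slot). */
    for (size_t k = 0; k < zeros; k++) {
        size_t i;
        do {
            i = my_rand() % n;
        } while (F[i] == 0);
        F[i] = 0;
    }

    /* Step 2: flip each remaining 1 to -1 with probability 1/2. */
    for (size_t k = 0; k < n; k++) {
        if (F[k] != 0 && (my_rand() & 1u)) {
            F[k] = -1;
        }
    }
}
```

Calling `sample_fixed_zeros(F, 12, 4)` on an `int F[12]` corresponds to the n = 12, n/3 zeros case from the question.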
Remarks:
The trial and error used in the second for loop works quite well if the success probability is high enough. The average number of trials needed for a success probability of p is 1/p.
In our case (n/3 zeros), the lowest probability of finding a free location (i.e. for the last zero) is about 2/3, which means about 1.5 iterations on average for that zero. Finding places for all n/3 zeros takes around 0.4*n iterations on average, which matches the roughly 1.2 iterations per zero.
The average number of iterations, as a multiple of n, can be computed as -ln(1-a) (natural logarithm), where a is the fraction of zeros you want to distribute (in your case a = 1/3).
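A short derivation of where this formula comes from, assuming each draw is uniform over the n positions and the logarithm is the natural one: when j zeros are already placed, a draw succeeds with probability (n-j)/n, so that zero needs n/(n-j) draws on average. Summing over all a·n zeros and approximating the sum by an integral gives

```latex
\[
E[\text{draws}]
  = \sum_{j=0}^{an-1} \frac{n}{n-j}
  \approx n \int_{0}^{a} \frac{dx}{1-x}
  = -n \ln(1-a).
\]
```

For example, $-\ln(1/3) \approx 1.10$ and $-\ln(0.01) \approx 4.61$.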
Some more examples: distributing 2/3*n zeros would take about 1.1*n iterations. For 0.99*n zeros it is already about 4.6*n iterations.
These are all averages. In the worst case the loop never terminates.
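The averages above can also be checked without the logarithm approximation by summing the per-zero expectations directly. The helper below is a hypothetical illustration (the name `expected_draws` is not from the original answer); it assumes uniform draws and counts only the draws of the trial-and-error loop.

```c
#include <assert.h>
#include <stddef.h>

/* Exact expected number of random draws needed to place `zeros` zeros into
 * an array of length n when an occupied slot triggers a retry.
 * With j zeros already placed, one draw succeeds with probability (n-j)/n,
 * so placing the next zero takes n/(n-j) draws on average. */
static double expected_draws(size_t n, size_t zeros) {
    assert(zeros <= n);
    double total = 0.0;
    for (size_t j = 0; j < zeros; j++) {
        total += (double)n / (double)(n - j);
    }
    return total;
}
```

For n = 12 and 4 zeros this gives 12/12 + 12/11 + 12/10 + 12/9, about 4.62 draws in total.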
If you need a runtime guarantee, you are perhaps better off implementing real sampling without replacement, i.e. filling a container with all possible indices, sampling a random element as the index and removing it from the container. But this probably has a runtime of around O(n*log(n)). Thus it works well for small n or a large percentage of zeros.
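One possible way to get a fixed running time, sketched here as an alternative to the index container described above and not as the answer's own method, is a Fisher-Yates shuffle of the value array itself: write the zeros into the first slots, random signs into the rest, then shuffle. It performs exactly n-1 swaps. The function name is hypothetical, and `rand()` stands in for `myRand()` to keep the block self-contained; `rand() % k` has a slight modulo bias and assumes n does not exceed `RAND_MAX`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill F[0..n-1] with exactly `zeros` zeros and random +1/-1 elsewhere,
 * using a fixed number of steps (no retry loop). */
static void sample_fixed_zeros_shuffle(int *F, size_t n, size_t zeros) {
    assert(zeros <= n);

    /* Zeros first, then independent random signs. */
    for (size_t k = 0; k < n; k++) {
        if (k < zeros) {
            F[k] = 0;
        } else {
            F[k] = (rand() & 1) ? 1 : -1;
        }
    }

    /* Fisher-Yates shuffle: swap the last unshuffled slot with a random
     * slot at or before it, then shrink the unshuffled range by one. */
    for (size_t k = n; k > 1; k--) {
        size_t j = (size_t)rand() % k; /* 0 <= j < k */
        int tmp = F[k - 1];
        F[k - 1] = F[j];
        F[j] = tmp;
    }
}
```

Seed the generator once with `srand()` before the first call if different runs should produce different arrays.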