以随机顺序迭代具有相同数量的一(或零)的二进制数 [英] Iterate binary numbers with the same quantity of ones (or zeros) in random order

查看:63
本文介绍了以随机顺序迭代具有相同数量的一(或零)的二进制数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要以随机顺序生成具有相同数量的1(或0)的二进制数.
有人知道固定长度二进制数的任何有效算法吗? 2位和4位数字的示例(为更清楚起见):

1100
1010
1001
0110
0101
0011

更新 无重复的随机顺序很重要.需要二进制数的序列,而不是单个排列.

解决方案

如果您有足够的内存来存储所有可能的位序列,并且您不介意在获得第一个结果之前就全部生成它们,那么解决方案将是使用某种有效的生成器将所有可能的序列生成向量,然后使用Fisher-Yates混洗对向量进行混洗.这很容易且没有偏见(只要您使用良好的随机数生成器进行随机播放),但是如果n很大,它可能会占用大量内存,尤其是在您不确定是否需要完成迭代的情况下. /p>

但是有两种解决方案不需要将所有可能的单词保留在内存中. (两种解决方案的C实现均紧随其后.)

1.位洗牌枚举

最快的(我认为)是先生成一个随机的位值洗牌,然后一次遍历所有可能的单词,然后将洗牌应用于每个值的位.为了避免改组实际位的复杂性,可以按照格雷码顺序生成字,其中只有两个位的位置从一个字变为下一个字. (这也称为旋转门"迭代,因为在添加每个新的1时,必须删除其他一些1.)这允许快速更新位掩码,但是这意味着要连续输入高度相关,可能不适用于某些目的.同样,对于n的较小值,可能的位混洗的数量非常有限,因此不会产生很多不同的序列. (例如,对于n为4且k为2的情况,有6个可能的单词可以用6!(720)种不同的方式进行排序,但是只有4!(24)个位洗牌.可以通过在序列中的任意位置开始迭代来稍微改善这一点.)

总是可以找到格雷码.这是一个n = 6,k = 3的示例:(每一步都交换粗体.我想在其下划线,但出于某些无法解释的原因,SO允许删除线但不能下划线.)

111000   010110   100011   010101
101100   001110   010011   001101
011100   101010   001011   101001
110100   011010   000111   011001
100110   110010   100101   110001

此序列可以通过类似于@JasonBoubin建议的建议的递归算法产生 –唯一的区别是每个递归的后半部分需要以相反的顺序生成-但是使用非递归版本的算法很方便.下面的示例代码中的一个来自弗兰克·鲁斯基(Frank Ruskey)尚未发表的关于组合生成的手稿(算法第5.7页,第130页).我对其进行了修改,以使用基于0的索引,并添加了用于跟踪二进制表示形式的代码.

2.随机生成一个整数序列并将其转换为组合

更多"随机但稍微慢一些的解决方案是生成一个经过随机排列的枚举索引列表(在[0, n choose k)中是连续整数),然后找到与每个索引相对应的单词.

生成连续范围内的整数的随机排列的列表的最简单的伪随机方法是使用随机选择的线性同余生成器(LCG). LCG是递归序列xi = (a * xi-1 + c) mod m.如果m是2的幂,则a mod 4是1且c mod 2是1,则该递归将循环遍历所有2 m 个可能的值.要遍历范围[0, n choose k),我们只需选择m作为下一个更大的2的幂,然后跳过任何不在所需范围内的值. (出于明显的原因,该值将小于所产生值的一半.)

为了将枚举索引转换为实际单词,我们基于以下事实执行索引的二项式分解:n choose k单词集由以0开头的n-1 choose k单词和以0开头的n-1 choose k-1单词组成a 1.因此,要产生第i个 单词:

  • 如果i < n-1 choose k我们输出一个0,然后输出具有k个位的n-1个位字的集合中的第i个 th 字;
  • 否则,我们输出1,然后从i中减去n-1 choose k作为索引,将其设置为设置了k-1个位的n-1个位字的集合.

预先计算所有有用的二项式系数很方便.

LCG的缺点是,在看到前几个术语后,它们很容易预测.同样,ac的一些随机选择的值将产生索引序列,其中连续索引高度相关. (此外,低阶位始终是非随机的.)通过对最终结果应用随机位改组,可以稍微缓解其中的一些问题.下面的代码中没有对此进行说明,但是它将使速度降低很少,并且应该很明显地做到这一点. (基本上,这是通过对混洗后的位进行表查找来替换1UL<<n.)

下面的C代码使用了一些优化,使其难以阅读.二项式系数存储在下对角线数组中:

  row
index
 [ 0]   1
 [ 1]   1 1
 [ 3]   1 2 1
 [ 6]   1 3 3 1
 [10]   1 4 6 4 1

可以看出,binom(n, k)的数组索引为n(n+1)/2 + k,如果有该索引,则可以通过简单地减去n来找到binom(n-1, k),通过减去n+1可以找到binom(n-1, k-1).为了避免需要在数组中存储零,我们确保在k为负或大于n的情况下,切勿查找二项式系数.特别是,如果我们到达了递归k == nk == 0的某个点,我们肯定可以知道要查找的索引是0,因为只有一个可能的单词.此外,在具有某些nk的单词集中索引0 将精确地由n-k个零组成,后跟k个零,这是2 k -1的n位二进制表示.通过在索引达到0时简化算法,我们可以避免担心binom(n-1, k)binom(n-1, k-1)之一不是有效索引的情况.

两种解决方案的C代码

带有乱码的灰色代码

 void gray_combs(int n, int k) {
  /* bit[i] is the ith shuffled bit */
  uint32_t bit[n+1];
  {
    uint32_t mask = 1;
    for (int i = 0; i < n; ++i, mask <<= 1)
      bit[i] = mask;
    bit[n] = 0;
    shuffle(bit, n);
  }

  /* comb[i] for 0 <= i < k is the index of the ith bit
   * in the current combination. comb[k] is a sentinel. */
  int comb[k + 1];
  for (int i = 0; i < k; ++i) comb[i] = i;
  comb[k] = n;

  /* Initial word has the first k (shuffled) bits set */
  uint32_t word = 0;
  for (int i = 0; i < k; ++i) word |= bit[i];

  /* Now iterate over all combinations */
  int j = k - 1; /* See Ruskey for meaning of j */
  do {
    handle(word, n);
    if (j < 0) {
      word ^= bit[comb[0]] | bit[comb[0] - 1];
      if (--comb[0] == 0) j += 2;
    }
    else if (comb[j + 1] == comb[j] + 1) {
      word ^= bit[comb[j + 1]] | bit[j];
      comb[j + 1] = comb[j]; comb[j] = j;
      if (comb[j + 1] == comb[j] + 1) j += 2;
    }
    else if (j > 0) {
      word ^= bit[comb[j - 1]] | bit[comb[j] + 1];
      comb[j - 1] = comb[j]; ++comb[j];
      j -= 2;
    }
    else {
      word ^= bit[comb[j]] | bit[comb[j] + 1];
      ++comb[j];
    }
  } while (comb[k] == n);
}
 

具有枚举索引到单词转换的LCG

 static const uint32_t* binom(unsigned n, unsigned k) {
  static const uint32_t b[] = {
    1,
    1, 1,
    1, 2, 1,
    1, 3, 3, 1,
    1, 4, 6, 4, 1,
    1, 5, 10, 10, 5, 1,
    1, 6, 15, 20, 15, 6, 1,
    // ... elided for space
  };
  return &b[n * (n + 1) / 2 + k];
}

static uint32_t enumerate(const uint32_t* b, uint32_t r, unsigned n, unsigned k) {
  uint32_t rv = 0;
  while (r) {
    do {
      b -= n;
      --n;
    } while (r < *b);
    r -= *b;
    --b;
    --k;
    rv |= 1UL << n;
  }
  return rv + (1UL << k) - 1;
}

static bool lcg_combs(unsigned n, unsigned k) {
  const uint32_t* b = binom(n, k);
  uint32_t count = *b;
  uint32_t m = 1; while (m < count) m <<= 1;
  uint32_t a = 4 * randrange(1, m / 4) + 1;
  uint32_t c = 2 * randrange(0, m / 2) + 1;
  uint32_t x = randrange(0, m);
  while (count--) {
    do
      x = (a * x + c) & (m - 1);
    while (x >= *b);
    handle(enumerate(b, x, n, k), n);
  }
  return true;
}
 

注意::我没有提供randrangeshuffle的实现;代码很容易获得. randrange(low, lim)产生范围为[low, lim)的随机整数; shuffle(vec, n)随机对长度为n的整数矢量vec进行洗牌.

此外,循环还会为每个生成的单词调用handle(word, n).必须用每种组合所要做的一切代替.

handle定义为不执行任何操作的功能,gray_combs用了150毫秒在我的笔记本电脑上查找了设置14位的所有40,116,600个28位字. lcg_combs花了5.5秒.

I need to generate binary numbers with the same quantity of ones (or zeros) in random order.
Does anyone know any efficient algorithm for fixed-length binary numbers? Example for 2 ones and 4 digits (just to be more clear):

1100
1010
1001
0110
0101
0011

UPDATE Random order w/o repetitions is significant. Sequence of binary numbers required, not single permutation.

解决方案

If you have enough memory to store all the possible bit sequences, and you don't mind generating them all before you have the first result, then the solution would be to use some efficient generator to produce all possible sequences into a vector and then shuffle the vector using the Fisher-Yates shuffle. That's easy and unbiased (as long as you use a good random number generator to do the shuffle) but it can use a lot of memory if n is large, particularly if you are not sure you will need to complete the iteration.

But there are a couple of solutions which do not require keeping all the possible words in memory. (C implementations of the two solutions follow the text.)

1. Bit shuffle an enumeration

The fastest one (I think) is to first generate a random shuffle of bit values, and then iterate over the possible words one at a time applying the shuffle to the bits of each value. In order to avoid the complication of shuffling actual bits, the words can be generated in a Gray code order in which only two bit positions are changed from one word to the next. (This is also known as a "revolving-door" iteration because as each new 1 is added, some other 1 must be removed.) This allows the bit mask to be updated rapidly, but it means that successive entries are highly correlated, which may be unsuitable for some purposes. Also, for small values of n the number of possible bit shuffles is very limited, so there will not be a lot of different sequences produced. (For example, for the case where n is 4 and k is 2, there are 6 possible words which could be sequenced in 6! (720) different ways, but there are only 4! (24) bit-shuffles. This could be ameliorated slightly by starting the iteration at a random position in the sequence.)

It is always possible to find a Gray code. Here's an example for n=6, k=3: (The bold bits are swapped at each step. I wanted to underline them but for some inexplicable reason SO allows strikethrough but not underline.)

111000   010110   100011   010101
101100   001110   010011   001101
011100   101010   001011   101001
110100   011010   000111   011001
100110   110010   100101   110001

This sequence can be produced by a recursive algorithm similar to that suggested by @JasonBoubin -- the only difference is that the second half of each recursion needs to be produced in reverse order -- but it's convenient to use a non-recursive version of the algorithm. The one in the sample code below comes from Frank Ruskey's unpublished manuscript on Combinatorial Generation (Algorithm 5.7 on page 130). I modified it to use 0-based indexing, as well as adding the code to keep track of the binary representations.

2. Randomly generate an integer sequence and convert it to combinations

The "more" random but somewhat slower solution is to produce a shuffled list of enumeration indices (which are sequential integers in [0, n choose k)) and then find the word corresponding to each index.

The simplest pseudo-random way to produce a shuffled list of integers in a contiguous range is to use a randomly-chosen Linear Congruential Generator (LCG). An LCG is the recursive sequence xi = (a * xi-1 + c) mod m. If m is a power of 2, a mod 4 is 1 and c mod 2 is 1, then that recursion will cycle through all 2m possible values. To cycle through the range [0, n choose k), we simply select m to be the next larger power of 2, and then skip any values which are not in the desired range. (That will be fewer than half the values produced, for obvious reasons.)

To convert the enumeration index into an actual word, we perform a binomial decomposition of the index based on the fact that the set of n choose k words consists of n-1 choose k words starting with a 0 and n-1 choose k-1 words starting with a 1. So to produce the ith word:

  • if i < n-1 choose k we output a 0 and then the ith word in the set of n-1 bit words with k bits set;
  • otherwise, we output a 1 and then subtract n-1 choose k from i as the index into the set of n-1 bit words with k-1 bits set.

It's convenient to precompute all the useful binomial coefficients.

LCGs suffer from the disadvantage that they are quite easy to predict after the first few terms are seen. Also, some of the randomly-selected values of a and c will produce index sequences where successive indices are highly correlated. (Also, the low-order bits are always quite non-random.) Some of these problems could be slightly ameliorated by also applying a random bit-shuffle to the final result. This is not illustrated in the code below but it would slow things down very little and it should be obvious how to do it. (It basically consists of replacing 1UL<<n with a table lookup into the shuffled bits).

The C code below uses some optimizations which make it a bit challenging to read. The binomial coefficients are stored in a lower-diagonal array:

  row
index
 [ 0]   1
 [ 1]   1 1
 [ 3]   1 2 1
 [ 6]   1 3 3 1
 [10]   1 4 6 4 1

As can be seen, the array index for binom(n, k) is n(n+1)/2 + k, and if we have that index, we can find binom(n-1, k) by simply subtracting n, and binom(n-1, k-1) by subtracting n+1. In order to avoid needing to store zeros in the array, we make sure that we never look up a binomial coefficient where k is negative or greater than n. In particular, if we have arrived at a point in the recursion where k == n or k == 0, we can definitely know that the index to look up is 0, because there is only one possible word. Furthermore, index 0 in the set of words with some n and k will consist precisely of n-k zeros followed by k ones, which is the n-bit binary representation of 2k-1. By short-cutting the algorithm when the index reaches 0, we can avoid having to worry about the cases where one of binom(n-1, k) or binom(n-1, k-1) is not a valid index.

C code for the two solutions

Gray code with shuffled bits

void gray_combs(int n, int k) {
  /* bit[i] is the ith shuffled bit */
  uint32_t bit[n+1];
  {
    uint32_t mask = 1;
    for (int i = 0; i < n; ++i, mask <<= 1)
      bit[i] = mask;
    bit[n] = 0;
    shuffle(bit, n);
  }

  /* comb[i] for 0 <= i < k is the index of the ith bit
   * in the current combination. comb[k] is a sentinel. */
  int comb[k + 1];
  for (int i = 0; i < k; ++i) comb[i] = i;
  comb[k] = n;

  /* Initial word has the first k (shuffled) bits set */
  uint32_t word = 0;
  for (int i = 0; i < k; ++i) word |= bit[i];

  /* Now iterate over all combinations */
  int j = k - 1; /* See Ruskey for meaning of j */
  do {
    handle(word, n);
    if (j < 0) {
      word ^= bit[comb[0]] | bit[comb[0] - 1];
      if (--comb[0] == 0) j += 2;
    }
    else if (comb[j + 1] == comb[j] + 1) {
      word ^= bit[comb[j + 1]] | bit[j];
      comb[j + 1] = comb[j]; comb[j] = j;
      if (comb[j + 1] == comb[j] + 1) j += 2;
    }
    else if (j > 0) {
      word ^= bit[comb[j - 1]] | bit[comb[j] + 1];
      comb[j - 1] = comb[j]; ++comb[j];
      j -= 2;
    }
    else {
      word ^= bit[comb[j]] | bit[comb[j] + 1];
      ++comb[j];
    }
  } while (comb[k] == n);
}

LCG with enumeration index to word conversion

static const uint32_t* binom(unsigned n, unsigned k) {
  static const uint32_t b[] = {
    1,
    1, 1,
    1, 2, 1,
    1, 3, 3, 1,
    1, 4, 6, 4, 1,
    1, 5, 10, 10, 5, 1,
    1, 6, 15, 20, 15, 6, 1,
    // ... elided for space
  };
  return &b[n * (n + 1) / 2 + k];
}

static uint32_t enumerate(const uint32_t* b, uint32_t r, unsigned n, unsigned k) {
  uint32_t rv = 0;
  while (r) {
    do {
      b -= n;
      --n;
    } while (r < *b);
    r -= *b;
    --b;
    --k;
    rv |= 1UL << n;
  }
  return rv + (1UL << k) - 1;
}

static bool lcg_combs(unsigned n, unsigned k) {
  const uint32_t* b = binom(n, k);
  uint32_t count = *b;
  uint32_t m = 1; while (m < count) m <<= 1;
  uint32_t a = 4 * randrange(1, m / 4) + 1;
  uint32_t c = 2 * randrange(0, m / 2) + 1;
  uint32_t x = randrange(0, m);
  while (count--) {
    do
      x = (a * x + c) & (m - 1);
    while (x >= *b);
    handle(enumerate(b, x, n, k), n);
  }
  return true;
}

Note: I didn't include the implementation of randrange or shuffle; code is readily available. randrange(low, lim) produces a random integer in the range [low, lim); shuffle(vec, n) randomly shuffles the integer vector vecof length n.

Also, the the loop calls handle(word, n) for each generated word. That must must be replaced with whatever is to be done with each combination.

With handle defined as a function which does nothing, gray_combs took 150 milliseconds on my laptop to find all 40,116,600 28-bit words with 14 bits set. lcg_combs took 5.5 seconds.

这篇关于以随机顺序迭代具有相同数量的一(或零)的二进制数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆