最长的二进制序列,没有相等的n长度子序列 [英] Longest binary sequence with no equal n-length subsequences
问题描述
我们正在寻找一种符合以下条件的算法。
We are looking for an algorithm with the following criteria.
输入是任意正整数( n
),表示比较子序列的长度。
Input is an arbitrary positive integer (n
), that represents the length of the compare subsequences.
我们搜索最长的二进制序列,该序列不包含相等的n长度子序列。匹配的相等序列可以重叠(当匹配项必须不相交时,这也是一个有趣的问题)。输出将是此位序列。
We search the longest binary sequence, which contains no equal n-length subsequences. Matched equal sequences can be overlapped (also an interesting problem when matches must be disjoint). Output will be this sequence of bits.
例如,如果 n = 3
:
10111010
无效,因为重复的 101
子序列。 01010
也是无效的,因为多次出现 010
。 01101001
是有效的,但显然不是最长的序列。
10111010
is invalid because of the repeating 101
subsequences. 01010
is also invalid because of multiple occurrences of 010
. 01101001
is valid, but evidently not the longest possible sequence.
推荐答案
谷歌搜索二进制De Bruijn序列算法,我发现了这一点,您实际上可以分辨出发生了什么。它被称为 FKM算法(在Fredricksen,Kessler和Maiorana之后),它使用项链前缀方法找到了字典上最小的De Bruijn序列。我将使用n = 4的示例进行说明。
Googling for binary De Bruijn sequence algorithms, I found this one where you can actually tell what's happening. Known as the "FKM algorithm" (after Fredricksen, Kessler and Maiorana), it finds the lexicographically least De Bruijn sequence using the "necklace prefix" method. I'll explain using the example with n=4.
首先,创建长度为n的所有二进制序列,即从0到2 n -1的所有数字:
First, create all binary sequences of length n, i.e. all numbers from 0 to 2n-1:
0000,0001,0010,0011,0100,0101,0110,0111,1000,1001,1010,
1011,1100,1101,1110,1111
0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
然后,删除不是旋转最低的序列,例如可以将 0110
旋转为较小的 0011
:
Then, remove the sequences which are not in their lowest rotation, e.g. 0110
can be rotated to 0011
which is smaller:
0000,0001,0011,0101,0111,1111
0000, 0001, 0011, 0101, 0111, 1111
(您会注意到这会删除所有除 0000
以外的偶数,以及所有大于 0111
的数,除了 1111
,这有助于简化代码。)
(You'll notice that this removes a.o. all even numbers except 0000
, and all numbers greater than 0111
except 1111
, which helps to simplify code.)
然后将序列减少为非周期性前缀,即如果它们是重复较短的序列,请使用较短的序列;例如 0101
是 01
的重复, 1111
是重复的的 1
:
Then reduce the sequences to their "aperiodic prefix", i.e. if they are a repetition of a shorter sequence, use that shorter sequence; e.g. 0101
is a repetition of 01
, 1111
is a repetition of 1
:
0,0001,0011,01,0111,1
0, 0001, 0011, 01, 0111, 1
加入序列,您将获得De Bruijn序列:
Join the sequences, and you have a De Bruijn sequence:
0000100110101111
0000100110101111
对于非循环序列,请添加n-1个零:
For a non-circular sequence, add n-1 zeros:
0000100110101111000
0000100110101111000
(更多信息: F。Ruskey,J。Sawada,A。Williams:固定权重二进制的De Bruijn序列字符串 和B.史蒂文斯,A。威廉姆斯:二进制字符串的最酷顺序,摘自:算法的乐趣,2012年,第327-328页)
(further information: F. Ruskey, J. Sawada, A. Williams: "De Bruijn Sequences for Fixed-Weight Binary Strings" and B. Stevens, A. Williams: "The Coolest Order Of Binary Strings", from: "Fun With Algorithms", 2012, pp. 327-328)
我很想知道FKM与其他算法相比的性能如何,所以我写了这个相当笨拙的JavaScript实现。它确实快得多,并且在一秒钟之内生成了N = 20的1,048,595位数字序列。用严肃的语言,这应该很快。
I was curious to see how FKM performed compared to my other algorithm, so I wrote this rather clumsy JavaScript implementation. It is indeed much faster, and generates the 1,048,595 digit sequence for N=20 in under a second. In a serious language this should be very fast.
function DeBruijnFKM(n) {
var seq = "0"; // start with 0 precalculated
for (var i = 1; i < n; i++) { // i = number of significant bits
var zeros = "", max = Math.pow(2, i);
for (var j = n; j > i; j--) zeros += "0"; // n-i leading zeros
for (var k = i > 1 ? max / 2 + 1 : 1; k < max; k += 2) { // odd numbers only
var bin = k.toString(2); // bin = significant bits
if (isSmallestRotation(zeros, bin)) {
seq += aperiodicPrefix(zeros, bin);
}
}
}
return seq + Math.pow(2, n - 1).toString(2); // append 2^N-1 and trailing zeros
function isSmallestRotation(zeros, bin) {
var len = 0, pos = 1; // len = number of consecutive zeros in significant bits
for (var i = 1; i < bin.length; i++) {
if (bin.charAt(i) == "1") {
if (len > zeros.length) return false; // more zeros than leading zeros
if (len == zeros.length
&& zeros + bin > bin.substr(pos) + zeros + bin.substr(0, pos)) {
return false; // smaller rotation found
}
len = 0;
pos = i + 1;
}
else ++len;
}
return true;
}
function aperiodicPrefix(zeros, bin) {
if (zeros.length >= bin.length) return zeros + bin; // too many leading zeros
bin = zeros + bin;
for (var i = 2; i <= bin.length / 2; i++) { // skip 1; not used for 0 and 2^N-1
if (bin.length % i) continue;
var pre = bin.substr(0, i); // pre = prefix of length i
for (var j = i; j < bin.length; j += i) {
if (pre != bin.substr(j, i)) break; // non-equal part found
}
if (j == bin.length) return pre; // all parts are equal
}
return bin; // no repetition found
}
}
document.write(DeBruijnFKM(10));
这篇关于最长的二进制序列,没有相等的n长度子序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!