从集合中找到多个最大不同的二元向量 [英] Finding a number of maximally different binary vectors from a set
问题描述
考虑所有长度为 n 的二元向量的集合 S ,其中每个精确地包含 m 个。因此每个向量中都有 nm 个零。
我的目标是从 S k 的向量>,使得这些向量彼此之间尽可能地不同。
Consider the set, S, of all binary vectors of length n where each contains exactly m ones; so there are n-m zeros in each vector.
My goal is to construct a number, k, of vectors from S such that these vectors are as different as possible from each other.
举一个简单的例子,取 n = 4, m = 2和 k = 2,那么可能的解决方案是:[1,1,0,0]和[0,0,1,1]。
As a simple example, take n=4, m=2 and k=2, then a possible solution is: [1,1,0,0] and [0,0,1,1].
似乎这是编码理论文献(?)中的一个开放问题。
It seems that this is an open problem in the coding theory literature (?).
有什么方法(即算法)来找到次优而又好的解决方案?
在这种情况下,汉明距离是否是正确的性能指标? ?
Is there any way (i.e. algorithm) to find a suboptimal yet good solution ?
Is Hamming distance the right performance measure to use in this case ?
一些想法:
在本文,作者提出了一些算法来找到向量子集,从而使成对的汉明距离> =某个值 d 。
我实现了Random方法,如下所示:设置一个 SS 集合,该集合可以由任何 S 中的向量。然后,我考虑 S 中剩余的向量
。对于这些向量中的每一个,我检查该向量相对于 SS 中的每个向量是否至少具有距离 d 。如果是这样,则将其添加到 SS 。
如果 SS 的大小取最大可能的 d 是> = k ,那么我认为 SS 是最佳解决方案,我从 SS 中选择 k 个向量的任何子集。
使用这种方法,我认为生成的 SS 将取决于 SS 中初始矢量的身份。即有多个解决方案(?)。
但是,如果 SS 的大小小于< k ?
从本文提出的算法中,我仅了解随机算法。我对Binary lexicographic搜索(第2.3节)感兴趣,但我不知道如何实现它(?)。
Some thoughts:
In this paper, the authors propose a couple of algorithms to find the subset of vectors such that the pairwise Hamming distance is >= a certain value, d.
I have implemented the Random approach as follows: take a set SS, which is initialized by any vector from S. Then, I consider the remaining vectors
in S. For each of these vectors, I check if this vector has at least a distance d with respect to each vector in SS. If so, then it is added to SS.
By taking the maximal possible d, if the size of SS is >= k, then I consider SS as an optimal solution, and I choose any subset of k vectors from SS.
Using this approach, I think that the resulting SS will depend on the identity of the initial vector in SS; i.e. there are multiple solutions(?).
But how to proceed if the size of SS is < k ?
From the proposed algorithms in the paper, I have only understood the Random one. I am interested in the Binary lexicographic search (section 2.3) but I don't know how to implement it (?).
推荐答案
也许您发现本文有用(我写的)。它包含有效创建位串置换的算法。
Maybe you find this paper useful (I wrote it). It contains algorithms that efficiently create permutations of bitstrings.
例如, inc()
算法:
long inc(long h_in , long m0 , long m1) {
long h_out = h_in | (~m1); //pre -mask
h_out ++;
// increment
h_out = (h_out & m1) | m0; //post -mask
return h_out;
}
需要输入 h_in
并返回下一个比 h_in
大至少1的值,并匹配边界 m0
和 m1
。 匹配是指:结果具有 1
,而 m0
具有 1
,结果为 0
,而 m1
的结果为 0
。并不是 h_in
必须是关于 mo
和 m1 $ c的有效值$ c>!另外,请注意
m0
必须比 m1
逐位小,这意味着 m0
在 m1
的 0位置不能有
。 1
It takes an input h_in
and return the next higher value that is at least 1 larger than h_in
and 'matches' the boundaries m0
and m1
. 'Matching' means: the result has a 1
whereever m0
has a 1
, and the result has a 0
whereever m1
has a 0
. Not that h_in
MUST BE a valid value with regards to mo
and m1
! Also, note that m0
has to be bitwise smaller than m1
, which means that m0
cannot have a 1
in a position where m1
has a 0
.
这可以用于生成距给定输入字符串最小编辑距离的排列:
This could be used to generate permutations with a minimum edit distance to a given input string:
假设您有 0110
,您首先将其与 1001
否定(编辑距离= k )。
设置 m0 = 1001和 m1 = 1001。使用此功能只会对'1001'本身产生结果。
Let's assume you have 0110
, you first NEGATE it to 1001
(edit distance = k).
Set 'm0=1001' and 'm1=1001'. Using this would result only on '1001' itself.
现在要获取所有编辑距离为 k-1 的值,您可以执行以下操作,只需翻转 m0
或 m1
的位之一,然后翻转 inc()
将返回所有差异为 k
或 k-1
的位串的有序序列。
Now to get all values with edit distance k-1, you can do the following, simply flip one of the bits of m0
or m1
, then inc()
will return an ordered series of all bitstring that have a difference of k
or k-1
.
我知道,还不是很有趣,但是您最多可以修改 k
位和 inc()
将始终返回与 m0
和有关的所有排列,其中最大允许编辑差异为 m1
。
I know, not very interesting yet, but you can modify up to k
bits, and inc()
will always return all permutations with the maximum allowed edit difference with regard to m0
and m1
.
现在,要获得 all 排列,您必须使用 m0的所有可能组合重新运行算法
和 m1
。
Now, to get all permutations, you would have to re-run the algorithm with all possibly combinations of m0
and m1
.
示例:获取<$ c $的所有可能排列c> 0110 且编辑距离为 2
,则必须运行具有以下排列的 m0 = 0110的inc()
和 m1 = 0110
(要获得置换,必须扩展位位置,这意味着 m0
设置为 0
, m1
设置为 1
:
Example: To get all possible permutations of 0110
with edit distance 2
, you would have to run inc() with the following permutations of m0=0110
and m1=0110
(to get permutations, a bit position has to be expanded, meaning that m0
is set to 0
and m1
is set to 1
:
- 位0和1 扩展:
m0 = 0010
和m1 = 1110
- 第0位和第2位扩展 >:
m0 = 0100
和m1 = 1110
- 位0和3 扩展:
m0 = 0110
和m1 = 1111
- 第1位和第2位扩展了:
m0 = 0000
和m1 = 0110
- 第1位和第3位扩展:
m0 = 0010
和m1 = 0111
- 第2位和第3位扩展:
m0 = 0100
和m1 = 0111
- Bit 0 and 1 expanded:
m0=0010
andm1=1110
- Bit 0 and 2 expanded:
m0=0100
andm1=1110
- Bit 0 and 3 expanded:
m0=0110
andm1=1111
- Bit 1 and 2 expanded:
m0=0000
andm1=0110
- Bit 1 and 3 expanded:
m0=0010
andm1=0111
- Bit 2 and 3 expanded:
m0=0100
andm1=0111
作为 h_0
的起始值,我建议仅使用 m0
。一旦 inc()
返回 m1
,迭代就可以中止。
As starting value for h_0
I suggest to use simply m0
. Iteration can be aborted once inc()
returns m1
.
摘要
上面的算法在 O(x)中生成所有 x
个二进制向量,它们在给定向量 v
至少有 y
位(可配置)。
Summary
The above algorithm generates in O(x) all x
binary vectors that differ in at least y
bits (configurable) from a given vector v
.
使用您定义的 n
= 向量v中的位数设置
y = n
恰好生成1个向量,它与输入向量 v
完全相反。对于 y = n-1
,它将生成 n + 1
个向量: n
矢量,除了一位外,其他所有位都不同;还有1个矢量,其所有位均不同。等等,以不同的值 y
。
Using your definition of n
=number of bits in a vector v
, setting y=n
generates exactly 1 vector which is the exact opposite of the input vector v
. For y=n-1
, it will generate n+1
vectors: n
vectors which differ in all but one bits and 1 vector that differs in all bits. And so on different values of y
.
**编辑:添加了摘要,并用'替换了错误的'XOR'否定。
** Added summary and replaced erroneous 'XOR' with 'NEGATE' in the text above.
这篇关于从集合中找到多个最大不同的二元向量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!