有效地找到匹配位掩码的第一个元素 [英] efficiently find the first element matching a bit mask

查看：137 发布时间：2015/11/30 14:23:34 algorithm search optimization data-structures bitmask

本文介绍了有效地找到匹配位掩码的第一个元素的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个列表的 N 64位整数，其位重新present小套。每个整数最多有 K 位设置为1，给定一个位掩码，我想找到的面具，即元素与放大器相匹配的列表中的第一个元素;面具==元素。

I have a list of N 64-bit integers whose bits represent small sets. Each integer has at most k bits set to 1. Given a bit mask, I would like to find the first element in the list that matches the mask, i.e. element & mask == element.

示例：

如果我的名单是：

index abcdef
  0   001100
  1   001010
  2   001000
  3   000100
  4   000010
  5   000001
  6   010000
  7   100000
  8   000000

和我的面具是 111000 ，匹配面具的第一个元素是指数2。

and my mask is 111000, the first element matching the mask is at index 2.

方法1：

整个列表的线性搜索。这需要O（ N ）的时间和O（1）空间。

Linear search through the entire list. This takes O(N) time and O(1) space.

方法2：

precompute一棵树，并在每个节点保留该面具的答案。这需要O（1）时间查询，但需要O（2 ^ 64）空间。

Precompute a tree of all possible masks, and at each node keep the answer for that mask. This takes O(1) time for the query, but takes O(2^64) space.

问：

我怎样才能找到匹配的面具快于澳第一个元素（ N ），而仍然使用的空间，合理的薪酬？我可以花得起多项式时间在precomputation，因为会有很多的查询。关键是， K 小。在我的应用程序，<强> K ＆LT; = 5和 N 是成千上万。面膜有很多1秒;可以假定它被均匀地从64位整数的空间引出。

How can I find the first element matching the mask faster than O(N), while still using a reasonable amount of space? I can afford to spend polynomial time in precomputation, because there will be a lot of queries. The key is that k is small. In my application, k <= 5 and N is in the thousands. The mask has many 1s; you can assume that it is drawn uniformly from the space of 64-bit integers.

更新：

下面是一个例子的数据集，并在Linux上运行一个简单的基准测试程序：的http：// up.thirld.com/binmask.tar.gz 。对于 large.in ， N = 3779和 K = 3。第一行是 N ，随后的 N 64位无符号整数重新presenting的元素。编译与让。与 ./ benchmark.e＆GT运行; large.out 以创建真正的输出，然后你就可以差异反对。（面具是随机生成的，但随机种子是固定的。），然后用你的实现替换 find_first（）的功能。

Here is an example data set and a simple benchmark program that runs on Linux: http://up.thirld.com/binmask.tar.gz. For large.in, N=3779 and k=3. The first line is N, followed by N unsigned 64-bit ints representing the elements. Compile with make. Run with ./benchmark.e >large.out to create the true output, which you can then diff against. (Masks are generated randomly, but the random seed is fixed.) Then replace the find_first() function with your implementation.

简单的线性搜索的速度远远超过我的预期。这是因为 K 小，所以随机面具，找到一个匹配的速度非常快，平均

The simple linear search is much faster than I expected. This is because k is small, and so for a random mask, a match is found very quickly on average.

推荐答案

一个后缀树（位）会做的伎俩，用在叶节点原来的优先级：

A suffix tree (on bits) will do the trick, with the original priority at the leaf nodes:

000000 -> 8
     1 -> 5
    10 -> 4
   100 -> 3
  1000 -> 2
    10 -> 1
   100 -> 0
 10000 -> 6
100000 -> 7

在这里，如果该位在掩码设置，你搜索双臂，如果没有，你只搜索0臂;你的答案是你遇到的叶节点的最低数量。

where if the bit is set in the mask, you search both arms, and if not, you search only the 0 arm; your answer is the minimum number you encounter at a leaf node.

您可以通过遍历不是为了而是由最大的鉴别力位提高本（略）;在你的榜样，注意3个元素有2位的设置，所以您将创建

You can improve this (marginally) by traversing the bits not in order but by maximum discriminability; in your example, note that 3 elements have bit 2 set, so you would create

2:0 0:0 1:0 3:0 4:0 5:0 -> 8
                    5:1 -> 5
                4:1 5:0 -> 4
            3:1 4:0 5:0 -> 3
        1:1 3:0 4:0 5:0 -> 6
    0:1 1:0 3:0 4:0 5:0 -> 7
2:1 0:0 1:0 3:0 4:0 5:0 -> 2
                4:1 5:0 -> 1
            3:1 4:0 5:0 -> 0

在您的例子掩盖这不利于（因为你必须遍历两个位2 == 0和位2 == 1侧，因为你的面具是在第2位设置），但平均会提高的结果（但在设置和更复杂的数据结构的成本）。如果某些位更可能比其他人进行设置，这可能是一个巨大的胜利。如果他们是pretty的接近随机元素列表中，那么这并没有帮助的。

In your example mask this doesn't help (since you have to traverse both the bit2==0 and bit2==1 sides since your mask is set in bit 2), but on average it will improve the results (but at a cost of setup and more complex data structure). If some bits are much more likely to be set than others, this could be a huge win. If they're pretty close to random within the element list, then this doesn't help at all.

如果你坚持基本上是随机的位设置，你应该得到关于（1-5 / 64）^ 32 受惠于平均后缀树方法（13X加速），其中的也许的，由于使用更复杂的操作会比效率更差（但不要指望它 - 位掩码是快）。如果您有位在列表中的非随机分布的，那么你可以做几乎任意很好。

If you're stuck with essentially random bits set, you should get about (1-5/64)^32 benefit from the suffix tree approach on average (13x speedup), which might be better than the difference in efficiency due to using more complex operations (but don't count on it--bit masks are fast). If you have a nonrandom distribution of bits in your list, then you could do almost arbitrarily well.

这篇关于有效地找到匹配位掩码的第一个元素的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

有效地找到匹配位掩码的第一个元素 [英] efficiently find the first element matching a bit mask

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

有效地找到匹配位掩码的第一个元素 [英] efficiently find the first element matching a bit mask

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭