有效地找到匹配位掩码的第一个元素 [英] efficiently find the first element matching a bit mask

查看:137
本文介绍了有效地找到匹配位掩码的第一个元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个列表的 N 64位整数,其位重新present小套。每个整数最多有 K 位设置为1,给定一个位掩码,我想找到的面具,即元素与放大器相匹配的列表中的第一个元素;面具==元素

I have a list of N 64-bit integers whose bits represent small sets. Each integer has at most k bits set to 1. Given a bit mask, I would like to find the first element in the list that matches the mask, i.e. element & mask == element.

示例:

如果我的名单是:

index abcdef
  0   001100
  1   001010
  2   001000
  3   000100
  4   000010
  5   000001
  6   010000
  7   100000
  8   000000

和我的面具是 111000 ,匹配面具的第一个元素是指数2。

and my mask is 111000, the first element matching the mask is at index 2.

方法1:

整个列表的线性搜索。这需要O( N )的时间和O(1)空间。

Linear search through the entire list. This takes O(N) time and O(1) space.

方法2:

precompute一棵树,并在每个节点保留该面具的答案。这需要O(1)时间查询,但需要O(2 ^ 64)空间。

Precompute a tree of all possible masks, and at each node keep the answer for that mask. This takes O(1) time for the query, but takes O(2^64) space.

问:

我怎样才能找到匹配的面具快于澳第一个元素( N ),而仍然使用的空间,合理的薪酬?我可以花得起多项式时间在precomputation,因为会有很多的查询。关键是, K 小。在我的应用程序,<强> K &LT; = 5和 N 是成千上万。面膜有很多1秒;可以假定它被均匀地从64位整数的空间引出。

How can I find the first element matching the mask faster than O(N), while still using a reasonable amount of space? I can afford to spend polynomial time in precomputation, because there will be a lot of queries. The key is that k is small. In my application, k <= 5 and N is in the thousands. The mask has many 1s; you can assume that it is drawn uniformly from the space of 64-bit integers.

更新:

下面是一个例子的数据集,并在Linux上运行一个简单的基准测试程序:的http:// up.thirld.com/binmask.tar.gz 。对于 large.in N = 3779和 K = 3。第一行是 N ,随后的 N 64位无符号整数重新presenting的元素。编译与。与 ./ benchmark.e&GT运行; large.out 以创建真正的输出,然后你就可以差异反对。 (面具是随机生成的,但随机种子是固定的。),然后用你的实现替换 find_first()的功能。

Here is an example data set and a simple benchmark program that runs on Linux: http://up.thirld.com/binmask.tar.gz. For large.in, N=3779 and k=3. The first line is N, followed by N unsigned 64-bit ints representing the elements. Compile with make. Run with ./benchmark.e >large.out to create the true output, which you can then diff against. (Masks are generated randomly, but the random seed is fixed.) Then replace the find_first() function with your implementation.

简单的线性搜索的速度远远超过我的预期。这是因为 K 小,所以随机面具,找到一个匹配的速度非常快,平均

The simple linear search is much faster than I expected. This is because k is small, and so for a random mask, a match is found very quickly on average.

推荐答案

一个后缀树(位)会做的伎俩,用在叶节点原来的优先级:

A suffix tree (on bits) will do the trick, with the original priority at the leaf nodes:

000000 -> 8
     1 -> 5
    10 -> 4
   100 -> 3
  1000 -> 2
    10 -> 1
   100 -> 0
 10000 -> 6
100000 -> 7

在这里,如果该位在掩码设置,你搜索双臂,如果没有,你只搜索0臂;你的答案是你遇到的叶节点的最低数量。

where if the bit is set in the mask, you search both arms, and if not, you search only the 0 arm; your answer is the minimum number you encounter at a leaf node.

您可以通过遍历不是为了而是由最大的鉴别力位提高本(略);在你的榜样,注意3个元素有2位的设置,所以您将创建

You can improve this (marginally) by traversing the bits not in order but by maximum discriminability; in your example, note that 3 elements have bit 2 set, so you would create

2:0 0:0 1:0 3:0 4:0 5:0 -> 8
                    5:1 -> 5
                4:1 5:0 -> 4
            3:1 4:0 5:0 -> 3
        1:1 3:0 4:0 5:0 -> 6
    0:1 1:0 3:0 4:0 5:0 -> 7
2:1 0:0 1:0 3:0 4:0 5:0 -> 2
                4:1 5:0 -> 1
            3:1 4:0 5:0 -> 0

在您的例子掩盖这不利于(因为你必须遍历两个位2 == 0和位2 == 1侧,因为你的面具是在第2位设置),但平均会提高的结果(但在设置和更复杂的数据结构的成本)。如果某些位更可能比其他人进行设置,这可能是一个巨大的胜利。如果他们是pretty的接近随机元素列表中,那么这并没有帮助的。

In your example mask this doesn't help (since you have to traverse both the bit2==0 and bit2==1 sides since your mask is set in bit 2), but on average it will improve the results (but at a cost of setup and more complex data structure). If some bits are much more likely to be set than others, this could be a huge win. If they're pretty close to random within the element list, then this doesn't help at all.

如果你坚持基本上是随机的位设置,你应该得到关于(1-5 / 64)^ 32 受惠于平均后缀树方法(13X加速),其中的也许的,由于使用更复杂的操作会比效率更差(但不要指望它 - 位掩码是快)。如果您有位在列表中的非随机分布的,那么你可以做几乎任意很好。

If you're stuck with essentially random bits set, you should get about (1-5/64)^32 benefit from the suffix tree approach on average (13x speedup), which might be better than the difference in efficiency due to using more complex operations (but don't count on it--bit masks are fast). If you have a nonrandom distribution of bits in your list, then you could do almost arbitrarily well.

这篇关于有效地找到匹配位掩码的第一个元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆