布鲁姆过滤器对面? [英] Opposite of Bloom filter?

查看:166
本文介绍了布鲁姆过滤器对面?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图优化一个基本上运行数百万个测试的软件。这些测试的生成方式可能会有一些重复。当然,我不想花时间运行我已经运行的测试,如果我可以有效地避免它。



所以,我正在考虑使用一个布鲁姆过滤器存储已经运行的测试。但是,Bloom过滤器对我来说是不安全的。它给出了错误的肯定。也就是说,它可能会报告我已经进行了一个我没有的测试。虽然这在我正在工作的情况下是可以接受的,但我想知道是否有相当于Bloom过滤器,但是在相反的方面却是错误的,也就是说只会产生错误的否定。



我已经浏览了文学,没有任何运气。

解决方案

是的,有损的哈希表或LRUCache是​​一个数据结构,具有快速的O(1)查找,只会产生错误的负面 - 如果你问我是否运行测试X,它会告诉你是的,你肯定有或我可以记住。



原谅伪码:

  setup_test_table ():
创建test_table(一些大量条目)
清除每个条目(test_table,NEVER)
返回test_table

has_test_been_run_before(new_test_details,test_table):
index = hash(test_details,test_table.length)
old_details = test_table [index] .detail
//无条件o用新的细节,LRU时尚的旧细节。
//或许其他一些碰撞解决技术可能会更好。
test_table [index] .details = new_test_details
if(old_details === test_details)return YES
else if(old_details === NEVER)返回NEVER
否则返回PERHAPS

main()
test_table = setup_test_table();
loop
test_details = generate_random_test()
status = has_test_been_run_before(test_details,test_table)
案例状态为
是:不做任何事情;
从不:run test(test_details);
PERHAPS:if(rand()& 1)运行test(test_details);
下一个循环
结束。


I'm trying to optimize a piece of software which is basically running millions of tests. These tests are generated in such a way that there can be some repetitions. Of course, I don't want to spend time running tests which I already ran if I can avoid it efficiently.

So, I'm thinking about using a Bloom filter to store the tests which have been already ran. However, the Bloom filter errs on the unsafe side for me. It gives false positives. That is, it may report that I've ran a test which I haven't. Although this could be acceptable in the scenario I'm working on, I was wondering if there's an equivalent to a Bloom filter, but erring on the opposite side, that is, only giving false negatives.

I've skimmed through the literature without any luck.

解决方案

Yes, a lossy hash table or a LRUCache is a data structure with fast O(1) lookup that will only give false negatives -- if you ask if "Have I run test X", it will tell you either "Yes, you definitely have", or "I can't remember".

Forgive the extremely crude pseudocode:

setup_test_table():
    create test_table( some large number of entries )
    clear each entry( test_table, NEVER )
    return test_table

has_test_been_run_before( new_test_details, test_table ):
    index = hash( test_details , test_table.length )
    old_details = test_table[index].detail
    // unconditionally overwrite old details with new details, LRU fashion.
    // perhaps some other collision resolution technique might be better.
    test_table[index].details = new_test_details
    if ( old_details === test_details ) return YES
    else if ( old_details === NEVER ) return NEVER
    else return PERHAPS    

main()
    test_table = setup_test_table();
    loop
        test_details = generate_random_test()
        status = has_test_been_run_before( test_details, test_table )
        case status of
           YES: do nothing;
           NEVER: run test (test_details);
           PERHAPS: if( rand()&1 ) run test (test_details);
    next loop
end.

这篇关于布鲁姆过滤器对面?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆