布鲁姆过滤器对面? [英] Opposite of Bloom filter?
问题描述
所以,我正在考虑使用一个布鲁姆过滤器存储已经运行的测试。但是,Bloom过滤器对我来说是不安全的。它给出了错误的肯定。也就是说,它可能会报告我已经进行了一个我没有的测试。虽然这在我正在工作的情况下是可以接受的,但我想知道是否有相当于Bloom过滤器,但是在相反的方面却是错误的,也就是说只会产生错误的否定。
我已经浏览了文学,没有任何运气。
是的,有损的哈希表或LRUCache是一个数据结构,具有快速的O(1)查找,只会产生错误的负面 - 如果你问我是否运行测试X,它会告诉你是的,你肯定有或我可以记住。
原谅伪码:
setup_test_table ():
创建test_table(一些大量条目)
清除每个条目(test_table,NEVER)
返回test_table
has_test_been_run_before(new_test_details,test_table):
index = hash(test_details,test_table.length)
old_details = test_table [index] .detail
//无条件o用新的细节,LRU时尚的旧细节。
//或许其他一些碰撞解决技术可能会更好。
test_table [index] .details = new_test_details
if(old_details === test_details)return YES
else if(old_details === NEVER)返回NEVER
否则返回PERHAPS
main()
test_table = setup_test_table();
loop
test_details = generate_random_test()
status = has_test_been_run_before(test_details,test_table)
案例状态为
是:不做任何事情;
从不:run test(test_details);
PERHAPS:if(rand()& 1)运行test(test_details);
下一个循环
结束。
I'm trying to optimize a piece of software which is basically running millions of tests. These tests are generated in such a way that there can be some repetitions. Of course, I don't want to spend time running tests which I already ran if I can avoid it efficiently.
So, I'm thinking about using a Bloom filter to store the tests which have been already ran. However, the Bloom filter errs on the unsafe side for me. It gives false positives. That is, it may report that I've ran a test which I haven't. Although this could be acceptable in the scenario I'm working on, I was wondering if there's an equivalent to a Bloom filter, but erring on the opposite side, that is, only giving false negatives.
I've skimmed through the literature without any luck.
Yes, a lossy hash table or a LRUCache is a data structure with fast O(1) lookup that will only give false negatives -- if you ask if "Have I run test X", it will tell you either "Yes, you definitely have", or "I can't remember".
Forgive the extremely crude pseudocode:
setup_test_table():
create test_table( some large number of entries )
clear each entry( test_table, NEVER )
return test_table
has_test_been_run_before( new_test_details, test_table ):
index = hash( test_details , test_table.length )
old_details = test_table[index].detail
// unconditionally overwrite old details with new details, LRU fashion.
// perhaps some other collision resolution technique might be better.
test_table[index].details = new_test_details
if ( old_details === test_details ) return YES
else if ( old_details === NEVER ) return NEVER
else return PERHAPS
main()
test_table = setup_test_table();
loop
test_details = generate_random_test()
status = has_test_been_run_before( test_details, test_table )
case status of
YES: do nothing;
NEVER: run test (test_details);
PERHAPS: if( rand()&1 ) run test (test_details);
next loop
end.
这篇关于布鲁姆过滤器对面?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!