Modern, high performance bloom filter in Python?
Question
I'm looking for a production-quality Bloom filter implementation in Python that can handle fairly large numbers of items (say 100M to 1B items with a 0.01% false positive rate).
Pybloom is one option, but it seems to be showing its age: it regularly emits DeprecationWarning messages on Python 2.5. Joe Gregorio also has an implementation.
Requirements are fast lookup performance and stability. I'm also open to creating Python interfaces to particularly good C/C++ implementations, or even to using Jython if there's a good Java implementation.
Lacking that, any recommendations on a bit array / bit vector representation that can handle ~16E9 bits?
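For a sense of scale, here is a rough sizing sketch using the textbook formulas for a standard Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The function name `bloom_parameters` is made up for illustration and does not come from any library mentioned here. At a 0.01% false positive rate this works out to about 19 bits per item, so 1B items need roughly 19E9 bits, the same order of magnitude as the ~16E9 figure above.

```python
import math


def bloom_parameters(n_items, fp_rate):
    """Return (bits, hash_count) for a standard Bloom filter.

    Hypothetical helper for illustration; uses the textbook optimum
    m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


if __name__ == "__main__":
    for n in (100_000_000, 1_000_000_000):
        bits, hashes = bloom_parameters(n, 0.0001)
        # Report the raw bit array size in GiB alongside the hash count.
        print(f"{n:>13,} items: {bits / 8 / 2**30:.2f} GiB, {hashes} hashes")
```

A bit array of that size is a couple of GiB, which is why a memory-mapped or C-backed representation is attractive at this scale.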
Recommended answer
Eventually I found pybloomfiltermap. I haven't used it, but it looks like it'd fit the bill.
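For readers who want to see what such a library does under the hood, below is a minimal pure-Python sketch of the general technique. It is not the API or the implementation of any library named in this thread: the class name `SimpleBloom` is invented for illustration, and the choice of `hashlib.blake2b` with double hashing to derive the k bit positions is an assumption. Pure Python like this would likely be far too slow for 100M+ items, so treat it as a reference for the data structure only.

```python
import hashlib


class SimpleBloom:
    """Illustrative Bloom filter over a bytearray (hypothetical, not a library API)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force an odd, nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        """Set every bit position derived from item (a bytes object)."""
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        """Return False if item was definitely never added, True if it probably was."""
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = SimpleBloom(num_bits=10_000, num_hashes=7)
    bf.add(b"hello")
    print(b"hello" in bf)
```

Membership tests never give false negatives: every item that was added is reported as present, while an item that was never added is occasionally reported as present at roughly the configured false positive rate.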