Redis HLL: too many false positives
Question
HyperLogLog is a probabilistic algorithm. According to the Redis HLL documentation, the error should be about 0.81%, but I get errors of around 17-20%.
I think something is wrong. Here is my simple Perl test script. Is there an error in it?
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Redis;

my $redis = Redis->new(server => '192.168.50.166:6379') or die;
my $fp  = 0;
my $HLL = "HLL";
$redis->del($HLL);    # start from an empty HyperLogLog

foreach my $i (1 .. 10000) {
    # PFADD returns 1 if the HyperLogLog changed, 0 if it did not
    my $s1 = $redis->pfadd($HLL, $i);
    if ($s1 == 0) {
        print "False positive on $i\n";
        $fp++;
    }
}
print "count of false positives $fp\n";
```
Answer
HyperLogLog is used for counting unique items. It can count a large number of items using very little memory. However, the returned cardinality is NOT exact; it is an approximation that comes with a standard error.
0.81% is the standard error, NOT a false-positive rate. In your case, you can call PFCOUNT HLL to get the approximate number of unique items you have added to the HyperLogLog. The returned number should usually fall in the range [10000 * (1 - 0.81%), 10000 * (1 + 0.81%)], i.e. about 9919 to 10081. Because 0.81% is a standard error rather than a hard bound, an occasional result slightly outside this range is still normal.
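The 0.81% figure and the range given above follow from the HyperLogLog standard error formula 1.04/√m. Assuming m = 16384 registers, which is the number Redis uses, the arithmetic works out as:

```latex
\sigma = \frac{1.04}{\sqrt{m}} = \frac{1.04}{\sqrt{16384}} = \frac{1.04}{128} \approx 0.0081 = 0.81\%

10000\,(1 - 0.0081) = 9919, \qquad 10000\,(1 + 0.0081) = 10081
```

If the error is roughly normally distributed, about 68% of PFCOUNT results should land within one standard error and about 95% within two, i.e. roughly 9838 to 10162.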
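PFADD returns 1 if executing the command changed the HyperLogLog's internal state (and thus possibly its estimated cardinality), and 0 otherwise. It has nothing to do with false positives. Internally, each item is hashed into one of the HyperLogLog's registers, and that register is updated only when the new hash yields a larger value than the one already stored there. As more distinct items are added, this becomes less likely, so a growing share of items that were never seen before still make PFADD return 0. That is what your script counts as false positives.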
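To see why PFADD returns 0 for many new items, here is a simplified, hypothetical HyperLogLog in plain Perl that needs no Redis server. It is only a sketch under stated assumptions: it hashes with MD5 from the core Digest::MD5 module and uses the classic HyperLogLog estimator with a linear-counting correction, whereas Redis uses a different hash function and a more refined estimator. The names hll_add and hll_count are made up for this example; hll_add mimics PFADD's return value by reporting whether a register changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical, simplified HyperLogLog for illustration only.
# It mimics PFADD/PFCOUNT semantics, but Redis itself uses a
# different hash function and a more refined estimator.
my $P   = 14;            # 2**14 = 16384 registers, the count Redis uses
my $M   = 1 << $P;
my @reg = (0) x $M;      # each register keeps the largest rank seen so far

# Add one item; return 1 if a register changed (like PFADD), else 0.
sub hll_add {
    my ($item) = @_;
    my ($h1, $h2) = unpack 'N2', md5($item);   # two 32-bit words of the digest
    my $idx  = $h1 & ($M - 1);                 # low 14 bits pick the register
    my $rank = 1;                              # 1 + number of trailing zero bits
    while ($rank <= 32 && !($h2 & 1)) {
        $h2 >>= 1;
        $rank++;
    }
    return 0 if $rank <= $reg[$idx];           # not larger: register unchanged
    $reg[$idx] = $rank;
    return 1;
}

# Estimate the cardinality (like PFCOUNT) with the classic HLL formula.
sub hll_count {
    my ($sum, $zeros) = (0, 0);
    for my $r (@reg) {
        $sum += 2 ** -$r;
        $zeros++ if $r == 0;
    }
    my $alpha = 0.7213 / (1 + 1.079 / $M);
    my $est   = $alpha * $M * $M / $sum;
    # Small-range correction (linear counting) while many registers are empty.
    $est = $M * log($M / $zeros) if $est <= 2.5 * $M && $zeros > 0;
    return $est;
}

# Same input as the question: 10000 distinct items.
my $zero_returns = 0;
for my $i (1 .. 10000) {
    $zero_returns++ if hll_add($i) == 0;    # the script's "false positives"
}
printf "PFADD-style 0 returns: %d\n", $zero_returns;
printf "Estimated cardinality: %.0f\n", hll_count();
```

Run on the same 1..10000 input as the question, this should report a noticeable share of 0 returns while the estimate stays close to 10000. The exact numbers depend on the hash function, so they will not match Redis exactly.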
It seems what you need is a Bloom filter, which can tell you whether an item already exists in a data set, with a small probability of false positives but no false negatives. You can, of course, implement a Bloom filter on top of Redis, and there are open-source projects for that, such as the RedisBloom module.
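Below is a minimal Bloom filter sketch in Perl, again without Redis and purely illustrative. Assumptions: the bit array is an in-memory string manipulated with vec(), the bit positions are derived from one MD5 digest by double hashing, and the sizes (2**17 bits, 7 positions per item) are chosen by hand for about 10000 items. bloom_add and bloom_check are hypothetical names; the same bit array could instead be kept in a Redis string with SETBIT/GETBIT.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical, minimal in-memory Bloom filter for illustration only.
my $BITS   = 1 << 17;   # 131072 bits (16 KiB), an assumed size
my $HASHES = 7;         # bit positions per item, an assumed value
my $filter = '';        # bit vector; vec() grows it on demand

# Derive $HASHES bit positions from one MD5 digest (double hashing).
sub positions {
    my ($item) = @_;
    my ($h1, $h2) = unpack 'N2', md5($item);
    my $a = $h1 % $BITS;
    my $b = ($h2 % $BITS) | 1;     # odd step, so positions never all coincide
    return map { ($a + $_ * $b) % $BITS } 0 .. $HASHES - 1;
}

# Add an item; return 1 if it is definitely new, 0 if it may already exist.
sub bloom_add {
    my $new = 0;
    for my $pos (positions($_[0])) {
        $new = 1 unless vec($filter, $pos, 1);
        vec($filter, $pos, 1) = 1;
    }
    return $new;
}

# 1 = "possibly present" (may be a false positive), 0 = "definitely absent".
sub bloom_check {
    for my $pos (positions($_[0])) {
        return 0 unless vec($filter, $pos, 1);
    }
    return 1;
}

# Same experiment as the question's script, but with a Bloom filter.
my $fp = 0;
for my $i (1 .. 10000) {
    $fp++ if bloom_add($i) == 0;
}
print "count of false positives $fp\n";
```

With these sizes the theoretical false-positive rate after 10000 insertions is roughly (1 - e^(-7*10000/131072))^7, about 0.2%, far below the 17-20% seen in the question. The trade-off is that, unlike HyperLogLog, a Bloom filter's memory must grow with the number of items you want to track at a given false-positive rate.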