Redis HLL: too many false positives

Question

HyperLogLog is a probabilistic algorithm. According to the Redis HLL documentation, the error should be about 0.81%, but I am seeing errors of 17-20%.

I think something is wrong. This is my simple Perl test script. Is there an error in it?

#!/usr/bin/perl -w
use Redis;
my $redis = Redis->new(server=>'192.168.50.166:6379') or die;
my $fp=0;
my $HLL="HLL";

$redis->del($HLL);
foreach my $i (1..10000) {
  my $s1 = $redis->pfadd($HLL,$i);
  if($s1 == 0){ 
    print "False positive on $i\n";
    $fp++;
  }
}
print "count of false positives $fp\n";

Recommended answer

HyperLogLog is used for counting unique items. It can count a large number of items with very little memory. However, the returned cardinality is not exact; it is an approximation with a standard error.

The 0.81% figure is the standard error, not a false-positive rate. In your case, you can call PFCOUNT HLL to get the approximate number of unique items you put into the HyperLogLog. The returned number should be roughly in the range [10000 * (1 - 0.81%), 10000 * (1 + 0.81%)].
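To make the range concrete, here is a minimal sketch of the arithmetic in Perl. The helper name `hll_error_band` is hypothetical, and a standard error describes typical deviation rather than a hard limit, so an occasional estimate slightly outside the band would not be surprising.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the (low, high) band one standard error around the true count $n.
# The default 0.0081 is the 0.81% standard error quoted in the answer.
# hll_error_band is a hypothetical helper, not part of any Redis client.
sub hll_error_band {
    my ($n, $std_err) = @_;
    $std_err = 0.0081 unless defined $std_err;
    return ($n * (1 - $std_err), $n * (1 + $std_err));
}

my ($low, $high) = hll_error_band(10000);
printf "PFCOUNT should be roughly between %.0f and %.0f\n", $low, $high;
```

For 10000 items this gives a band of about 9919 to 10081.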

PFADD returns 1 if the estimated cardinality changed after executing the command, and 0 otherwise. It has nothing to do with false positives: the 0s your script counts are additions that left the estimate unchanged, not items that were wrongly reported as already present.
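A way to measure the actual error is to ignore the PFADD return value and compare PFCOUNT with the known number of items. The following is a sketch under assumptions: `hll_relative_error` is a hypothetical helper, and it assumes the client object exposes `pfcount` the same way the question's script uses `pfadd`.

```perl
use strict;
use warnings;

# Add the integers 1..$n to the HyperLogLog at $key, then return the
# relative error of PFCOUNT against the true count $n.
# $redis is any object offering del/pfadd/pfcount (assumed: a Redis client).
sub hll_relative_error {
    my ($redis, $key, $n) = @_;
    $redis->del($key);
    $redis->pfadd($key, $_) for 1 .. $n;    # return value deliberately ignored
    my $estimate = $redis->pfcount($key);
    return abs($estimate - $n) / $n;
}

# Usage against a live server (address taken from the question):
#   use Redis;
#   my $redis = Redis->new(server => '192.168.50.166:6379');
#   printf "relative error: %.2f%%\n",
#       100 * hll_relative_error($redis, 'HLL', 10000);
```

The relative error printed this way is the number to compare against the documented 0.81%.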

It seems that what you need is a Bloom filter, which can tell you whether an item already exists in a data set, with some probability of false positives. You can, of course, implement a Bloom filter with Redis, and there should be open source projects that do so.
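To illustrate what a Bloom filter does, here is a minimal in-memory sketch in Perl. It is not a Redis-backed implementation: all function names are hypothetical, the salted-MD5 hashing scheme and the sizes (200,000 bits, 7 probes) are arbitrary choices for illustration, and a Redis version would presumably keep the bit array on the server with SETBIT and GETBIT instead of `vec`.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Minimal in-memory Bloom filter: $bits bit positions, $hashes probes per item.
sub bloom_new {
    my ($bits, $hashes) = @_;
    return { bits => $bits, hashes => $hashes, vec => "\0" x int(($bits + 7) / 8) };
}

# Derive the probe positions for an item by hashing it with a per-probe salt.
sub bloom_positions {
    my ($bf, $item) = @_;
    return map { unpack('N', md5("$_:$item")) % $bf->{bits} } 0 .. $bf->{hashes} - 1;
}

# Set every probe bit for the item.
sub bloom_add {
    my ($bf, $item) = @_;
    vec($bf->{vec}, $_, 1) = 1 for bloom_positions($bf, $item);
    return;
}

# Returns 1 for "possibly seen" (false positives can happen)
# and 0 for "definitely not seen".
sub bloom_maybe_contains {
    my ($bf, $item) = @_;
    for my $pos (bloom_positions($bf, $item)) {
        return 0 unless vec($bf->{vec}, $pos, 1);
    }
    return 1;
}

my $bf = bloom_new(200_000, 7);
bloom_add($bf, $_) for 1 .. 10000;

# Count how many never-added items the filter wrongly reports as present.
my $false_positives = grep { bloom_maybe_contains($bf, $_) } 10001 .. 20000;
print "false positives among 10000 unseen items: $false_positives\n";
```

Unlike the PFADD return value, the count printed here is a genuine false-positive count, and it shrinks as the bit array grows relative to the number of items.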
