如何在HyperLogLog算法的工作吗？ [英] How does the HyperLogLog algorithm work?

查看：406 发布时间：2015/11/30 13:54:47 database algorithm math unique hyperloglog

本文介绍了如何在HyperLogLog算法的工作吗？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我一直在学习在我的业余时间不同的算法近日，一个我碰到这似乎是很有趣的被称为HyperLogLog算法 - 这估计独特的项目有多少是在一个列表

I've been learning about different algorithms in my spare time recently, and one that I came across which appears to be very interesting is called the HyperLogLog algorithm - which estimates how many unique items are in a list.

这是特别有趣的我，因为它把我带回我的MySQL天，当我看到基数值（我一直认为直到最近，它的计算未估计）。

This was particularly interesting to me because it brought me back to my MySQL days when I saw that "Cardinality" value (which I always assumed until recently that it was calculated not estimated).

所以我知道如何编写算法为O（n），将计算出独特的项目有多少是在数组中。我在Javascript写这个

So I know how to write an algorithm in O(n) that will calculate how many unique items are in an array. I wrote this in Javascript

function countUniqueAlgo1(arr) {
    var Table = {};
    var numUnique = 0;
    var numDataPoints = arr.length;
    for (var j = 0; j < numDataPoints; j++) {
        var val = arr[j];
        if (Table[val] != null) {
            continue;
        }
        Table[val] = 1;
        numUnique++;
    }
    return numUnique;
}

但问题是，我的算法，而O（N），使用了大量的内存（存储值表）。

我一直在阅读本文对如何计算重复在O（n）的时间，使用最少的内存列表。 http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

I've been reading this paper about how to count duplicates in a list in O(n) time and using minimal memory. http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

据介绍，通过散列和计数位什么的人能在一定范围内的概率估算（假设列表中均匀分布）的唯一项目的数量在列表中。

It explains that by hashing and counting bits or something one can estimate within a certain probability (assuming the list is evenly distributed) the number of unique items in a list.

我读过的文章，但我似乎无法理解。有人可以给出一个更外行的解释？我知道什么是哈希值，但我不明白它们是如何在这个HyperLogLog算法中使用。

I've read the paper, but I can't seem to understand it. Can someone give a more layperson's explanation? I know what hashes are, but I don't understand how they are used in this HyperLogLog algorithm.

如何在HyperLogLog算法的工作吗？ [英] How does the HyperLogLog algorithm work?

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

如何在HyperLogLog算法的工作吗？ [英] How does the HyperLogLog algorithm work?

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭