How does the HyperLogLog algorithm work?

Problem description

I've been learning about different algorithms in my spare time recently, and one that I came across that looks very interesting is the HyperLogLog algorithm, which estimates how many unique items are in a list.

This was particularly interesting to me because it brought me back to my MySQL days, when I saw that "Cardinality" value (which until recently I had always assumed was calculated, not estimated).

So I know how to write an O(n) algorithm that calculates how many unique items are in an array. I wrote this in JavaScript:

function countUniqueAlgo1(arr) {
    var Table = {};          // lookup table of values already seen
    var numUnique = 0;
    var numDataPoints = arr.length;
    for (var j = 0; j < numDataPoints; j++) {
        var val = arr[j];
        if (Table[val] != null) {
            continue;        // already counted this value, skip it
        }
        Table[val] = 1;      // mark the value as seen
        numUnique++;
    }
    return numUnique;
}
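As a quick sanity check, a hypothetical call (with made-up sample data) might look like this:

var sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
console.log(countUniqueAlgo1(sample)); // 7 (the unique values are 3, 1, 4, 5, 9, 2, 6)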

But the problem is that my algorithm, while O(n), uses a lot of memory, because it stores every value in Table.

I've been reading this paper about how to count duplicates in a list in O(n) time while using minimal memory.

It explains that by hashing and counting bits (or something along those lines) one can estimate, within a certain probability (assuming the list is evenly distributed), the number of unique items in a list.

I've read the paper, but I can't seem to understand it. Can someone give a more layperson's explanation? I know what hashes are, but I don't understand how they are used in this HyperLogLog algorithm.

Recommended answer

The main trick behind this algorithm is that if you are observing a stream of random integers and see an integer whose binary representation starts with some known prefix, there is a higher chance that the cardinality of the stream is 2^(size of the prefix).

That is, in a random stream of integers, ~50% of the numbers (in binary) start with "1", 25% start with "01", and 12.5% start with "001". This means that if you observe a random stream and see a number starting with "001", there is a higher chance that this stream has a cardinality of 8.
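To check those fractions, here is a small, hypothetical simulation (using Math.random as a stand-in for a uniform hash; nothing here is from the original answer):

// Fraction of random 32-bit values whose binary representation
// (padded to 32 bits) starts with "1", "01", and "001".
function prefixFractions(trials) {
    var counts = { "1": 0, "01": 0, "001": 0 };
    for (var i = 0; i < trials; i++) {
        var x = Math.floor(Math.random() * 4294967296); // uniform 32-bit value
        var bits = x.toString(2).padStart(32, "0");
        if (bits[0] === "1") counts["1"]++;
        else if (bits[1] === "1") counts["01"]++;
        else if (bits[2] === "1") counts["001"]++;
    }
    console.log(counts["1"] / trials, counts["01"] / trials, counts["001"] / trials);
    // Prints roughly 0.5, 0.25, 0.125
}
prefixFractions(1000000);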

(The prefix "00..1" has no special meaning. It is used only because the most significant bit of a binary number is easy to find on most processors.)
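In JavaScript, for example, the position of that leading 1-bit can be read off with the built-in Math.clz32, which counts the leading zero bits of a 32-bit integer (the inputs below are just illustrative):

// rank(x) = number of leading zeros + 1 = position of the first 1-bit
// in the 32-bit binary representation of x (x is assumed to be a hash value).
function rank(x) {
    return Math.clz32(x) + 1; // note: Math.clz32(0) is 32
}
console.log(rank(0x80000000)); // 1 (binary starts with "1")
console.log(rank(0x20000000)); // 3 (binary starts with "001")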

Of course, if you observe just one integer, the chance that this estimate is wrong is high. That's why the algorithm divides the stream into "m" independent substreams and keeps, for each substream, the maximum length of the "00...1" prefix seen. It then estimates the final value by taking the mean of those per-substream values.
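A minimal sketch of that bucketing idea (not the full algorithm) might look like this in JavaScript. Here hash32 is a hypothetical placeholder for a real 32-bit hash function, and a plain arithmetic mean is used for simplicity, whereas the actual HyperLogLog estimator uses a harmonic mean and a bias-correction constant, as described in the paper:

// Simplified sketch: the first b bits of the hash pick a bucket (substream),
// the remaining bits act as the "random stream"; per bucket we keep the
// maximum rank (leading-zero count + 1) seen, then estimate the cardinality
// from the average of those maxima.
function estimateCardinality(items, b, hash32) {
    var m = 1 << b;                       // number of substreams/buckets
    var maxRank = new Array(m).fill(0);
    for (var i = 0; i < items.length; i++) {
        var h = hash32(items[i]) >>> 0;   // 32-bit unsigned hash
        var bucket = h >>> (32 - b);      // first b bits select the bucket
        var rest = (h << b) >>> 0;        // remaining bits, shifted to the top
        var rank = Math.clz32(rest) + 1;  // position of the first 1-bit
        if (rank > maxRank[bucket]) maxRank[bucket] = rank;
    }
    var avg = maxRank.reduce(function (a, r) { return a + r; }, 0) / m;
    return m * Math.pow(2, avg);          // crude estimate; real HLL corrects this
}

With b = 10, for instance, this keeps only 2^10 = 1024 small counters no matter how many items the stream contains, which is why the memory use stays tiny compared with the hash-table approach in the question.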

That's the main idea of this algorithm. Some details are missing here (the correction for low estimates, for example), but it's all well described in the paper.
