什么是有损计数? [英] What is Lossy Counting?

查看:470
本文介绍了什么是有损计数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

任何人都可以向我解释有损计数算法?它是寻找项目的频率在一个流中的流的算法。谢谢你。

Can anyone explain to me the Lossy Counting algorithm? It is a streaming algorithm on finding frequency of items in a stream. Thanks.

推荐答案

假设你正在看的Facebook个人主页的流量。你有几十亿的点击率。你想找到它的配置文件是最经常访问的。你可以保持每个配置文件的计数,但你有一个非常大的数字计数跟踪,其中,绝大多数是没有意义的。

Say you're looking at the traffic for facebook profiles. You have billions of hits. You want to find which profiles are accessed the most often. You could keep a count for each profile, but then you'd have a very large number of counts to keep track of, the vast majority of which would be meaningless.

使用有损计数,你定期删除从表中非常低的计数的元素。最频繁访问的个人资料会几乎从来没有低数无论如何,如果他们这么做,他们就不太可能呆在那里太久。

With lossy counting, you periodically remove very low count elements from the table. The most-frequently accessed profiles would almost never have low counts anyway, and if they did, they wouldn't be likely to stay there for long.

该算法基本上包括输入分组成块或组块,每个组块内计数。然后,你个减少计数为每个元素,下降的数量降为零的任何元素。

The algorithm basically involves grouping the inputs into blocks or chunks and counting within each chunk. Then you reduce the count for each element by one, dropping any elements whose counts drop to zero.

最频繁袭击的配置文件会得到你的计数,呆在那里。未打很多时候任何配置文件将下降到零的几个街区,你会不会有更多的跟踪。

The most-frequently hit profiles will get on your count and stay there. Any profiles that aren't hit very often will drop to zero in a few blocks and you won't have to track them any more.

请注意,最终的结果是为了依赖,去年处理的数量给予较重。在某些情况下,这是非常合情合理的,是积极的一面,而不是缺点。 (如果您想了解哪些基本轮廓是最流行的现在的,要权衡访问今天比访问的最后一个月。)

Note that the final results are order-dependent, giving heavier weight to the counts processed last. In some cases, this makes perfect sense and is an upside rather than a downside. (If you want to know basically which profiles are the most popular now, you want to weigh accesses today more than accesses last month.)

有大量精炼的算法。但基本的想法是这样的 - 寻找重量级,而不必跟踪每一个元素,定期清除你的,似乎没有可能是基于数据,到目前为止重量级的所有元素的计数

There are a large number of refinements to the algorithm. But the basic idea is this -- to find the heavy hitters without having to track every element, periodically purge your counts of any elements that don't seem likely to be heavy hitters based on the data so far.

这篇关于什么是有损计数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆