Data structure/algorithm to efficiently save weighted moving average


Question

I'd like to keep moving averages for a number of different categories when storing log records. Imagine a service that saves web server logs one entry at a time. Let's further imagine that we don't have access to the logged records afterwards: we see each record once and can't read it again later.

For different pages, I'd like to know:

  • the total number of hits (easy)
  • a recent average (over the last month or so)
  • a long-term average (over a year)

Is there a clever algorithm or data model that lets me save such moving averages without having to recalculate them by summing up huge quantities of data?

I don't need an exact average (over exactly 30 days or so), just trend indicators, so some fuzziness is not a problem at all. It should just make sure that newer entries are weighted higher than older ones.

One solution would probably be to auto-create statistics records for each month. However, I don't even need statistics for past months, so this seems like overkill. It also wouldn't give me a moving average, but would instead switch to new values from month to month.

Recommended answer

An easy solution would be to keep an exponentially decaying total.

It can be calculated using the following formula:

newX = oldX * (p ^ (newT - oldT)) + delta

where oldX is the old value of your total (at time oldT) and newX is its new value (at time newT); delta is the contribution of new events to the total (for example, the number of hits today); and p is the decay factor, which is less than or equal to 1. With p = 1, the total is simply the overall number of hits. Decreasing p effectively shortens the interval the total describes.
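As a rough illustration, the update rule above could be sketched in Python as follows. The names `DecayingTotal` and `decay_from_half_life` are hypothetical. Measuring time in days and deriving p from a half-life are assumptions made for this example, not part of the original answer.

```python
class DecayingTotal:
    """Exponentially decaying total: newX = oldX * p ** (newT - oldT) + delta."""

    def __init__(self, p, start_time=0.0):
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        self.p = p                    # decay factor per time unit
        self.total = 0.0              # current decayed total (oldX)
        self.last_time = start_time   # time of the last update (oldT)

    def value_at(self, now):
        # Decayed total as seen at time `now`; does not modify any state.
        return self.total * self.p ** (now - self.last_time)

    def add(self, now, delta=1.0):
        # Decay the stored total over the elapsed time, then add the new hits.
        self.total = self.value_at(now) + delta
        self.last_time = now


def decay_from_half_life(half_life):
    # Assumed convenience helper: choose p so that a hit's weight
    # halves after `half_life` time units.
    return 0.5 ** (1.0 / half_life)


# One set of counters per page; time is measured in days (an assumption).
page_stats = {
    "total": DecayingTotal(p=1.0),                          # plain hit count
    "recent": DecayingTotal(p=decay_from_half_life(30)),    # roughly a month
    "long_term": DecayingTotal(p=decay_from_half_life(365)) # roughly a year
}

for day in (0.0, 1.0, 1.5, 40.0):     # timestamps of four hits
    for counter in page_stats.values():
        counter.add(day)
```

Each counter stores only two numbers (the total and the time of the last update), so no past log records need to be kept. Comparing the "recent" value against the "long_term" value then gives a simple trend indicator.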
