Calculating and storing average data on a daily, weekly, monthly and yearly basis

Question

I asked a question on SO yesterday titled "Deciding and implementing a trending algorithm in Django". A lot of people suggested something simple like averages (exponential, weighted, etc.). I have a model called Book and another called Reader:

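from django.db import models


class Book(models.Model):
    name = models.CharField(max_length=255)  # max_length is required; 255 is a placeholder


class Reader(models.Model):
    date = models.DateField()
    # on_delete is required since Django 2.0; CASCADE is an assumption
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    reader_count = models.PositiveIntegerField()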

A simple structure. New books are added every day, and the reader count for each book is added every day too, i.e. a book gets one reader-count record per day, so it has multiple records over time.

I need to calculate the averages for the books for the current week, the current month and the current year. Apart from the current data, I'd like to preserve the historical data too.

If I tried querying this kind of data straight from the DB, it would take a big hit, wouldn't it? Besides, I'm trying to implement this system using simple averages to begin with, but later on I'd like to have the flexibility of changing my computational method. I have two options:

  • one, where I can partially update the data in another table which stores the computed data every time a new Reader record is added.

  • two, where I could rebuild the aggregated data every night through a script for the current day/week/month.
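
To make the first option concrete, here is a minimal in-memory sketch of a partial update. It is not from the original post: the names `aggregates`, `period_starts`, `record_reader_count` and `average` are hypothetical, weeks are assumed to start on Monday, and a real system would keep the running totals in a database table rather than a dict.

```python
from datetime import date, timedelta

# (book, period_kind, period_start) -> [total_readers, number_of_days]
aggregates = {}


def period_starts(day):
    """Start dates of the week (Monday-based), month and year containing `day`."""
    return {
        "week": day - timedelta(days=day.weekday()),
        "month": day.replace(day=1),
        "year": day.replace(month=1, day=1),
    }


def record_reader_count(book, day, reader_count):
    """Fold one new daily record into every period it belongs to."""
    for kind, start in period_starts(day).items():
        bucket = aggregates.setdefault((book, kind, start), [0, 0])
        bucket[0] += reader_count
        bucket[1] += 1


def average(book, kind, start):
    """Simple average for one stored period, or None if nothing was recorded."""
    total, days = aggregates.get((book, kind, start), (0, 0))
    return total / days if days else None
```

Keeping the total and the day count, rather than the average itself, means each new record is a constant-time update and the simple average can always be recomputed from what is stored.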

Here's some sample data and the results.

Book  Date        Count
----  ----------  -----
AAAA  01.01.2012    10
AAAA  02.01.2012    20
AAAA  03.01.2012    30
AAAA  04.01.2012    30
AAAA  05.01.2012    40
AAAA  06.01.2012    10
AAAA  07.01.2012    25
AAAA  08.01.2012    15
AAAA  09.01.2012    10

The reader count average for Week #1 is 23.5. The reader count average for Week #2 (which in this case would be the current week) is 12.5, and for the current month and year it would be 21.1.
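
As a check on these figures, the sketch below recomputes them from the sample table in plain Python. It assumes, as the question appears to, that weeks are consecutive 7-day blocks starting on 01.01.2012; the helper names are illustrative only. The first week works out to 165 / 7, about 23.57, so the 23.5 quoted above looks like that value truncated.

```python
from datetime import date

# Sample rows from the table above: (date, reader_count) for book AAAA.
rows = [
    (date(2012, 1, 1), 10),
    (date(2012, 1, 2), 20),
    (date(2012, 1, 3), 30),
    (date(2012, 1, 4), 30),
    (date(2012, 1, 5), 40),
    (date(2012, 1, 6), 10),
    (date(2012, 1, 7), 25),
    (date(2012, 1, 8), 15),
    (date(2012, 1, 9), 10),
]


def average_between(rows, start, end):
    """Mean reader_count for rows with start <= date < end; None if no rows match."""
    counts = [count for day, count in rows if start <= day < end]
    return sum(counts) / len(counts) if counts else None


week1 = average_between(rows, date(2012, 1, 1), date(2012, 1, 8))
week2 = average_between(rows, date(2012, 1, 8), date(2012, 1, 15))
month = average_between(rows, date(2012, 1, 1), date(2012, 2, 1))
print(round(week1, 2), round(week2, 2), round(month, 2))  # 23.57 12.5 21.11
```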

HTH.

In order to give any of these a shot, I'd like to build a system to store the data. I need to store the averages on a daily, weekly and monthly basis. However, I'm very lost as to what kind of table structure I should implement. I'd like to not reinvent the wheel if possible, so if any of you know of any packages that would let me accomplish this, that would be great.
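
One possible table structure, shown here as a hedged sketch using the standard-library `sqlite3` module rather than Django models, is a single summary table keyed by book, period kind and period start. The table and column names (`reader_summary`, `period_kind`, `period_start`, `total`, `days`) and the helper `rebuild_month` are assumptions made for illustration, not something the question or answer specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reader (book TEXT, date TEXT, reader_count INTEGER);
CREATE TABLE reader_summary (
    book         TEXT NOT NULL,
    period_kind  TEXT NOT NULL,     -- 'day', 'week', 'month' or 'year'
    period_start TEXT NOT NULL,     -- ISO date of the first day of the period
    total        INTEGER NOT NULL,  -- sum of reader_count in the period
    days         INTEGER NOT NULL,  -- number of daily rows in the period
    PRIMARY KEY (book, period_kind, period_start)
);
""")


def rebuild_month(conn, month_start, next_month_start):
    """Recompute one month's summary row for every book from the raw data."""
    conn.execute(
        """
        INSERT OR REPLACE INTO reader_summary
        SELECT book, 'month', ?, SUM(reader_count), COUNT(*)
        FROM reader
        WHERE date >= ? AND date < ?
        GROUP BY book
        """,
        (month_start, month_start, next_month_start),
    )
```

Because old rows are never overwritten once their period has passed, the same table holds both the current and the historical figures. A nightly script (the second option) would call something like `rebuild_month` for the current period only.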

Thanks.

Recommended answer

Postgres is very good at doing these kinds of calculations at the same time as your other traffic, so don't worry too much about load (as long as you run this kind of batch job outside of the request-response cycle).

One thing you could look at is splitting this kind of work into small cacheable units, i.e. each monthly average is essentially an average over the past 4 weeks, each yearly average is an average over the last 12 months, and all of this is only done on a per-book basis anyway, so why not do small subsets of the work in-request?
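
One caveat when composing small units this way: a plain mean of weekly means only equals the overall mean if every week covers the same number of days. The sketch below, which is an illustration and not part of the original answer, carries a `(total, days)` pair per unit so that larger periods can be combined exactly; `combine` and `mean` are hypothetical helper names, and the totals come from the sample data in the question.

```python
def combine(units):
    """Merge (total, days) pairs from smaller periods into one (total, days) pair."""
    total = sum(t for t, _ in units)
    days = sum(d for _, d in units)
    return total, days


def mean(unit):
    """Simple average of a (total, days) pair, or None for an empty period."""
    total, days = unit
    return total / days if days else None


week1 = (165, 7)  # 01.01 to 07.01 in the sample data
week2 = (25, 2)   # 08.01 to 09.01 in the sample data

weighted = mean(combine([week1, week2]))       # weights each week by its days
unweighted = (mean(week1) + mean(week2)) / 2   # treats both weeks as equal
print(round(weighted, 2), round(unweighted, 2))  # 21.11 18.04
```

The weighted result matches the monthly figure of 21.1 given in the question, while the unweighted one does not, because the second week only has two days of data.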

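from datetime import timedelta
from functools import wraps

from django.core.cache import cache
from django.db.models import Avg

from .models import Book, Reader  # adjust to wherever your models live


def cached(key, expire):
    """Cache a function's result under `key`, formatted with its positional args."""
    def wrapped(f):
        @wraps(f)
        def func(*args):
            cache_key = key % args
            result = cache.get(cache_key)
            if result is None:
                result = f(*args)
                cache.set(cache_key, result, expire)
            return result
        return func
    return wrapped


@cached("book:%s:avg:week:%s", 3600 * 24)  # cache for a day, rolling results!
def book_read_week_average(book_id, week_start):
    week_end = week_start + timedelta(days=7)
    return Reader.objects.filter(book_id=book_id, date__gte=week_start, date__lt=week_end) \
                         .aggregate(Avg('reader_count'))['reader_count__avg']


@cached("book:%s:avg:month:%s", 3600 * 24)  # cache for a day for rolling results
def book_read_month_average(book_id, month_start):
    # First day of the following month, whatever the length of this one.
    month_end = (month_start.replace(day=1) + timedelta(days=32)).replace(day=1)
    return Reader.objects.filter(book_id=book_id, date__gte=month_start, date__lt=month_end) \
                         .aggregate(Avg('reader_count'))['reader_count__avg']


@cached("author:%s:avg:month:%s", 3600 * 24)
def author_read_month_average(author_id, month_start):
    # Assumes Book has an `author` foreign key, which the question's model does not show.
    # Sums the per-book averages; a book with no rows in the month yields None, counted as 0.
    return sum(book_read_month_average(book.id, month_start) or 0
               for book in Book.objects.filter(author_id=author_id))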

Using function composition and cached functions, you generate only the data you need, and only when you need it. You can also store this info in Redis instead of the Django cache and take advantage of atomic increments on read counts, allowing real-time reading statistics.
