Deciding and implementing a trending algorithm in Django


Problem description


I have a Django application in which I need to implement a simple trending/ranking algorithm. I'm quite lost as to how to go about it:

I have two models, Book and Reader. Every night, new books are added to my database. The number of readers for each book is also updated every night, i.e. one book will have multiple reader statistic records (one record for each day).

Over a given period (past week, past month or past year), I would like to list the most popular books. What algorithm should I use for this?

The popularity doesn't need to be realtime in any way because the reader count for each book is only updated daily.

I found one article which was referenced in another SO post that showed how they calculated trending Wikipedia articles but the post only showed how the current trend was calculated.

As someone pointed out on SO, it is a very simple baseline trend algorithm that only calculates the slope between two data points, so I guess it shows the trend between yesterday and today.

I'm not looking for an uber-complex trending algorithm like those used on Hacker News, Reddit, etc.

I have only two data axes, reader count and date.

Any ideas on what I should implement, and how? For someone who has never worked with anything statistics- or algorithm-related, this seems to be a very daunting undertaking.

Thanks in advance everyone.

Solution

Probably the simplest possible trending "algorithm" I can think of is the n-day moving average. I'm not sure how your data is structured, but say you have something like this:

books = {'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
         'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
         'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19]
        }

A simple moving average just takes the last n values and averages them:

def moving_av(l, n):
    """Take a list, l, and return the average of its last n elements.
    """
    observations = len(l[-n:])
    return sum(l[-n:]) / float(observations)

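The slice notation simply grabs the tail end of the list, starting from the nth-to-last element. If the list holds fewer than n values, the slice returns the whole list, so the average is taken over whatever data is available. A moving average is a fairly standard way to smooth out any noise that a single spike or dip could introduce. The function could be used like so: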

book_scores = {}
# items() works on both Python 2 and 3; iteritems() exists only on Python 2
for book, reader_list in books.items():
    book_scores[book] = moving_av(reader_list, 5)
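
To turn these scores into a "most popular" list, the books can be sorted by score. The following is a minimal sketch that reuses the sample data from above; `rank_books` is a hypothetical helper name introduced here for illustration, not part of the original answer.

```python
def moving_av(l, n):
    """Return the average of the last n elements of list l."""
    tail = l[-n:]
    return sum(tail) / float(len(tail))


def rank_books(books, n):
    """Return (title, score) pairs sorted by n-day moving average, highest first.

    rank_books is a hypothetical helper used only for this illustration.
    """
    scores = {title: moving_av(counts, n) for title, counts in books.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


books = {
    'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
    'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
    'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19],
}

# Print the books from most to least popular over the last 5 days
for title, score in rank_books(books, 5):
    print(title, score)
```

With this sample data, 'Harry Potter' ranks first because its recent absolute readership is the highest, even though its numbers are slowly declining.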

You'll want to play around with the number of days you average over. And if you want to emphasize recent trends you can also look at using something like a weighted moving average.
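
As a rough illustration of the weighted variant, the sketch below gives the oldest value in the window a weight of 1 and increases the weight by 1 for each more recent day. The linear weighting and the function name `weighted_moving_av` are assumptions made for this example; other weighting schemes (for instance exponential decay) would work just as well.

```python
def weighted_moving_av(l, n):
    """Linearly weighted average of the last n elements of list l.

    The oldest value in the window gets weight 1 and the newest gets the
    largest weight, so recent days count more. The linear weights are an
    assumption for this sketch, not something the answer prescribes.
    """
    tail = l[-n:]
    if not tail:
        raise ValueError("need at least one observation")
    weights = range(1, len(tail) + 1)
    weighted_sum = sum(w * x for w, x in zip(weights, tail))
    return weighted_sum / float(sum(weights))


sicp = [1, 4, 15, 12, 7, 3, 8, 19]
print(weighted_moving_av(sicp, 5))
```

Because the most recent count (19) carries the largest weight, the weighted result for this series comes out higher than the plain 5-day average of the same values.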

If you want a score that looks less at absolute readership and more at increases in readership, simply compute the percent change between the 30-day moving average and the 5-day moving average:

d5_moving_av = moving_av(reader_list, 5)
d30_moving_av = moving_av(reader_list, 30)
book_score = (d5_moving_av - d30_moving_av) / d30_moving_av
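
The sketch below wraps that calculation in a function and ranks the sample books by it. The name `trend_score`, its parameters, and the choice to return 0.0 when the long-run average is zero are assumptions for illustration. Note also that the sample lists hold only 8 days of data, so the 30-day average here falls back to the average of all available values.

```python
def moving_av(l, n):
    """Return the average of the last n elements of list l."""
    tail = l[-n:]
    return sum(tail) / float(len(tail))


def trend_score(reader_list, short_n=5, long_n=30):
    """Percent change of the short moving average relative to the long one.

    Returns 0.0 when the long average is zero to avoid dividing by zero;
    that fallback is an assumption of this sketch.
    """
    short_av = moving_av(reader_list, short_n)
    long_av = moving_av(reader_list, long_n)
    if long_av == 0:
        return 0.0
    return (short_av - long_av) / long_av


books = {
    'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
    'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
    'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19],
}

# Titles ordered from strongest upward trend to strongest downward trend
trending = sorted(books, key=lambda title: trend_score(books[title]), reverse=True)
for title in trending:
    print(title, trend_score(books[title]))
```

On this data the ordering differs from a ranking by absolute readership: the book with the fewest readers comes first because its recent average has grown the most relative to its longer-run average, while 'Harry Potter' gets a negative score.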

With these simple tools you have a fair amount of flexibility in how much you emphasize past trends and how much you want to smooth out (or not smooth out) spikes.
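
For the periods mentioned in the question (past week, past month, past year), the same averaging can be applied to dated records. The following is a minimal sketch in plain Python; the `(date, reader_count)` record shape, the period lengths in `PERIOD_DAYS`, and the function name `period_average` are all assumptions, and in a Django application the records would come from whatever model stores the daily reader statistics.

```python
from datetime import date, timedelta

# Assumed period lengths in days; adjust to taste
PERIOD_DAYS = {'week': 7, 'month': 30, 'year': 365}


def period_average(records, period, today):
    """Average reader count over records dated within the period ending today.

    records: iterable of (date, reader_count) pairs, one per day (assumed shape).
    Returns None if no record falls inside the window.
    """
    start = today - timedelta(days=PERIOD_DAYS[period])
    counts = [count for day, count in records if start < day <= today]
    if not counts:
        return None
    return sum(counts) / float(len(counts))


# Made-up sample data: 10 daily records ending on an arbitrary date
today = date(2024, 1, 10)
records = [(today - timedelta(days=i), 100 + i) for i in range(10)]

print(period_average(records, 'week', today))
print(period_average(records, 'month', today))
```

Computing this value per book and sorting by it gives a "most popular over the period" list, while the percent-change score is the better fit when the goal is to surface books whose readership is growing.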
