Trending algorithm

Question

I'm working on a micro-forum of sorts, where a special user posts a short (close to tweet-size) topic message and subscribers respond with like-sized messages of their own. It is straightforward: no 'digging' or voting of any sort, just a chronological flow of responses for each topic message. High traffic is expected, though.

We would like to flag topic messages according to the response buzz they attract, using a scale of 0 to 10.

I've been googling for trend algorithms and open source community application examples for a while, and so far have gleaned two interesting references, which I don't fully grok yet:

  • Understanding algorithms for measuring trends (http://stackoverflow.com/questions/1635703/understanding-algorithms-for-measuring-trends), a discussion on comparing Wikipedia pageviews using the Baseline Trend Algorithm, here on SO.

  • The Britney Spears Problem, an in-depth article on how to rank search terms while processing large streams of data.

From the first I understand the need to check the slope in activity, and to balance the weight between two items that differ greatly in scale of activity. But how do I compare many items, whose number grows quickly over time? And then, how do I sort the items into "buzz grades" from 0 to 10?

The second reference is fascinating, but over my head at this point. From a first pass I've understood the need to keep memory usage stable while keeping counters and storing references to items if necessary. But I haven't yet figured out a fitting algorithm for my specific use case from it.

It's worth noting that I come from a non-computer-science and definitely non-statistics background. Please bear with me :) Any help and code samples (especially in Ruby) would be greatly appreciated.

Recommended answer

Intuition says that a solution to this problem doesn't need a lot of statistics: ranking the topics by some simple measures may already provide you with an interesting selection of "trending topics."

One way is to order the topics by the number of comments generated in the last hour/day/week, and to select the top ones.
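
As a rough illustration of this first approach, here is a minimal Ruby sketch. The data shape (a hash per topic with an `:id` and an array of `:comment_times`) and the method name `top_topics_by_recent_comments` are assumptions made for the example, not part of any existing application; in a real app the counts would more likely come from the database.

```ruby
# Hypothetical data shape: each topic is a hash with an :id and an array
# of comment timestamps (Time objects) under :comment_times.
def top_topics_by_recent_comments(topics, window_seconds:, now: Time.now, limit: 10)
  cutoff = now - window_seconds
  counted = topics.map do |topic|
    # Count only the comments that fall inside the time window.
    recent = topic[:comment_times].count { |t| t > cutoff && t <= now }
    [topic[:id], recent]
  end
  # Busiest topics first; ties come out in no particular order.
  counted.sort_by { |_id, recent| -recent }.first(limit)
end

# Made-up sample data, ranked over the last hour (3600 seconds).
now = Time.utc(2024, 1, 1, 12, 0, 0)
topics = [
  { id: :a, comment_times: [now - 60, now - 120, now - 7200] },
  { id: :b, comment_times: [now - 30] },
  { id: :c, comment_times: [now - 86_400] }
]
ranking = top_topics_by_recent_comments(topics, window_seconds: 3600, now: now)
# ranking is an array of [id, recent_comment_count] pairs
```

Changing `window_seconds` switches between the hour, day and week variants without touching the rest of the code.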

Another way is to count the number of comments for each topic and divide this by the "age" of the topic. New topics that immediately generate comments will be considered trending, while older topics with many comments will become less trending as they grow older.
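
The comments-per-age idea could be sketched in Ruby roughly as follows. The keys `:comments_count` and `:published_at`, the one-hour floor on the age, and the `buzz_grades` helper are all assumptions made for illustration. In particular, the linear scaling to a 0 to 10 grade is just one possible way to get the scale asked about in the question; it is not something the approach above prescribes.

```ruby
# Hypothetical data shape: each topic is a hash with :id, :comments_count
# and :published_at (a Time).
def trend_score(topic, now: Time.now)
  age_hours = (now - topic[:published_at]) / 3600.0
  # Floor the age at one hour (an arbitrary choice) so that brand-new
  # topics are not divided by a value close to zero.
  topic[:comments_count] / [age_hours, 1.0].max
end

# Assumed grading scheme: scale scores linearly so that the current top
# topic gets 10, and round to whole grades.
def buzz_grades(topics, now: Time.now)
  scores = topics.to_h { |t| [t[:id], trend_score(t, now: now)] }
  top = scores.values.max
  return scores.transform_values { 0 } if top.nil? || top.zero?

  scores.transform_values { |score| (10 * score / top).round }
end

# Made-up sample data.
now = Time.utc(2024, 1, 1, 12, 0, 0)
topics = [
  { id: :fresh, comments_count: 20,  published_at: now - 2 * 3600 },
  { id: :old,   comments_count: 240, published_at: now - 48 * 3600 },
  { id: :quiet, comments_count: 0,   published_at: now - 3600 }
]
grades = buzz_grades(topics, now: now)
```

With this sample data the two-hour-old topic with 20 comments scores higher than the two-day-old topic with 240 comments, which is the behaviour described above. A linear scale is sensitive to a single runaway topic, so a rank-based or logarithmic grading might suit high-traffic data better.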

These implementations can easily be created in Ruby/Rails and can even be done in an SQL query, provided that the tables contain publish dates and numbers of comments.
