Constructing a personalized Facebook-like Newsfeed: SQL, MongoDB?


Question


I'm building a Facebook-like newsfeed, meaning it's assembled from many SQL tables, and each data type has a specific layout. But it's becoming very heavy to load, and I was hoping to make it even more complex...

Here's what I do now:

User model:

  def updates(more_options = {})
    # Pull each update type, then merge and sort everything by recency.
    (games_around({}, more_options) +
     friends_statuses({}, more_options).sort! { |a, b| b.updated_at <=> a.updated_at }.slice(0, 35) +
     friends_stats({:limit => 10}, more_options) +
     friends_badges({:limit => 3}, more_options)
    ).sort! { |a, b| b.updated_at <=> a.updated_at }
  end

Example for the Badges data:

  def friends_badges(options = {:limit => 3}, more_options = {})
    # merge (not merge!) so the caller's options hash is not mutated
    Reward.find(:all, options.merge(
      :conditions => ["rewards.user_id IN (?)", self.players_around({}, more_options).collect { |p| p.id }],
      :joins      => [:user, :badge],
      :order      => "rewards.created_at DESC"))
  end

Newsfeed View:

<% for update in @current_user.updates %>
  <% if update.class.name == "Status" %>
    <% @status = update %>
    <%= render :partial => "users/statuses/status_line", :locals => {:status => update} %>
  <% elsif update.class.name == "Game" %>
    <%= render :partial => "games/game_newsfeed_line", :locals => {:game => update} %>
  <% elsif update.class.name == "Stat" %>
    <%= render :partial => "stats/stat_newsfeed_line", :locals => {:stat => update} %>
  <% elsif update.class.name == "Reward" %>
    <%= render :partial => "badges/badge_newsfeed_line", :locals => {:reward => update} %>
  <% end %>
<% end %>

The options I thought about:

  • Building a "Feed" table and preprocessing most of the updates for each user with a background job, most likely an hourly cron. I would store the entire HTML code for each update.
  • Keeping the initial structure but caching each update separately (right now I have no caching); see the fragment-caching sketch after this list.
  • Switching to MongoDB for faster database access.
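
For the second option, here is a minimal sketch of Rails fragment caching around each update, assuming a cache store (e.g. memcached) is configured; the key scheme is an illustration, not something from the question:

  <%# Fragment-cache each update's rendered HTML. Including updated_at in %>
  <%# the key makes the fragment expire whenever the record is touched.   %>
  <% for update in @current_user.updates %>
    <% cache "feed/#{update.class.name}/#{update.id}/#{update.updated_at.to_i}" do %>
      <%# ... render the matching partial exactly as in the view above ... %>
    <% end %>
  <% end %>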

I have to say I'm not really an expert. Rails made the first steps easy, but now, with more than 150 SQL queries per page load, I feel it's getting out of control and needs an expert's point of view...

What would you do?

Thanks for your valuable help,

Solution

Your code doesn't tell me a lot; I think it'd be helpful if you could lay out your data structure in plain JSON / SQL.

Anyway, I'd serialize each user's stream to MongoDB. I wouldn't store the HTML in the database for various reasons (at least not at that level of the software); instead, you should save the relevant data in a (possibly polymorphic) collection. Fetching the newsfeed is then very easy, indexing is straightforward, etc. The view structure would essentially not change. If you later want to change the HTML, that is easy as well.
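
To make that concrete, here is a minimal sketch of such a polymorphic collection using the current Ruby `mongo` driver; the `feed_items` collection and every field name are illustrative assumptions, not something from the question:

  require 'mongo'

  client = Mongo::Client.new(['127.0.0.1:27017'], :database => 'newsfeed')
  feed   = client[:feed_items]

  # One document per (follower, update); store only the data the
  # partial needs to render the item, not the rendered HTML.
  feed.insert_one(
    :user_id     => 42,            # the follower whose feed this lands in
    :source_type => 'Reward',      # polymorphic: Status, Game, Stat, Reward
    :source_id   => 1337,
    :data        => { :badge => 'Sharpshooter' },
    :created_at  => Time.now.utc
  )

  # Indexing is straightforward, and fetching the feed is one query:
  feed.indexes.create_one(:user_id => 1, :created_at => -1)
  items = feed.find(:user_id => 42).sort(:created_at => -1).limit(35).to_a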

The downside is that this will duplicate a lot of data. If people can have lots of followers, this may become a problem. Using arrays of user ids instead of a single user id might help (if the information is the same for all followers), but it's also limited.

For very large association problems, the only solution is caching. The way I understand it, the magic at both Facebook and Twitter is that they don't hit the DB very often and keep a lot of data in RAM. If you're associating billions of items, doing that is a challenge even in RAM.

The updates should be written continuously rather than on an hourly basis. Suppose you have a lot of traffic and the hourly update takes 30 minutes; the worst case is then a 90-minute delay. If you process changes just-in-time, you can cut this to perhaps 5 minutes.
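
For instance, here is a sketch of write-time fan-out with an ActiveRecord callback; the `FeedItem` model and the `followers` association are hypothetical, not from the question:

  class Reward < ActiveRecord::Base
    belongs_to :user
    after_create :fan_out_to_followers

    private

    # Push a feed entry to every follower as soon as the reward exists,
    # instead of waiting for an hourly cron. In production this loop
    # belongs in a background job, not in the request cycle.
    def fan_out_to_followers
      user.followers.find_each do |follower|
        FeedItem.create!(
          :user_id     => follower.id,      # the feed owner
          :source_type => self.class.name,  # polymorphic pointer
          :source_id   => id,
          :created_at  => created_at
        )
      end
    end
  end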

You'll have to make assumptions at some point and rely on caching and heuristics. Some examples:

  • The more recent a tweet, the more traffic it will see. It has a higher chance of being retweeted, and it's seen much more often. Keep it in RAM.
  • Your Facebook timeline overview page for 1991 is probably not going to change on a daily basis, so this is a candidate for long-term output caching.
  • Current Facebook activity is likely to undergo lots of writes. Output caching won't help much here; again, the object should be kept in RAM (see the sketch after this list).
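
A minimal sketch of the keep-it-in-RAM idea with `Rails.cache` (e.g. backed by memcached); the key scheme and the 5-minute TTL are illustrative assumptions:

  # Serve the hot, recent slice of the feed from the in-memory cache and
  # only hit the database when the entry has expired.
  recent_updates = Rails.cache.fetch("feed/#{current_user.id}/recent",
                                     :expires_in => 5.minutes) do
    current_user.updates.slice(0, 35)
  end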
