Rails, Heroku, and Resque: Long Running Background Job Optimization


Problem Description

We're building a Tinder-style app that allows users to "like" or "dislike" events. Each event has about 100 keywords associated with it. When a user "likes" or "dislikes" an event, we associate that event's keywords with the user. Users can quickly accumulate thousands of keywords.

We use through tables to associate users and events with keywords (event_keywords and user_keywords). The through table has an additional column, relevance_score, which is a float (e.g. a keyword can be 0.1 if it's only slightly relevant or 0.9 if it's very relevant).
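For concreteness, the two through tables described above might be created with a migration along these lines (the table layout is inferred from the prose; column and index names are assumptions, not taken from the actual app):

```ruby
# Hypothetical migration sketch for the two join tables.
class CreateKeywordJoinTables < ActiveRecord::Migration
  def change
    create_table :user_keywords do |t|
      t.references :user,    index: true
      t.references :keyword, index: true
      t.float :relevance_score
      t.timestamps
    end

    create_table :event_keywords do |t|
      t.references :event,   index: true
      t.references :keyword, index: true
      t.float :relevance_score
      t.timestamps
    end

    # One row per (user, keyword) / (event, keyword) pair; the composite
    # indexes keep the per-user and per-event keyword lookups fast.
    add_index :user_keywords,  [:user_id, :keyword_id],  unique: true
    add_index :event_keywords, [:event_id, :keyword_id], unique: true
  end
end
```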

Our goal is to show users the most relevant events, based on their keywords. So Event has many event_rankings, each of which belongs to a user. In theory we want to rank all the events differently for each user.

Here are the models:

User.rb:

  has_many :user_keywords, :dependent => :destroy
  has_many :keywords, :through => :user_keywords
  has_many :event_rankings, :dependent => :destroy
  has_many :events, :through => :event_rankings

Event.rb:

  has_many :event_keywords, :dependent => :destroy
  has_many :keywords, :through => :event_keywords
  has_many :event_rankings, :dependent => :destroy
  has_many :users, :through => :event_rankings

UserKeyword.rb:

  belongs_to :user
  belongs_to :keyword

EventKeyword.rb:

  belongs_to :keyword
  belongs_to :event

EventRanking.rb:

  belongs_to :user
  belongs_to :event

Keyword.rb:

  has_many :event_keywords, :dependent => :destroy
  has_many :events, :through => :event_keywords
  has_many :user_keywords, :dependent => :destroy
  has_many :users, :through => :user_keywords

We have a method that calculates how relevant an event is to a specific user, based on their keywords. This method runs quickly, since it's just math.

User.rb:

def calculate_event_relevance(event_id)
  ## Step 1: Find which of the event keywords the user has 
  ## Step 2: Compare those keywords and do math to calculate a score 
  ## Step 3: Update the event_ranking for this user
end
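Since the question only gives the steps as comments, here is a framework-free sketch of the scoring math. The hash inputs and the scoring rule (sum of products over shared keywords) are illustrative assumptions; the real method would read user_keywords and event_keywords rows, and Step 3 (persisting the event_ranking) is omitted.

```ruby
# Hypothetical scoring rule: sum of user-relevance × event-relevance over
# the keywords the user and the event share. Inputs are plain hashes of
# keyword_id => relevance_score, standing in for the join-table rows.
def score_event(user_keywords, event_keywords)
  shared = user_keywords.keys & event_keywords.keys           # Step 1
  shared.sum { |id| user_keywords[id] * event_keywords[id] }  # Step 2
end

user  = { 1 => 0.9, 2 => 0.1, 3 => 0.5 }
event = { 2 => 0.8, 3 => 0.4, 4 => 0.7 }  # keyword 4 is not shared
score_event(user, event)                  # roughly 0.28
```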

Every time a user "likes" or "dislikes" an event, a background job is created:

RecalculateRelevantEvents.rb:

def self.perform(event_id)
  ## Step 1: Find any events that share keywords with Event.find(event_id)
  ## Step 2: calculate_event_relevance(event) for each event from above step
end
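The job skeleton above can be fleshed out as plain Ruby. To keep the sketch runnable outside Rails, Step 1's lookup takes the keyword data as a hash argument instead of querying ActiveRecord; the queue name and data shapes are assumptions.

```ruby
class RecalculateRelevantEvents
  @queue = :rankings  # Resque reads the target queue name from this ivar

  # Step 1, as a pure function: which other events share at least one
  # keyword with event_id? event_keywords maps event_id => { keyword_id => score }.
  def self.similar_events(event_id, event_keywords)
    keywords = event_keywords.fetch(event_id).keys
    event_keywords.select { |id, kw| id != event_id && (kw.keys & keywords).any? }
                  .keys
  end

  # In the real job, self.perform(event_id) would build event_keywords from
  # EventKeyword rows, then run calculate_event_relevance on each result (Step 2).
end
```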

So here's a summary of the process:

  1. User "likes" or "dislikes" an event
  2. A background job is created that finds events similar to the event from step 1
  3. Each similar event is recalculated for that user based on the user's keywords

I'm trying to figure out ways to optimize my approach, since it can quickly get out of hand. The average user will swipe through about 20 events per minute. An event can have up to 1000 similar events. And each event has around 100 keywords.

So with my approach, for each swipe I need to loop through 1000 events, then loop through 100 keywords in each event. And this happens 20 times a minute per user.

How should I approach this?

Recommended Answer

Do you have to recalculate per swipe? Could you debounce it, and recalculate for the user no more than once every 5 minutes?

This data doesn't need to be updated after every swipe to be useful; in fact, 20 updates a minute is probably much more often than is useful.

With a 5-minute debounce, you go from about 100 (20 × 5) recalculations per user to 1 in the same period - a pretty big savings.

I would also recommend using Sidekiq if you can; with its multithreaded processing you'll get a huge boost in the number of simultaneous jobs - I'm a big fan.

And then, once you are using it, you could try a gem like https://github.com/hummingbird-me/sidekiq-debounce

...that provides the kind of debounce I was suggesting.
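For illustration, a hand-rolled, in-memory version of that throttle/debounce idea might look like the sketch below. The class and names are made up; a real multi-dyno Heroku deployment would keep the timestamps somewhere shared, such as Redis, or just use the gem above.

```ruby
# Per-user throttle: allow a recalculation at most once per interval.
class RecalcThrottle
  def initialize(interval_seconds)
    @interval = interval_seconds
    @last_run = {}  # user_id => Time of the last allowed recalculation
  end

  # Returns true (and records the run) if enough time has passed for this user;
  # returns false, enqueuing nothing, otherwise.
  def allow?(user_id, now = Time.now)
    last = @last_run[user_id]
    return false if last && (now - last) < @interval
    @last_run[user_id] = now
    true
  end
end

# Usage sketch in the like/dislike action:
#   THROTTLE = RecalcThrottle.new(5 * 60)
#   Resque.enqueue(RecalculateRelevantEvents, event.id) if THROTTLE.allow?(current_user.id)
```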
