Rails 3 用户匹配算法到 SQL 查询(复杂) [英] Rails 3 user matching-algorithm to SQL Query (COMPLICATED)

查看:45
本文介绍了Rails 3 用户匹配算法到 SQL 查询(复杂)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在开发一款应用,该应用可根据回答的问题匹配用户.我在普通的 RoR 和 ActiveRecord 查询中实现了我的算法,但是使用它很慢.将一个用户与其他 100 个用户匹配需要

I'm currently working on an app that matches users based on answered questions. I realized my algorithm in normal RoR and ActiveRecord queries but it's waaay to slow to use it. To match one user with 100 other users takes

Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms)

在我的本地机器上.但是还是...我现在想在原始 SQL 中实现这一点,以获得更高的性能.但是我真的很难理解 SQL 查询中的 SQL 查询以及诸如此类的东西以及计算等.我的脑袋快要爆炸了,我什至不知道从哪里开始.

on my local machine. But still... I now want to realize this in raw SQL in order to gain some more performance. But I'm really having trouble getting my head around SQL queries inside of SQL queries and stuff like this plus calculations etc. My head is about to explode and I don't even know where to start.

这是我的算法:

def match(user)
  @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
  @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100

  if self.common_questions(user) == []
    0.to_f
  else
    match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
    if match <= 0
      0.to_f
    else
      match
    end
  end
end

def possible_score(user)
  i = 0
  self.user_questions.select("question_id, importance").find_each do |n|
    if user.user_questions.select(:id).find_by_question_id(n.question_id)
      i += Importance.find_by_id(n.importance).value
    end
  end
  return i
end

def actual_score(user)
  i = 0
  self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
    @user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
    unless @user_answer == nil
      if n.accepted_answers.select(:answer_id).find_by_answer_id(@user_answer.answer_id)
        i += Importance.find_by_id(n.importance).value
      end
    end
  end
  return i
end

所以基本上用户回答一个问题,选择他接受的答案以及这个问题对他来说有多重要.该算法然后检查用户 2 有哪些共同问题,如果用户 1 给出了用户 2 接受的答案,如果是,则添加用户 2 为每个问题给出的重要性,这构成用户 1 的分数.对于 user2 也是相反的.除以可能的分数给出百分比,应用于几何平均值的两个百分比为我提供了两个用户的总匹配百分比.我知道相当复杂.告诉我是否解释得不够好.我只是希望我可以用原始 SQL 来表达这一点.性能就是一切.

So basically a user answers a questions, picks what answers he accepts and how important that question is to him. The algorithm then checks what questions 2 users have in common, if user1 gave an answer user2 accepts, if yes then the importance user2 gave for each question is added which makes up the score user1 made. Also the other way around for user2. Divided by the possible score gives the percentage and both percentages applied to the geometric mean gives me one total match percentage for both users. Fairly complicated I know. Tell if I didn't explain it good enough. I just hope I can express this in raw SQL. Performance is everything in this.

这是我的数据库表:

CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (left some unimportant stuff out, it's all there in the databse dump i uploaded)

CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime);

CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer);

我猜 SQL 查询的顶部必须看起来像这样?

I guess the top of the SQL query has to look something like this?

SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
FROM 

但由于我不是 SQL 高手,只能做通常的事情,我的脑袋快要爆炸了.我希望有人能帮我解决这个问题.或者至少以某种方式提高我的表现!非常感谢!

But since I'm not an SQL master and can only do the usual stuff my head is about to explode. I hope someone can help me figure this out. Or atleast improve my performance somehow! Thanks so much!

因此根据向导的回答,我设法为possible_score"获得了一个不错的 SQL 语句

So based on Wizard's answer I've managed to get a nice SQL statement for "possible_score"

SELECT SUM(value) AS sum_id 
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1

我试图用这个来获得actual_score",但没有用.执行此操作时,我的数据库管理器崩溃了.

I've tried to get the "actual_score" with this but it didn't work. My database manager crashed when I executed this.

SELECT SUM(imp.value) AS sum_id 
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
INNER JOIN accepted_answers as ON as.user_question_id =  uq1.id AND as.answer_id = uq2.answer_id
WHERE uq1.user_id = 1

编辑 2

好吧,我是个白痴!我当然不能使用as"作为别名.将其更改为 aa 并且它起作用了!W00T!

EDIT2

Okay I'm an idiot! I can't use "as" as an alias of course. Changed it to aa and it worked! W00T!

推荐答案

我知道您正在考虑迁移到 SQL 解决方案,但是可以对您的 Ruby 代码进行一些重大的性能改进,这可能会消除需要使用手工编码的 SQL.在优化您的代码时,通常值得使用分析器来确保您确实知道哪些部分是问题所在.在您的示例中,我认为可以通过删除在每次迭代期间执行的迭代代码和数据库查询来进行一些重大改进!

I know you were thinking about moving to a SQL solution, but there are some major performance improvements which can be made to your Ruby code which might eliminate the need to use hand-coded SQL. When optimizing your code it is often worth using a profiler to make sure you really know which parts are the problem. In your example I think some big improvements can be made by removing iterative code and database queries which are executed during each iteration!

此外,如果您使用的是最新版本的 ActiveRecord,您可以使用子选择生成查询,而无需编写任何 SQL.当然,为数据库创建适当的索引很重要.

Also, if you are using a recent version of ActiveRecord you can generate queries with subselects without the need to code any SQL. Of course it is important that you have proper indexes created for your database.

根据我从您的代码中推断出的内容,我对您的模型和关系做出了很多假设.如果我错了,请告诉我,我会尝试做出相应的调整.

I'm making a lot of assumptions about your models and relationships based on what I can infer from your code. If I'm wrong let me know and I'll try to make some adjustments accordingly.

def match(user)    
  if self.common_questions(user) == []
    0.to_f
  else
    # Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed.
    @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
    @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
    match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
    if match <= 0
      0.to_f
    else
      match
    end
  end
end

def possible_score(user)
  # If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance.
  #   I.e. UserQuestion belongs_to :importance, and Importance has_many :user_questions.
  # I'm assuming that user_questions represents join models between users and questions.  
  #   I.e. User has_many :user_questions, and User has_many :questions, :through => :user_questions.  
  #        Question has_many :user_questions, and Question has_many :users, :through => :user_questions
  # From your code this seems like the logical setup.  Let me know if my assumption is wrong.

  self.user_questions.
    joins(:importance).                                             # Requires the relation between UserQuestion and Importance I described above
    where(:question_id => Question.joins(:user_questions).where(:user_id => user.id)). # This should create a where clause with a subselect with recent versions of ActiveRecord
    sum(:value)                                                     # I'm also assuming that the importances table has a `value` column.
end

def actual_score(user)
  user_questions.
    joins(:importance, :accepted_answers).  # It looks like accepted_answers indicates an answers table
    where(:answer_id => Answer.joins(:user_questions).where(:user_id => user.id)).
    sum(:value)
end

UserQuestion 似乎是 User、Question、Answer 和 Importance 之间的超级连接模型.以下是与代码相关的模型关系(不包括您可以创建的 has_many :through 关系).我想你可能已经有了这些:

UserQuestion seems to be a super join model between User, Question, Answer and Importance. Here are the model relations relevant to the code (not including the has_many :through relations you could create). I think you probably have these already:

# User
has_many :user_questions

# UserQuestion
belongs_to :user
belongs_to :question
belongs_to :importance, :foreign_key => :importance  # Maybe rename the column `importance` to `importance_id`
belongs_to :answer

# Question
has_many :user_questions

# Importance
has_many :user_questions

# Answer
has_many :user_questions

这篇关于Rails 3 用户匹配算法到 SQL 查询(复杂)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆