SQL: return user table with calculated column for match percentage?


Question

I'm currently writing a web app that matches users based on answered questions. I've implemented my matching algorithm in a single query and tuned it so that it takes 8.2 ms to calculate the match percentage between two users. But my web app has to take a list of users and iterate through it, running this query for each one. For 5000 users that took 50 seconds on my local machine. Is it possible to put everything in one query that returns one column with the user_id and one column with the calculated match? Or is a stored procedure an option?

I'm currently working with MySQL but am willing to switch databases if needed.

For anyone interested in the schema and data, I've created a SQLFiddle: http://sqlfiddle.com/#!2/84233/1

And here is my matching query:

SELECT COALESCE(SQRT( (100.0*as1.actual_score/ps1.possible_score) * (100.0*as2.actual_score/ps2.possible_score) ) - (100/ps1.commonquestions), 0) AS perc
  FROM (SELECT SUM(imp.value) AS actual_score 
      FROM user_questions AS uq1
      INNER JOIN importances imp ON imp.id = uq1.importance
      INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
        AND (uq1.accans1 = uq2.answer_id 
          OR uq1.accans2 = uq2.answer_id
          OR uq1.accans3 = uq2.answer_id
          OR uq1.accans4 = uq2.answer_id)
      WHERE uq1.user_id = 1) AS as1, 
  (SELECT SUM(value) AS possible_score, COUNT(*) AS commonquestions
      FROM user_questions AS uq1
      INNER JOIN importances ON importances.id = uq1.importance
      INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
      WHERE uq1.user_id = 1) AS ps1,
  (SELECT SUM(imp.value) AS actual_score 
      FROM user_questions AS uq1
      INNER JOIN importances imp ON imp.id = uq1.importance
      INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 1
        AND (uq1.accans1 = uq2.answer_id 
          OR uq1.accans2 = uq2.answer_id
          OR uq1.accans3 = uq2.answer_id
          OR uq1.accans4 = uq2.answer_id)
      WHERE uq1.user_id = 101) AS as2, 
  (SELECT SUM(value) AS possible_score 
      FROM user_questions AS uq1
      INNER JOIN importances ON importances.id = uq1.importance
      INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 1
      WHERE uq1.user_id = 101) AS ps2


Recommended answer

I was bored, so: here's a rewritten version of your query, based on a PostgreSQL port of your schema, that calculates the matches for all user pairings at once:

http://sqlfiddle.com/#!12/30524/6

I've checked and it produces the same results for the user pair (1,5).

WITH
userids(uid) AS (
    select distinct user_id from user_questions
),
users(u1,u2) AS (
    SELECT u1.uid, u2.uid FROM userids u1 CROSS JOIN userids u2 WHERE u1.uid <> u2.uid
),
scores AS (
        SELECT
            sum(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4) THEN imp.value ELSE 0 END) AS actual_score,
            sum(imp.value) AS potential_score,
            count(1) AS common_questions,
            users.u1,
            users.u2
        FROM user_questions AS uq1
        INNER JOIN importances imp ON imp.id = uq1.importance
        INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id
        INNER JOIN users ON (uq1.user_id=users.u1 AND uq2.user_id=users.u2)
        GROUP BY u1, u2
),
score_pairs(u1,u2,u1_actual,u2_actual,u1_potential,u2_potential,common_questions) AS (
    SELECT s1.u1, s1.u2, s1.actual_score, s2.actual_score, s1.potential_score, s2.potential_score, s1.common_questions
    FROM scores s1 INNER JOIN scores s2 ON (s1.u1 = s2.u2 AND s1.u2 = s2.u1)
    WHERE s1.u1 < s1.u2
)
SELECT
    u1, u2, 
    COALESCE(SQRT( (100.0*u1_actual/u1_potential) * (100.0*u2_actual/u2_potential) ) - (100.0/common_questions), 0) AS "match"
FROM  score_pairs;
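
Written out, the final SELECT computes the following for each pair. The symbols are shorthand introduced here, not names from the original post: a_i and p_i stand for the actual and potential scores taken from the scores row in which user i is u1, and n is the number of questions both users answered.

```latex
\[
\text{match}(u_1, u_2)
  = \sqrt{\frac{100\,a_1}{p_1}\cdot\frac{100\,a_2}{p_2}} \;-\; \frac{100}{n}
\]
```

COALESCE then replaces a NULL result with 0.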

There's no reason you couldn't port this back to MySQL, as the CTEs are only there for readability and don't do anything you can't do with FROM (SELECT ...). There's no WITH RECURSIVE clause and no CTE is referenced from more than one other CTE. You'd have a bit of a scary nested query, but that's just a formatting challenge.
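
As a rough sketch of what such a port might look like (untested against the original fiddle, and assuming a MySQL version without CTE support), each CTE becomes a derived table. Because scores is joined to itself, the derived table has to be written out twice. The pairing step is folded into the join condition uq2.user_id <> uq1.user_id, and the output column is called match_pct (a name chosen here) because MATCH is a reserved word in MySQL.

```sql
-- Sketch of a MySQL port using derived tables instead of CTEs.
-- Assumes the user_questions and importances tables used in the queries above.
SELECT
    s1.u1,
    s1.u2,
    COALESCE(
        SQRT( (100.0 * s1.actual_score / s1.potential_score)
            * (100.0 * s2.actual_score / s2.potential_score) )
        - (100.0 / s1.common_questions),
        0) AS match_pct
FROM (
    -- Scores from the point of view of uq1's user, one row per ordered pair.
    SELECT uq1.user_id AS u1,
           uq2.user_id AS u2,
           SUM(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4)
                    THEN imp.value ELSE 0 END) AS actual_score,
           SUM(imp.value) AS potential_score,
           COUNT(*) AS common_questions
    FROM user_questions AS uq1
    INNER JOIN importances AS imp ON imp.id = uq1.importance
    INNER JOIN user_questions AS uq2
            ON uq2.question_id = uq1.question_id
           AND uq2.user_id <> uq1.user_id
    GROUP BY uq1.user_id, uq2.user_id
) AS s1
INNER JOIN (
    -- Same derived table again; it cannot be shared without a CTE.
    SELECT uq1.user_id AS u1,
           uq2.user_id AS u2,
           SUM(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4)
                    THEN imp.value ELSE 0 END) AS actual_score,
           SUM(imp.value) AS potential_score
    FROM user_questions AS uq1
    INNER JOIN importances AS imp ON imp.id = uq1.importance
    INNER JOIN user_questions AS uq2
            ON uq2.question_id = uq1.question_id
           AND uq2.user_id <> uq1.user_id
    GROUP BY uq1.user_id, uq2.user_id
) AS s2 ON s1.u1 = s2.u2 AND s1.u2 = s2.u1
WHERE s1.u1 < s1.u2;  -- keep each unordered pair once
```

To get the matches for a single user rather than for every pair, replace the final WHERE with a filter such as s1.u1 = 1.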

Changes:

  • Generate a set of distinct users.
  • Self-join that set of distinct users to create a set of user pairings.
  • Join on that list of pairings in the scores query to produce a table of scores.
  • Produce the scores table by combining the largely duplicate queries for possiblescore1 and possiblescore2, and for actualscore1 and actualscore2.
  • Then summarize it in the final outer query.

I haven't optimised the query; as written it runs in 5 ms on my system. On bigger data you may need to restructure parts of it, or use tricks like converting some of the CTE clauses into SELECT ... INTO TEMPORARY TABLE statements and indexing the resulting temp tables before querying them.
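
A minimal PostgreSQL sketch of that temp-table trick, using CREATE TEMPORARY TABLE ... AS for the scores step. The table name scores_tmp and the index name scores_tmp_pair_idx are made up for illustration, and whether the index actually helps would need checking with EXPLAIN ANALYZE on real data.

```sql
-- Materialize the per-pair scores once instead of keeping them in a CTE.
-- Assumes the user_questions and importances tables from the queries above.
CREATE TEMPORARY TABLE scores_tmp AS
SELECT uq1.user_id AS u1,
       uq2.user_id AS u2,
       sum(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4)
                THEN imp.value ELSE 0 END) AS actual_score,
       sum(imp.value) AS potential_score,
       count(*) AS common_questions
FROM user_questions AS uq1
INNER JOIN importances AS imp ON imp.id = uq1.importance
INNER JOIN user_questions AS uq2
        ON uq2.question_id = uq1.question_id
       AND uq2.user_id <> uq1.user_id
GROUP BY uq1.user_id, uq2.user_id;

-- Index the pair columns used by the self-join, then refresh planner statistics.
CREATE INDEX scores_tmp_pair_idx ON scores_tmp (u1, u2);
ANALYZE scores_tmp;

-- Final step: pair each row with its mirror image and compute the match.
SELECT s1.u1,
       s1.u2,
       COALESCE(
           SQRT( (100.0 * s1.actual_score / s1.potential_score)
               * (100.0 * s2.actual_score / s2.potential_score) )
           - (100.0 / s1.common_questions),
           0) AS "match"
FROM scores_tmp AS s1
INNER JOIN scores_tmp AS s2 ON s1.u1 = s2.u2 AND s1.u2 = s2.u1
WHERE s1.u1 < s1.u2;
```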

You may also want to move the generation of the users rowset out of the CTE and into a FROM subquery of scores. That's because, in PostgreSQL versions before 12, WITH behaves as an optimisation fence between clauses, so the database must materialize the rows and can't use tricks like pushing clauses up or down.
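
A sketch of that restructuring, showing only the scores step with the user pairings built inline rather than in their own CTEs. The aliases pairs, a and b are names chosen here for illustration.

```sql
-- scores step with the pairings generated in a FROM subquery,
-- so the planner is free to push conditions into it.
SELECT
    pairs.u1,
    pairs.u2,
    sum(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4)
             THEN imp.value ELSE 0 END) AS actual_score,
    sum(imp.value) AS potential_score,
    count(*) AS common_questions
FROM (
    -- All ordered pairs of distinct users.
    SELECT a.user_id AS u1, b.user_id AS u2
    FROM (SELECT DISTINCT user_id FROM user_questions) AS a
    CROSS JOIN (SELECT DISTINCT user_id FROM user_questions) AS b
    WHERE a.user_id <> b.user_id
) AS pairs
INNER JOIN user_questions AS uq1 ON uq1.user_id = pairs.u1
INNER JOIN user_questions AS uq2
        ON uq2.user_id = pairs.u2
       AND uq2.question_id = uq1.question_id
INNER JOIN importances AS imp ON imp.id = uq1.importance
GROUP BY pairs.u1, pairs.u2;
```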
