Advanced query: rank the most related fields in MySQL


Question

Let's assume we have a database like this:

Project_tbl:


-----------------
id | Project_name
-----------------
1  | A 
2  | B
3  | C
-----------------


personel_project_tbl:


--------------------
user_id | Project_id
--------------------
1       | 1
2       | 2
3       | 1
3       | 2
2       | 3
--------------------

instrument_project_tbl:


--------------------------
instrument_id | Project_id
--------------------------
1             | 1
1             | 2
2             | 2
2             | 1
1             | 3
--------------------------


Now I need to sort the list of projects and rank them by their similarity to project A.

For example:

A and B have 1 user in common out of A's 2 users, and 2 instruments in common out of A's 2 instruments, so their similarity ranking is (1/2 + 2/2) / 2 = 75%

A and C have no users in common but share 1 of A's 2 instruments, so their ranking is (0 + 1/2) / 2 = 25%

So B is more similar to A than C is, and the output should be


--------------
Project | Rank
--------------
2       | 75
3       | 25
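
The arithmetic in this example can be checked with a short Python sketch. It is only an illustration: the function names are hypothetical, the rows are copied from the sample tables above, and project A is assumed to be id 1.

```python
from collections import defaultdict

# Sample rows copied from the tables above: (item_id, project_id).
personnel = [(1, 1), (2, 2), (3, 1), (3, 2), (2, 3)]
instruments = [(1, 1), (1, 2), (2, 2), (2, 1), (1, 3)]


def items_by_project(rows):
    """Group item ids into one set per project id."""
    grouped = defaultdict(set)
    for item_id, project_id in rows:
        grouped[project_id].add(item_id)
    return grouped


def similarity_ranks(tables, reference, projects):
    """For each other project, average over all tables the share of the
    reference project's items that it also has, as a percentage."""
    totals = {p: 0.0 for p in projects if p != reference}
    for rows in tables:
        grouped = items_by_project(rows)
        ref_items = grouped.get(reference, set())
        if not ref_items:
            continue  # the reference project has nothing in this table
        for p in totals:
            shared = grouped.get(p, set()) & ref_items
            totals[p] += len(shared) / len(ref_items)
    return {p: 100 * total / len(tables) for p, total in totals.items()}


ranks = similarity_ranks([personnel, instruments], reference=1, projects=[1, 2, 3])
for project, rank in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(project, rank)
```

With the sample data this prints project 2 with 75.0 and project 3 with 25.0, matching the table above.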

This is the first solution that came to my mind.
If I did it in PHP and MySQL, it would be something like:

 for all tables as table_x
    unique = (select count(distinct items) from table_x where project is A)
    for all projects (except A) as prj_y
         count[prj_y] += (select count(distinct items) from table_x
                   where project is prj_y and items are in
                     (select distinct items from table_x where project is A)
                  ) / unique

So the complexity would be O(n²), and even with indexes each select would cost another O(log n), which would not be affordable.

Do you have any idea how to do this entirely in MySQL, or in a better and faster way?

More information and notes:

  1. I'm limited to PHP and MySQL.

  2. This is just an example. My real project has more than 20 tables, so the solution needs to perform well.

  3. This question is a follow-up to this one: "Get the most repeated similar fields in MySQL database". If your solution could somehow be applied to both of them, that would be more than great. I want to multiply the value of related projects by the similarity of items to get the best option.

In conclusion, these two questions together will: get the most related projects, get the similar items of all projects, and find the most similar item for the current project among projects that are themselves similar to the current one.

Thanks for your thoughtful answers. I would really appreciate it if you could shed some light on the situation.

Recommended answer

You could do it like this:

-- Look up the id of the reference project.
SET @Aid = (SELECT id
            FROM Project_tbl
            WHERE Project_name = 'A');

-- `Rank` is backtick-quoted because RANK is a reserved word in MySQL 8.0.
SELECT P.id
  , (IFNULL(personel.prop, 0) +
     IFNULL(instrument.prop, 0)
    )/2*100 AS `Rank`
  , personel.prop AS personell
  , instrument.prop AS instrument
FROM Project_tbl P
LEFT JOIN
  ( -- share of A's users who also work on each project
    SELECT B.Project_id AS pid
         , COUNT(*) / (SELECT COUNT(*)
                       FROM personel_project_tbl
                       WHERE Project_id = @Aid) AS prop
    FROM personel_project_tbl A
    JOIN personel_project_tbl B ON A.user_id = B.user_id
    WHERE A.Project_id = @Aid
    GROUP BY B.Project_id
  ) personel ON P.id = personel.pid
LEFT JOIN
  ( -- share of A's instruments that each project also uses
    SELECT B.Project_id AS pid
         , COUNT(*) / (SELECT COUNT(*)
                       FROM instrument_project_tbl
                       WHERE Project_id = @Aid) AS prop
    FROM instrument_project_tbl A
    JOIN instrument_project_tbl B ON A.instrument_id = B.instrument_id
    WHERE A.Project_id = @Aid
    GROUP BY B.Project_id
  ) instrument ON P.id = instrument.pid
WHERE P.id <> @Aid
ORDER BY `Rank` DESC;

The idea is to have one subquery per table, where each subquery maps a project id to its correspondence ratio for that table.
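
With 20 or more link tables, writing one subquery per table by hand gets tedious, so the query could be generated from a list of tables. The sketch below is an assumption-laden illustration, not the answer's own code: `LINK_TABLES` and `build_rank_query` are hypothetical names, and it runs against an in-memory SQLite database as a stand-in for MySQL (hence the `* 1.0` to avoid SQLite's integer division and a bound `:aid` parameter instead of `@Aid`).

```python
import sqlite3

# (link table, item column) pairs; extend this list for the other tables.
LINK_TABLES = [
    ("personel_project_tbl", "user_id"),
    ("instrument_project_tbl", "instrument_id"),
]


def build_rank_query(link_tables):
    """Build one LEFT JOIN subquery per link table and average their ratios."""
    joins, terms = [], []
    for i, (table, col) in enumerate(link_tables):
        alias = f"t{i}"
        joins.append(f"""
        LEFT JOIN (
            SELECT b.Project_id AS pid,
                   COUNT(*) * 1.0 / (SELECT COUNT(*) FROM {table}
                                     WHERE Project_id = :aid) AS prop
            FROM {table} a
            JOIN {table} b ON a.{col} = b.{col}
            WHERE a.Project_id = :aid
            GROUP BY b.Project_id
        ) {alias} ON p.id = {alias}.pid""")
        terms.append(f"IFNULL({alias}.prop, 0)")
    ratio_sum = " + ".join(terms)
    return (
        f"SELECT p.id, ({ratio_sum}) * 100.0 / {len(link_tables)} AS similarity "
        f"FROM Project_tbl p {''.join(joins)} "
        "WHERE p.id <> :aid ORDER BY similarity DESC"
    )


# Load the sample data from the question into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project_tbl (id INTEGER PRIMARY KEY, Project_name TEXT);
    CREATE TABLE personel_project_tbl (user_id INTEGER, Project_id INTEGER);
    CREATE TABLE instrument_project_tbl (instrument_id INTEGER, Project_id INTEGER);
    INSERT INTO Project_tbl VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO personel_project_tbl VALUES (1, 1), (2, 2), (3, 1), (3, 2), (2, 3);
    INSERT INTO instrument_project_tbl VALUES (1, 1), (1, 2), (2, 2), (2, 1), (1, 3);
""")

rows = conn.execute(build_rank_query(LINK_TABLES), {"aid": 1}).fetchall()
print(rows)
```

The table and column names are interpolated into the SQL text, so they should only ever come from a fixed list in the code, never from user input. On the sample data the query ranks project 2 at 75 and project 3 at 25.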

I'm saying nothing at all about performance. You'll have to try it and see whether it is fast enough for your needs, but as I see it there is no way to beat the O(n²) complexity you mention, since you have to inspect all the data.
