Advanced query: rank the most related fields in MySQL
Question
Let's assume we have a database like this:
Project_tbl:
-----------------
id | Project_name
-----------------
1 | A
2 | B
3 | C
-----------------
personel_project_tbl:
--------------------
user_id | Project_id
--------------------
1 | 1
2 | 2
3 | 1
3 | 2
2 | 3
--------------------
instrument_project_tbl:
--------------------------
instrument_id | Project_id
--------------------------
1 | 1
1 | 2
2 | 2
2 | 1
1 | 3
--------------------------
Now, I need to sort the list of projects and rank them with regard to their similarity to the project A.
For example:
A and B share 1 of A's 2 users (there are 3 users overall) and 2 of A's 2 instruments, so their similarity ranking is (1/2 + 2/2) / 2 = 75%.
A and C have no users in common but share 1 of A's 2 instruments, so their similarity ranking is (0/2 + 1/2) / 2 = 25%.
So B is more similar to A than C is, and the output should be:
--------------
Project | Rank
--------------
2 | 75
3 | 25
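To make the arithmetic concrete, here is a minimal Python sketch that recomputes the ranking from the sample rows. The dictionaries are transcribed by hand from the two link tables, and the helper names `similarity` and `rank_projects` are made up for illustration. It assumes the denominator is the number of items linked to the reference project, which is what the 1/2 and 2/2 terms in the example imply.

```python
# Sample data transcribed from the tables above: project id -> set of linked ids.
users = {1: {1, 3}, 2: {2, 3}, 3: {2}}
instruments = {1: {1, 2}, 2: {1, 2}, 3: {1}}


def similarity(ref, other, link_tables):
    """Average, over all link tables, of the share of ref's items that other also has."""
    ratios = []
    for table in link_tables:
        ref_items = table.get(ref, set())
        if not ref_items:
            # Assumption: a table with no items for ref contributes 0.
            ratios.append(0.0)
            continue
        shared = ref_items & table.get(other, set())
        ratios.append(len(shared) / len(ref_items))
    return 100 * sum(ratios) / len(ratios)


def rank_projects(ref, link_tables):
    """Return (project id, similarity in percent) pairs, most similar first."""
    projects = set().union(*(t.keys() for t in link_tables)) - {ref}
    scored = [(p, similarity(ref, p, link_tables)) for p in projects]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_projects(1, [users, instruments])
print(ranking)  # [(2, 75.0), (3, 25.0)]
```

Project 2 (B) scores (1/2 + 2/2) / 2 = 75 and project 3 (C) scores (0/2 + 1/2) / 2 = 25, matching the table above.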
This is the first solution that came to mind.
If I did it in PHP and MySQL, it would be something like:
for each link table as table_x
    unique = (select count(distinct item) from table_x where project = A)
    for each project (except A) as prj_y
        score[prj_y] += (select count(distinct item) from table_x
                         where project = prj_y and item in
                             (select distinct item from table_x where project = A)
                        ) / unique
So the complexity would be O(n^2), and even with indexing each select would also cost O(log n), which isn't affordable.
Do you have any idea how to do this entirely in MySQL, or in a better and faster way?
More information and notes:
- I'm limited to PHP and MySQL.
- This is just an example; my real project has more than 20 tables, so the solution needs to perform well.
- This question is a follow-up to another one, "Get the most repeated similar fields in MySQL database". If your solution could be used for both of them in some way, that would be more than great. I want to multiply the value of related projects by the similarity of items to get the best option.
In conclusion, these two questions together will: get the most related projects, get the similar items of all projects, and find the most similar item for the current project where the project is also similar to the current one.
Thanks for your answers; I would appreciate it if you could shed some light on the situation.
Recommended answer
-- Look up the id of the reference project once.
SET @Aid = (SELECT id
            FROM Project_tbl
            WHERE Project_name = 'A');

SELECT P.id
     , (IFNULL(personel.prop, 0) +
        IFNULL(instrument.prop, 0)
       ) / 2 * 100 AS `Rank`   -- backticks: RANK is a reserved word in MySQL 8.0
     , personel.prop   AS personell
     , instrument.prop AS instrument
FROM Project_tbl P
LEFT JOIN
     ( -- share of A's users that each project also has
       -- C has exactly one row, so MAX(C.ref) is just C.ref; the aggregate
       -- keeps the query valid when ONLY_FULL_GROUP_BY is enabled
       SELECT B.Project_id AS pid, COUNT(*) / MAX(C.ref) AS prop
       FROM personel_project_tbl A
       JOIN personel_project_tbl B ON A.user_id = B.user_id
       CROSS JOIN (SELECT COUNT(*) AS ref
                   FROM personel_project_tbl
                   WHERE Project_id = @Aid
                  ) AS C
       WHERE A.Project_id = @Aid
       GROUP BY B.Project_id
     ) personel ON P.id = personel.pid
LEFT JOIN
     ( -- share of A's instruments that each project also has
       SELECT B.Project_id AS pid, COUNT(*) / MAX(C.ref) AS prop
       FROM instrument_project_tbl A
       JOIN instrument_project_tbl B ON A.instrument_id = B.instrument_id
       CROSS JOIN (SELECT COUNT(*) AS ref
                   FROM instrument_project_tbl
                   WHERE Project_id = @Aid
                  ) AS C
       WHERE A.Project_id = @Aid
       GROUP BY B.Project_id
     ) instrument ON P.id = instrument.pid
WHERE P.id <> @Aid
ORDER BY `Rank` DESC;
The idea is to have one subquery for each table, and each of these subqueries maps project id to correspondence ratio for a given table.
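Since the question mentions more than 20 link tables, the "one subquery per table" idea can be generated rather than written by hand. The following is only a sketch under stated assumptions: it runs against an in-memory SQLite database as a stand-in for MySQL (so it is self-contained), `LINK_TABLES` and `build_rank_query` are hypothetical names, and the denominator is written as a scalar subquery instead of a joined derived table. The PHP and MySQL version would need its own testing.

```python
import sqlite3

# Hypothetical description of the link tables: (table name, item column).
# These names come from a fixed list in code, never from user input,
# which is why they are interpolated into the SQL text directly.
LINK_TABLES = [
    ("personel_project_tbl", "user_id"),
    ("instrument_project_tbl", "instrument_id"),
]


def build_rank_query(link_tables):
    """Build one SELECT with a LEFT JOINed ratio subquery per link table."""
    joins, terms = [], []
    for i, (table, col) in enumerate(link_tables):
        alias = f"t{i}"
        joins.append(f"""
        LEFT JOIN (
            SELECT b.Project_id AS pid,
                   COUNT(*) * 1.0 / (SELECT COUNT(*) FROM {table}
                                     WHERE Project_id = :ref) AS prop
            FROM {table} a
            JOIN {table} b ON a.{col} = b.{col}
            WHERE a.Project_id = :ref
            GROUP BY b.Project_id
        ) {alias} ON p.id = {alias}.pid""")
        terms.append(f"IFNULL({alias}.prop, 0)")
    score = " + ".join(terms)
    join_sql = "".join(joins)
    return f"""
        SELECT p.id, ({score}) / {len(link_tables)} * 100 AS similarity
        FROM Project_tbl p {join_sql}
        WHERE p.id <> :ref
        ORDER BY similarity DESC"""


# Load the sample data from the question into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project_tbl (id INTEGER PRIMARY KEY, Project_name TEXT);
    CREATE TABLE personel_project_tbl (user_id INTEGER, Project_id INTEGER);
    CREATE TABLE instrument_project_tbl (instrument_id INTEGER, Project_id INTEGER);
    INSERT INTO Project_tbl VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO personel_project_tbl VALUES (1, 1), (2, 2), (3, 1), (3, 2), (2, 3);
    INSERT INTO instrument_project_tbl VALUES (1, 1), (1, 2), (2, 2), (2, 1), (1, 3);
""")

rows = conn.execute(build_rank_query(LINK_TABLES), {"ref": 1}).fetchall()
print(rows)  # [(2, 75.0), (3, 25.0)]
```

Adding another link table then means adding one entry to `LINK_TABLES`; the divisor in the final average follows the length of that list.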
I'm saying nothing at all about performance. You'll have to try it and see whether it is fast enough for your needs, but as I see it there is no way to beat the O(n^2) complexity you mention, since you have to inspect all the data.