MySQL Query to find most similar numerical row
Question
In a MySQL database, I am attempting to find the most similar row across a number of numerical attributes. This problem is similar to this question but includes a flexible number of comparisons and a join table.
The database consists of two tables. The first table, users, is what I'm trying to compare.
id | self_ranking
----------------------------------
1 | 9
2 | 3
3 | 2
The second table is a series of scores which the user gave to particular items.
id | user_id | item_id | score
----------------------------------
1 | 1 | 1 | 4
2 | 1 | 2 | 5
3 | 1 | 3 | 8
4 | 1 | 4 | 3
Task
I want to find the "most similar" user to a given one, valuing all the ranked items equally (along with the self score). Thus, a perfect match would be the user who has ranked all the same items in exactly the same manner and has rated himself the same, while the next best choice would be one whose ranking of one item differs slightly.
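I'm having trouble with:
- joining the two tables efficiently
- handling the fact that not all users have ranked the same items. We only want to compare rankings of items that both users have ranked.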
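One way to write the similarity measure described above as a formula (a sketch; the symbols are introduced here only for illustration): let r_u be user u's self_ranking, I_u the set of items u has scored, and s_{u,i} the score u gave item i. The distance sums only over items both users have scored, and the most similar user is the one that minimises it.

```latex
d(u, v) = \lvert r_u - r_v \rvert
        + \sum_{i \in I_u \cap I_v} \lvert s_{u,i} - s_{v,i} \rvert,
\qquad
\operatorname{best}(u) = \operatorname*{arg\,min}_{v \ne u,\; I_u \cap I_v \ne \emptyset} d(u, v)
```

For the example given later (user 4 against user 1) this gives |8 - 9| + |4 - 4| + |5 - 5| = 1. Because the sum runs over shared items only, a user with fewer items in common contributes fewer terms and can look closer than a user with broader overlap.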
Could someone help me construct a reasonable query? I'm not terribly strong with MySQL, so sorry if this answer should be obvious.
If user 4 has ranked himself 8 and items 1=>4 and 2=>5, then I'd like to have the query for user 4's closest user to return 1, the user_id of the closest user.
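As a hedged reference check of the expected result, the following plain-Python sketch computes the distance described in the task for this example: user 4 (self-ranking 8, items 1 => 4 and 2 => 5) against the users in the sample tables. The dictionary layout, the function names `distance` and `most_similar`, and the score of 9 that user 2 gives item 1 are hypothetical. That extra score exists only so there is more than one candidate to compare.

```python
# Hypothetical in-memory data: user_id -> (self_ranking, {item_id: score}).
# Users 1-3 and user 1's scores come from the question's tables; user 4 is
# the example user; user 2's score for item 1 is made up for illustration.
users = {
    1: (9, {1: 4, 2: 5, 3: 8, 4: 3}),
    2: (3, {1: 9}),
    3: (2, {}),
    4: (8, {1: 4, 2: 5}),
}


def distance(a, b):
    """Self-ranking difference plus absolute score differences on shared items."""
    (a_self, a_scores), (b_self, b_scores) = a, b
    shared = a_scores.keys() & b_scores.keys()
    return abs(a_self - b_self) + sum(abs(a_scores[i] - b_scores[i]) for i in shared)


def most_similar(target_id, users):
    """Return the id of the closest other user, or None if nobody shares an item.

    Users who share no scored item with the target are skipped, as the
    inner joins in the answer's query also do.
    """
    target = users[target_id]
    candidates = [
        uid for uid, (_, scores) in users.items()
        if uid != target_id and scores.keys() & target[1].keys()
    ]
    return min(candidates, key=lambda uid: distance(target, users[uid]), default=None)


print(most_similar(4, users))  # → 1
```

Here user 1 is at distance 1 from user 4, while the hypothetical user 2 is at distance |8 - 3| + |4 - 9| = 10.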
Answer
SELECT u2.id AS user_id
-- join our user to their scores (users.id matches scores.user_id)
FROM users u1
JOIN scores s1 ON s1.user_id = u1.id
-- and then join other users' scores on the same items
JOIN scores s2 ON s2.item_id = s1.item_id
              AND s2.user_id != u1.id
JOIN users u2 ON u2.id = s2.user_id
-- filter for our user of interest
WHERE u1.id = ?
-- group other users' scores together
GROUP BY u2.id, u2.self_ranking, u1.self_ranking
-- and here's the magic: order in ascending order of "distance" between
-- our selected user and all of the others, so the most similar user comes
-- first; you may wish to weight self_ranking differently to item scores,
-- in which case just multiply appropriately
ORDER BY SUM(ABS(s2.score - s1.score))
       + ABS(u2.self_ranking - u1.self_ranking) ASC
-- keep only the closest user
LIMIT 1
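To check the query end to end, here is a sketch that runs it against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for MySQL here, on the assumption that the query uses nothing MySQL-specific. The table name `scores` follows the answer. User 4's rows come from the question's example. The score of 9 that user 2 gives item 1 is hypothetical, added so the ordering direction actually matters.

```python
import sqlite3

# In-memory SQLite stand-in for the question's MySQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, self_ranking INTEGER);
CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER,
                     item_id INTEGER, score INTEGER);
INSERT INTO users VALUES (1, 9), (2, 3), (3, 2), (4, 8);
INSERT INTO scores (user_id, item_id, score) VALUES
  (1, 1, 4), (1, 2, 5), (1, 3, 8), (1, 4, 3),  -- user 1, from the question
  (2, 1, 9),                                   -- hypothetical
  (4, 1, 4), (4, 2, 5);                        -- user 4, from the example
""")

# The answer's query: distance = summed score gaps on shared items
# plus the self_ranking gap, smallest first.
QUERY = """
SELECT u2.id
FROM users u1
JOIN scores s1 ON s1.user_id = u1.id
JOIN scores s2 ON s2.item_id = s1.item_id AND s2.user_id != u1.id
JOIN users u2 ON u2.id = s2.user_id
WHERE u1.id = ?
GROUP BY u2.id, u2.self_ranking, u1.self_ranking
ORDER BY SUM(ABS(s2.score - s1.score))
         + ABS(u2.self_ranking - u1.self_ranking) ASC
LIMIT 1
"""


def most_similar(user_id):
    """Return the closest user's id, or None if no other user shares an item."""
    row = conn.execute(QUERY, (user_id,)).fetchone()
    return row[0] if row else None


print(most_similar(4))  # → 1
```

With `DESC` instead of `ASC`, user 4's query would return user 2 (distance 10) rather than user 1 (distance 1). Because of the inner joins, a user who shares no scored item with the target never appears in the result, so `most_similar(3)` returns `None`.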