比较两个结果集之间的相似性 [英] Compare similarities between two result sets

查看:169
本文介绍了比较两个结果集之间的相似性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我创建了一个音乐网站,我希望用户能够找到喜欢与他们大致相同的艺术家的用户。

I am creating a music website where I would like users to be able to find users who like approximately the same artists as they.

我有一个喜欢表有两列id_user,id_artist。
下面是一个我希望如何工作的例子:

I have a 'like' table that has 2 columns 'id_user', 'id_artist'. Here is an example of how I would like it to work:

User 1 likes:
1, 12
1, 13
1, 14
1, 26
1, 42
1, 44

User 2 likes:
2, 13
2, 14
2, 15
2, 26
2, 42
2, 56

这2个用户共有4个艺术家。
有一种方法,比较这两个结果集,找到数据库中最相似的人吗?

Those 2 users have 4 artists in common. Is there a way, to compare those 2 results sets, to find the most similar people in the database?

我的第一个想法是连接方式:12,13,14,26,42,44在字符串中,并使用mysql FULLTEXT分数来比较不同的字符串。
这没有工作...不知道为什么,但mysql fulltext只工作与文本...不与数字...

My first idea was to concatenate likes in that way: "12,13,14,26,42,44" in a string, and use mysql FULLTEXT scores to compare different strings. That didn't work... don't know why but mysql fulltext only works with text... not with numbers...

任何想法

推荐答案

这样:

SELECT first_user.id_user, second_user.id_user, COUNT(first_user.id_user) AS total_matches

FROM likes AS first_user

JOIN likes AS second_user
ON second_user.id_artist = first_user.id_artist
AND second_user.id_user != first_user.id_user

GROUP BY first_user.id_user, second_user.id_user

ORDER BY total_matches DESC

LIMIT 1

请注意, t非常有效。解决这个问题的一种方法是使包含此查询的输出的缓存表中除去 LIMIT 1 部分。添加一些相关索引,并查询此缓存表。您可以设置cron作业以定期更新此表。

Note that this isn't very efficient. One way to work around this is to make a 'cache table' containing the output of this query with the LIMIT 1 portion removed. Add some relevant indexes and do query this cache table. You could set a cron job to update this table periodically.

示例:

CREATE TABLE IF NOT EXISTS `likes` (
  `id_user` varchar(50) DEFAULT NULL,
  `id_artist` varchar(50) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `likes` (`id_user`, `id_artist`) VALUES ('8', '39'), ('8', '37'), ('4', '37'), ('8', '24'), ('8', '7'), ('4', '28'), ('8', '28'), ('4', '27'), ('4', '11'), ('8', '49'), ('4', '7'), ('4', '40'), ('4', '29'), ('8', '22'), ('4', '29'), ('8', '11'), ('8', '28'), ('4', '7'), ('4', '31'), ('8', '42'), ('8', '25'), ('4', '25'), ('4', '17'), ('4', '32'), ('4', '46'), ('4', '19'), ('8', '34'), ('3', '32'), ('4', '21')

+---------+---------+---------------+
| id_user | id_user | total_matches |
+---------+---------+---------------+
| 8       | 4       |             7 |
+---------+---------+---------------+

这篇关于比较两个结果集之间的相似性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆