Collaborative filtering in MySQL?

Question

I'm trying to develop a site that recommends items (e.g. books) to users based on their preferences. So far, I've read O'Reilly's "Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation: for example, if you like book A, then you might like book B.

What I'm trying to do is create a set of 'preference-nodes' for each user on my site. Let's say a user likes books A, B and C. Then, when they add book D, I don't want the system to recommend other books based solely on other users' experience with book D. I want the system to look up similar 'preference-nodes' and recommend books based on those.

Here's an example of 4 nodes:

User1: 'book A'->'book B'->'book C'
User2: 'book A'->'book B'->'book C'->'book D'
User3: 'book X'->'book Y'->'book C'->'book Z'
User4: 'book W'->'book Q'->'book C'->'book Z'

So a recommendation system, as described in the material I've read, would recommend book Z to User1, because two people recommend Z in conjunction with liking C (i.e. Z weighs more than D), even though User2, a user with a similar 'preference-node', would be better qualified to recommend book D because he has a more similar interest pattern.
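
To make the concern concrete, here is a minimal Python sketch of the single-item approach described above, run against the four example nodes. The helper name `co_liked_with` and the dictionary layout are assumptions made for illustration; they do not come from any particular library or from the material mentioned.

```python
from collections import Counter

# Preference lists copied from the four example nodes above.
LIKES = {
    "User1": ["A", "B", "C"],
    "User2": ["A", "B", "C", "D"],
    "User3": ["X", "Y", "C", "Z"],
    "User4": ["W", "Q", "C", "Z"],
}


def co_liked_with(book, target_user, likes=LIKES):
    """Count how often each other book is liked together with `book`.

    Only other users' lists are scanned, and books the target user
    already has are skipped. (Hypothetical helper, for illustration.)
    """
    owned = set(likes[target_user])
    counts = Counter()
    for user, books in likes.items():
        if user == target_user or book not in books:
            continue
        counts.update(b for b in books if b not in owned)
    return counts


# "People who liked C also liked ...", from User1's point of view.
scores = co_liked_with("C", "User1")
print(scores["Z"], scores["D"])
```

Looking only at book C, Z is counted twice (User3 and User4) and D once (User2), so Z wins even though User2's list overlaps far more with User1's.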

So do any of you have experience with this sort of thing? Is there anything I should read, or are there any open-source systems for this?

Thanks for your time!

Small edit: I think last.fm's algorithm is doing exactly what I want my system to do: using people's preference-trees to recommend music to them more personally, instead of just saying "you might like B because you liked A".

Recommended answer

Create a table and insert the test data:

CREATE TABLE `ub` (
  `user_id` int(11) NOT NULL,
  `book_id` varchar(10) NOT NULL,
  PRIMARY KEY (`user_id`,`book_id`),
  UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into ub values (1, 'A'), (1, 'B'), (1, 'C');
insert into ub values (2, 'A'), (2, 'B'), (2, 'C'), (2,'D');
insert into ub values (3, 'X'), (3, 'Y'), (3, 'C'), (3,'Z');
insert into ub values (4, 'W'), (4, 'Q'), (4, 'C'), (4,'Z');

Join the test data onto itself by book_id, and create a temporary table to hold each user_id and the number of books it has in common with the target user_id:

-- `rank` is quoted because RANK is a reserved word from MySQL 8.0.2 onward
create temporary table ub_rank as
select similar.user_id, count(*) as `rank`
from ub target 
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id
where target.user_id = 1
group by similar.user_id;

select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
|       2 |    3 |
|       3 |    1 |
|       4 |    1 |
+---------+------+
3 rows in set (0.00 sec)

We can see that user_id 2 has 3 books in common with user_id 1, but user_id 3 and user_id 4 have only 1 each.

Next, select all the books that the users in the temporary table have and the target user_id does not, and order them by rank. Note that the same book might appear in several users' lists, so we sum the rankings for each book so that books held by more of the similar users get a higher ranking.

select similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id 
left join ub target on target.user_id = 1 and target.book_id = similar.book_id
where target.book_id is null
group by similar.book_id
order by total_rank desc;

+---------+------------+
| book_id | total_rank |
+---------+------------+
| D       |          3 |
| Z       |          2 |
| X       |          1 |
| Y       |          1 |
| Q       |          1 |
| W       |          1 |
+---------+------------+
6 rows in set (0.00 sec)

Book Z appeared in two user lists, and so was ranked above X,Y,Q,W which only appeared in one user's list. Book D did best because it appeared in user_id 2's list, which had 3 items in common with target user_id 1.
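
The two steps can also be folded into one statement by using the overlap query as a derived table instead of a temporary table. The sketch below is not part of the answer's MySQL session. It runs the combined query from Python against an in-memory SQLite database so that it is self-contained. The aliases `overlap`, `sim` and `tgt`, the helper names, and the tie-break on `book_id` are assumptions added here, and the SQL has only been written for SQLite, so it may need small adjustments for MySQL.

```python
import sqlite3

# Combined form of the two steps: the inner query counts shared books per
# user (step 1), the outer query sums those counts per unseen book (step 2).
RECOMMEND_SQL = """
SELECT sim.book_id, SUM(r.overlap) AS total_rank
FROM (
    SELECT s.user_id, COUNT(*) AS overlap
    FROM ub AS t
    JOIN ub AS s ON t.book_id = s.book_id AND t.user_id != s.user_id
    WHERE t.user_id = :target
    GROUP BY s.user_id
) AS r
JOIN ub AS sim ON r.user_id = sim.user_id
LEFT JOIN ub AS tgt ON tgt.user_id = :target AND tgt.book_id = sim.book_id
WHERE tgt.book_id IS NULL
GROUP BY sim.book_id
ORDER BY total_rank DESC, sim.book_id
"""


def make_test_db():
    """Build an in-memory copy of the `ub` table with the same test rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ub (user_id INTEGER NOT NULL, book_id TEXT NOT NULL, "
        "PRIMARY KEY (user_id, book_id))"
    )
    rows = [
        (1, "A"), (1, "B"), (1, "C"),
        (2, "A"), (2, "B"), (2, "C"), (2, "D"),
        (3, "X"), (3, "Y"), (3, "C"), (3, "Z"),
        (4, "W"), (4, "Q"), (4, "C"), (4, "Z"),
    ]
    conn.executemany("INSERT INTO ub VALUES (?, ?)", rows)
    return conn


def recommend(conn, target_user_id):
    """Return (book_id, total_rank) pairs for books the target user lacks."""
    return conn.execute(RECOMMEND_SQL, {"target": target_user_id}).fetchall()


print(recommend(make_test_db(), 1))
```

For user_id 1 this gives D a total of 3 and Z a total of 2, with Q, W, X and Y at 1 each, which matches the table above apart from the order of the tied rows.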
