Collaborative Filtering: Non-Personalized item-to-item similarity

Question

I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen either compute item similarity for ranked items, find user-user similarity, or find recommended items based on the current user's history. I'd like to start off with a non-targeted approach before factoring in the current user's preferences.

Looking at the Amazon.com recommendations white paper, they use the following logic for offline item-item similarity:

For each item in product catalog, I1
  For each customer C who purchased I1
    For each item I2 purchased by customer C
      Record that a customer purchased I1 and I2
  For each item I2
    Compute the similarity between I1 and I2

If I understand correctly, by the time we're at "Compute the similarity between I1 and I2", I have a list of items (I2) purchased in conjunction with a single item I1 (the outer loop).

How is this calculation performed?
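One common choice for that last step, offered here only as a sketch, is cosine similarity between binary purchase vectors: each item is a vector with one dimension per customer, and for 0/1 vectors the cosine reduces to the number of shared buyers divided by the square root of the product of the two buyer counts. The purchase data and the function name `item_similarities` below are made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase log: (customer, item) pairs.
purchases = [
    ("c1", "A"), ("c1", "B"),
    ("c2", "A"), ("c2", "B"), ("c2", "C"),
    ("c3", "A"), ("c3", "C"),
    ("c4", "B"),
]

def item_similarities(purchases):
    """Cosine similarity of binary purchase vectors for every co-purchased item pair."""
    buyers = defaultdict(set)   # item -> customers who bought it
    basket = defaultdict(set)   # customer -> items they bought
    for customer, item in purchases:
        buyers[item].add(customer)
        basket[customer].add(item)

    sims = {}
    for i1 in buyers:                         # for each item I1
        together = defaultdict(int)
        for customer in buyers[i1]:           # for each customer C who purchased I1
            for i2 in basket[customer]:       # for each item I2 purchased by C
                if i2 != i1:
                    together[i2] += 1         # record that I1 and I2 were bought together
        for i2, shared in together.items():   # compute the similarity between I1 and I2
            sims[(i1, i2)] = shared / sqrt(len(buyers[i1]) * len(buyers[i2]))
    return sims

sims = item_similarities(purchases)
```

Only pairs that were actually bought together get a score, so the inner loops touch far fewer pairs than the full item-by-item grid.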

Another idea is that I'm overthinking this and making it more difficult than I need to. Would it be enough to do a top-n query on the count of I2 bought in conjunction with I1?
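The top-n idea can be sketched in a few lines. The `baskets` data and the helper `top_n_bought_with` are hypothetical, and raw counts like these tend to favor best-sellers, which is one reason a normalized similarity is often used instead.

```python
from collections import Counter

# Hypothetical data: customer -> set of items purchased.
baskets = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"A", "C"},
    "c4": {"B", "D"},
}

def top_n_bought_with(item, baskets, n=2):
    """Return the n items most often purchased by customers who also bought `item`."""
    counts = Counter()
    for items in baskets.values():
        if item in items:
            counts.update(items - {item})  # count every other item in this basket
    return counts.most_common(n)

with_a = top_n_bought_with("A", baskets)
with_b = top_n_bought_with("B", baskets, n=1)
```

In a database this corresponds roughly to a self-join of the purchase table on customer, grouped by the second item and ordered by count.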

I'd also appreciate suggestions on whether or not this approach is a correct one. My product database has about 150k items at any time. Since the bulk of the reading material I've seen shows user-item similarity or even user-user similarity, should I be looking to go that route instead?

I've worked with similarity algorithms in the past, but they've always involved a rank or a score. I think the only way this would work would be to build a customer-product matrix scoring 0/1 for not purchased/purchased. Given the size of the purchase history and the number of items, this could get really large.
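The 0/1 matrix does not have to be stored densely. As a rough sketch, keeping only the 1 entries (which customers bought each item) makes the storage grow with the number of distinct purchases rather than customers times items. The `rows` data and both helper names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical rows, e.g. read from an orders table: (customer_id, item_id).
rows = [(1, 101), (1, 102), (2, 101), (3, 103), (3, 101), (1, 101)]

def build_sparse_matrix(rows):
    """Map each item to the set of customers who bought it (the implicit 1 entries)."""
    buyers = defaultdict(set)
    for customer_id, item_id in rows:
        buyers[item_id].add(customer_id)  # repeat purchases collapse to a single 1
    return buyers

def purchased(buyers, customer_id, item_id):
    """Value of the 0/1 matrix cell, without ever storing the zeros."""
    return 1 if customer_id in buyers.get(item_id, ()) else 0

buyers = build_sparse_matrix(rows)
stored_entries = sum(len(customers) for customers in buyers.values())
```

Here five entries are stored for a matrix that would have nine cells if written out in full, and the gap widens quickly as the catalog grows.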

Edit: although I listed python as a tag, I'd prefer to keep the logic inside of a DB, preferably using Oracle PL/SQL.

Recommended answer

There's a good O'Reilly book on this topic. While the white paper might lay the logic out in pseudo-code like that, I don't think that approach would scale very well. The calculations are all probability calculations, so things like Bayes' Theorem get used to say, "Given Person A purchased X, what's the likelihood they purchased Z?" Straightforward looping over the data is working too hard: you have to go through it all for each person.
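To make the probability framing concrete, here is a minimal sketch that estimates "given a customer purchased X, how likely is it they purchased Z?" from purchase counts. It is not the answer's own implementation; the `baskets` data and the function `purchase_likelihood` are hypothetical.

```python
# Hypothetical data: customer -> set of items purchased.
baskets = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"A", "C"},
    "c4": {"B", "D"},
}

def purchase_likelihood(baskets, given, target):
    """Estimate P(target purchased | given purchased) from per-customer item sets."""
    bought_given = [items for items in baskets.values() if given in items]
    if not bought_given:
        return 0.0  # no evidence: nobody bought the `given` item
    both = sum(1 for items in bought_given if target in items)
    return both / len(bought_given)

p_c_given_a = purchase_likelihood(baskets, "A", "C")  # 2 of the 3 buyers of A bought C
p_a_given_c = purchase_likelihood(baskets, "C", "A")  # both buyers of C bought A
```

The two directions differ, so unlike cosine similarity this measure is not symmetric. Bayes' Theorem ties them together: P(C|A) = P(A|C) * P(C) / P(A).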
