How to update Spark MatrixFactorizationModel for ALS


Problem description

I built a simple recommendation system for the MovieLens DB, inspired by https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html.

I also have problems with explicit training, like here: Apache Spark ALS collaborative filtering results don't make sense. Implicit training (on both explicit and implicit data) gives me reasonable results, but explicit training doesn't.

While this is OK for me for now, I'm curious how to update a model. My current solution works like this:

  1. have all user ratings
  2. generate the model
  3. get recommendations for a user

I want to have a flow like this:

  1. have a base of ratings
  2. generate the model once (optionally save & load it)
  3. get some ratings from one user on 10 random movies (not in the model!)
  4. get recommendations using the model and the new user ratings

Therefore I must update my model without completely recomputing it. Is there any way to do so?

While the first way is good for batch processing (like generating recommendations in nightly batches), the second way would be good for near-real-time generation of recommendations.

Solution

Edit: the following worked for me because I had implicit feedback ratings and was only interested in ranking the products for a new user. More details here


You can actually get predictions for new users using the trained model (without updating it):

To get predictions for a user in the model, you use their latent representation (a vector u_i of size f, the number of factors), which is multiplied by the product latent-factor matrix (a matrix made of the latent representations of all products, i.e. a bunch of vectors of size f); this gives you a score for each product. For new users, the problem is that you don't have access to their latent representation (you only have their full representation of size M, the number of distinct products), but what you can do is compute a similar latent representation for the new user by multiplying the full representation by the transpose of the product matrix.
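The scoring step for a user already in the model can be sketched in plain linear algebra. This is a minimal NumPy illustration, not Spark code; the matrix sizes and values are made up, and v is stored with one row of size f per product:

```python
import numpy as np

# v: product latent-factor matrix, M = 4 products x f = 2 factors
# (one row per product).
v = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [0.1, 0.9]])

# u_i: latent representation of one user already in the model (size f).
u_i = np.array([1.0, 0.0])

# Score every product for this user: u_i times v transposed -> M scores.
scores = u_i @ v.T
print(scores)                  # one score per product
print(int(np.argmax(scores)))  # index of the top-ranked product
```

With these made-up numbers the user's factors line up with product 0, so product 0 gets the highest score.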

i.e. if your user latent matrix is u and your product latent matrix is v, then for user i in the model you get scores by doing u_i * v. For a new user, you don't have a latent representation, so take the full representation full_u and do full_u * v^t * v. This approximates the latent factors for the new user and should give reasonable recommendations (if the model already gives reasonable recommendations for existing users).
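The new-user trick can be sketched the same way (again a NumPy illustration with made-up numbers; full_u is the new user's raw rating vector over all M products, with zeros for unrated ones, and v is stored with one row per product, so the full_u * v^t * v formula above becomes full_u @ v @ v.T in this orientation):

```python
import numpy as np

# v: product latent-factor matrix (M = 4 products x f = 2 factors).
v = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [0.1, 0.9]])

# full_u: the new user's full representation of size M -- here they
# rated products 0 and 2 and left the others unrated (zeros).
full_u = np.array([5.0, 0.0, 3.0, 0.0])

# Approximate the user's latent factors: project the rating vector
# through the product matrix, giving a vector of size f ...
approx_latent = full_u @ v

# ... then score all products as for an existing user -> M scores.
scores = approx_latent @ v.T
print(approx_latent)
print(scores)
```

No retraining happens here: only two small matrix products per request, which is what makes this usable for near-real-time recommendations.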

To answer the question about training: this allows you to compute predictions for new users without having to redo the heavy computation of the model, which you now only need to do once in a while. So you have your batch processing at night and can still make predictions for new users during the day.

Note: MLlib gives you access to the matrices u and v
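In MLlib these are exposed as model.userFeatures and model.productFeatures, RDDs of (id, factor-array) pairs. A sketch of assembling the product matrix v from such pairs, using a plain Python list as a stand-in for what productFeatures.collect() would return (the pairs here are made up):

```python
import numpy as np

# Stand-in for model.productFeatures.collect() in MLlib:
# (productId, array-of-f-latent-factors) pairs, in no particular order.
product_features = [(2, [0.5, 0.5]),
                    (0, [0.9, 0.1]),
                    (3, [0.1, 0.9]),
                    (1, [0.2, 0.8])]

# Sort by product id and stack into the M x f matrix v, so that row i
# is the latent representation of product i.
v = np.array([factors for _, factors in sorted(product_features)])
print(v.shape)  # (M, f)
```

This assumes product ids are dense (0..M-1); with sparse ids you would keep a separate id-to-row mapping instead.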
