How to update Spark MatrixFactorizationModel for ALS

Problem description

I built a simple recommendation system for the MovieLens DB, inspired by https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html.

I also have problems with explicit training, as described here: "Apache Spark ALS collaborative filtering results. They don't make sense". Using implicit training (on both explicit and implicit data) gives me reasonable results, but explicit training doesn't.

While this is OK for me for now, I'm curious about how to update a model. My current solution works like this:

  1. have all user ratings
  2. generate the model
  3. get recommendations for a user

I want to have a flow like this:

  1. have a base of ratings
  2. generate the model once (optionally save and load it)
  3. get some ratings by one user on 10 random movies (not in the model!)
  4. get recommendations using the model and the new user's ratings

Therefore I must update my model without completely recomputing it. Is there any way to do so?

While the first way is good for batch processing (like generating recommendations in nightly batches), the second way would be good for generating recommendations in near real time.

Solution

Edit: the following worked for me because I had implicit feedback ratings and was only interested in ranking the products for a new user. More details here

You can actually get predictions for new users using the trained model (without updating it):

To get predictions for a user already in the model, you take that user's latent representation (a vector u_i of size f, the number of factors) and multiply it by the product latent factor matrix (the latent representations of all products, each a vector of size f). This gives you a score for each product. For a new user, the problem is that you don't have a latent representation: you only have the full representation, a vector of size M (the number of different products). What you can do is compute a similar latent representation for the new user by multiplying the full representation by the transpose of the product matrix.

That is, if your user latent matrix is u and your product latent matrix is v (f rows, M columns), then for a user i in the model you get the scores by computing:

  u_i * v

For a new user you don't have a latent representation, so take the full representation full_u (size M) and compute:

  full_u * v^t * v

Here full_u * v^t approximates the latent factors of the new user, and multiplying by v again turns them into one score per product. This should give reasonable recommendations (if the model already gives reasonable recommendations for existing users).
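The two products above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not MLlib code: the function names `fold_in_scores` and `recommend`, the toy matrix, and the layout of v (f rows of M columns) are all made up for the example.

```python
def fold_in_scores(full_u, v):
    """Score every product for a new user as full_u * v^t * v.

    full_u: list of length M, one entry per product (0.0 = no feedback).
    v: product latent matrix as f rows of M columns (assumed layout).
    Returns (latent, scores): the approximate latent vector of size f
    and one score per product.
    """
    # Step 1: full_u * v^t gives an approximate latent vector of size f.
    latent = [sum(r * x for r, x in zip(full_u, row)) for row in v]
    # Step 2: latent * v gives one score per product (size M).
    scores = [
        sum(latent[k] * v[k][j] for k in range(len(v)))
        for j in range(len(full_u))
    ]
    return latent, scores


def recommend(full_u, v, top_n=3):
    """Return indices of the top_n products the user has no feedback on."""
    _, scores = fold_in_scores(full_u, v)
    unseen = [j for j, r in enumerate(full_u) if r == 0.0]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:top_n]


# Toy data (hypothetical): f = 2 factors, M = 4 products.
v = [
    [1.0, 0.9, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
]
new_user = [1.0, 0.0, 0.0, 0.0]  # feedback on product 0 only

print(recommend(new_user, v, top_n=2))  # [1, 3]
```

In this toy case the new user's only feedback is on product 0, so product 1, whose factors are closest to product 0, ranks first. For real data sizes a linear algebra library would be a more natural choice than nested lists.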

To answer the question about training: this lets you compute predictions for new users without the heavy computation of the model, which you now only need to do once in a while. So you can run your batch processing at night and still make predictions for new users during the day.

Note: MLlib gives you access to the matrices u and v.
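In MLlib's MatrixFactorizationModel the factors are exposed as `userFeatures` and `productFeatures`, each a collection of (id, factor array) pairs. The sketch below shows one possible way to turn such pairs into the product matrix v with f rows and M columns. It assumes the pairs have already been collected to the driver and fit in memory. The helper name `build_product_matrix` and the sample values are hypothetical.

```python
def build_product_matrix(product_features):
    """Arrange (product_id, factors) pairs into a matrix of f rows, M columns.

    product_features: iterable of (product_id, factors) pairs, where
    factors is a sequence of f floats. Assumed to be already collected
    from the model (hypothetical input, not fetched here).
    Returns (product_ids, v): column j of v belongs to product_ids[j].
    """
    # Sort by product id so the column order is deterministic.
    pairs = sorted(product_features, key=lambda pair: pair[0])
    product_ids = [pid for pid, _ in pairs]
    columns = [list(factors) for _, factors in pairs]
    # Transpose: M columns of size f become f rows of size M.
    v = [list(row) for row in zip(*columns)]
    return product_ids, v


# Hypothetical sample standing in for collected product factors (f = 2).
sample = [(20, [0.1, 0.9]), (10, [1.0, 0.0]), (30, [0.5, 0.5])]
product_ids, v = build_product_matrix(sample)

print(product_ids)  # [10, 20, 30]
print(v)            # [[1.0, 0.1, 0.5], [0.0, 0.9, 0.5]]
```

Keeping `product_ids` alongside v matters because a new user's full representation has to use the same column order as the matrix.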
