ALS模型-如何生成full_u * v ^ t * v? [英] ALS model - how to generate full_u * v^t * v?

查看:70
本文介绍了ALS模型-如何生成full_u * v ^ t * v?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试弄清ALS模型如何在批处理过程更新之间预测新用户的值.在搜索中,我遇到了这个 stackoverflow答案.为了方便读者,我复制了以下答案:

I'm trying to figure out how an ALS model can predict values for new users in between it being updated by a batch process. In my search, I came across this stackoverflow answer. I've copied the answer below for the reader's convenience:

您可以使用经过训练的模型(无需更新)来获得新用户的预测:

You can get predictions for new users using the trained model (without updating it):

要获得模型中用户的预测,请使用其潜在表示(大小为f(因子数)的向量u)乘以乘积潜在因子矩阵(由所有乘积的潜在表示组成的矩阵) ,一堆大小为f)的向量,并为您提供每个产品的得分.对于新用户,问题在于您无法访问其潜在表示(您只能使用大小为M(不同产品的数量)的完整表示,但是您可以使用相似性函数来计算相似的潜在将新用户乘以乘积矩阵的转置来表示该新用户.

To get predictions for a user in the model, you use its latent representation (vector u of size f (number of factors)), which is multiplied by the product latent factor matrix (matrix made of the latent representations of all products, a bunch of vectors of size f) and gives you a score for each product. For new users, the problem is that you don't have access to their latent representation (you only have the full representation of size M (number of different products), but what you can do is use a similarity function to compute a similar latent representation for this new user by multiplying it by the transpose of the product matrix.

即如果您的用户潜在矩阵为u而您的产品潜在矩阵为v,那么对于模型中的用户i,您可以通过以下方式获得得分:u_i * v对于新用户,您没有潜在表示,因此请采用完整表示full_u并执行:full_u * v ^ t * v这将近似于新用户的潜在因素并应给出合理的建议(如果模型已经为现有用户提供了合理的建议)

i.e. if you user latent matrix is u and your product latent matrix is v, for user i in the model, you get scores by doing: u_i * v for a new user, you don't have a latent representation, so take the full representation full_u and do: full_u * v^t * v This will approximate the latent factors for the new users and should give reasonable recommendations (if the model already gives reasonable recommendations for existing users)

要回答培训问题,这使您可以为新用户计算预测,而不必进行繁重的模型计算,而现在您只能偶尔进行一次计算.因此,您可以在晚上进行批处理,并且仍然可以在白天为新用户做出预测.

To answer the question of training, this allows you to compute predictions for new users without having to do the heavy computation of the model which you can now do only once in a while. So you have you batch processing at night and can still make prediction for new user during the day.

注意:MLLIB使您可以访问矩阵u和v

Note: MLLIB gives you access to the matrix u and v

上面引用的文本是一个很好的答案,但是,我正在努力了解如何以编程方式实现此解决方案.例如,矩阵u和v可以通过以下方式获得:

The quoted text above is an excellent answer, however, I'm struggling to understand how to programmatically implement this solution. For example, the matrix u and v can be obtained with:

# pyspark example

# ommitted for brevity ... loading movielens 1M ratings

model = ALS.train(ratings, rank, numIterations, lambdaParam)

matrix_u = model.userFeatures()

print(matrix_u.take(2)) # take a look at the dataset

这将返回:

[
  (2, array('d', [0.26341307163238525, 0.1650490164756775, 0.118405282497406, -0.5976635217666626, -0.3913084864616394, -0.1379186064004898, -0.3866392970085144, -0.1768060326576233, -0.38342711329460144, 0.48550787568092346, -0.18867433071136475, -0.02757863700389862, 0.1410026103258133, 0.11498363316059113, 0.03958914801478386, 0.034536730498075485, 0.08427099883556366, 0.46969038248062134, -0.8230801224708557, -0.15124185383319855, 0.2566414773464203, 0.04326820373535156, 0.19077207148075104, 0.025207923725247383, -0.02030213735997677, 0.1696728765964508, 0.5714617967605591, -0.03885050490498543, -0.09797532111406326, 0.29186877608299255, -0.12768596410751343, -0.1582849770784378, 0.01933656632900238, -0.09131495654582977, 0.26577943563461304, -0.4543033838272095, -0.11789630353450775, 0.05775507912039757, 0.2891307771205902, -0.2147761881351471, -0.011787488125264645, 0.49508437514305115, 0.5610293745994568, 0.228189617395401, 0.624510645866394, -0.009683617390692234, -0.050237834453582764, -0.07940001785755157, 0.4686132073402405, -0.02288617007434368])), 
  (4, array('d', [-0.001666820957325399, -0.12487432360649109, 0.1252429485321045, -0.794727087020874, -0.3804478347301483, -0.04577340930700302, -0.42346617579460144, -0.27448347210884094, -0.25846347212791443, 0.5107921957969666, 0.04229479655623436, -0.10212298482656479, -0.13407345116138458, -0.2059325873851776, 0.12777331471443176, -0.318756639957428, 0.129398375749588, 0.4351944327354431, -0.9031049013137817, -0.29211774468421936, -0.02933369390666485, 0.023618215695023537, 0.10542935132980347, -0.22032295167446136, -0.1861676126718521, 0.13154461979866028, 0.6130356192588806, -0.10089754313230515, 0.13624103367328644, 0.22037173807621002, -0.2966669499874115, -0.34058427810668945, 0.37738317251205444, -0.3755438029766083, -0.2408779263496399, -0.35355791449546814, 0.05752146989107132, -0.15478627383708954, 0.3418906629085541, -0.6939512491226196, 0.4279302656650543, 0.4875738322734833, 0.5659542083740234, 0.1479463279247284, 0.5280753970146179, -0.24357643723487854, 0.14329688251018524, -0.2137598991394043, 0.011986476369202137, -0.015219110995531082]))
]

我也可以做类似的操作来获得v矩阵:

I can also do similar to get the v matrix:

matrix_v = model.productFeatures()

print(matrix_v.take(2)) # take a look at the dataset

结果是:

[
  (2, array('d', [0.019985994324088097, 0.0673416256904602, -0.05697149783372879, -0.5434763431549072, -0.40705952048301697, -0.18632276356220245, -0.30776089429855347, -0.13178342580795288, -0.27466219663619995, 0.4183739423751831, -0.24422742426395416, -0.24130797386169434, 0.24116989970207214, 0.06833088397979736, -0.01750543899834156, 0.03404173627495766, 0.04333991929888725, 0.3577033281326294, -0.7044714689254761, 0.1438472419977188, 0.06652364134788513, -0.029888223856687546, -0.16717877984046936, 0.1027146726846695, -0.12836599349975586, 0.10197233408689499, 0.5053384900093079, 0.019304445013403893, -0.21254844963550568, 0.2705852687358856, -0.04169371724128723, -0.24098040163516998, -0.0683765709400177, -0.09532768279314041, 0.1006036177277565, -0.08682398498058319, -0.13584329187870026, -0.001340558985248208, 0.20587041974067688, -0.14007550477981567, -0.1831497997045517, 0.5021498203277588, 0.3049483597278595, 0.11236990243196487, 0.15783801674842834, -0.044139936566352844, -0.14372406899929047, 0.058535050600767136, 0.3777201473712921, -0.045475270599126816])), 
  (4, array('d', [0.10334215313196182, 0.1881643384695053, 0.09297363460063934, -0.457258403301239, -0.5272660255432129, -0.0989445373415947, -0.2053477019071579, -0.1644461452960968, -0.3771175146102905, 0.21405018866062164, -0.18553146719932556, 0.011830524541437626, 0.29562288522720337, 0.07959598302841187, -0.035378433763980865, -0.11786794662475586, -0.11603366583585739, 0.3776192367076874, -0.5124108791351318, 0.03971947357058525, -0.03365595266222954, 0.023278912529349327, 0.17436474561691284, -0.06317273527383804, 0.05118614062666893, 0.4375131130218506, 0.3281322419643402, 0.036590900272130966, -0.3759073317050934, 0.22429685294628143, -0.0728025734424591, -0.10945595055818558, 0.0728464275598526, 0.014129920862615108, -0.10701996833086014, -0.2496117204427719, -0.09409723430871964, -0.11898282915353775, 0.18940524756908417, -0.3211393356323242, -0.035668935626745224, 0.41765937209129333, 0.2636736035346985, -0.01290816068649292, 0.2824321389198303, 0.021533429622650146, -0.08053319901227951, 0.11117415875196457, 0.22975310683250427, 0.06993964314460754]))
]

但是,我不确定如何从此升级到full_u * v^t * v

However, I'm not sure how to progress from this to full_u * v^t * v

推荐答案

此新用户不是矩阵U,因此您没有在"k"因子中的潜在表示,您只知道其完整表示,即,所有评分.这里的full_u表示以密集格式(不是稀疏格式ratings)的所有新用户评分,例如:

This new user is not the the matrix U, so you don't have its latent representation in 'k' factors, you only know its full representation, i.e., all its ratings. full_u here means all of the new user ratings in a dense format (not the sparse format ratings are) e.g.:

[0 2 0 0 0 1 0]如果用户u对项目2的评分为2,对项目6的评分为1.

[0 2 0 0 0 1 0] if user u has rated item 2 with a 2 and item 6 with a 1.

然后您可以像获得的一样获得v并将其转换为numpy中的矩阵:

then you can get v pretty much like you did and turning it to a matrix in numpy for instance:

pf = model.productFeatures()
Vt = np.matrix(np.asarray(pf.values().collect()))

然后只需要乘以即可: full_u*Vt*Vt.T

then is is just a matter of multiplying: full_u*Vt*Vt.T

VtV换位了,但这只是一个约定.

Vt and V are transposed compared to the other answer but that's just a matter of convention.

请注意,Vt*Vt.T产品是固定的,因此,如果您打算将其用于多个新用户,则在计算上将更加高效.实际上,对于多个用户来说,最好将其所有评分都放入bigU(与我的一个新用户示例相同的格式)中,然后做矩阵乘积: bigU*Vt*Vt.T获取所有新用户的所有评分.就操作次数而言,仍然值得检查产品是否以最有效的方式完成.

Note that the Vt*Vt.T product is fixed, so if you are going to use this for multiple new users it will be computationally more efficient to pre-compute it. Actually for more than one user it would be better to put all their ratings in bigU (in the same format as my one new user example) and just do the matrix product: bigU*Vt*Vt.T to get all the ratings for all the new users. Might still be worth checking that the product is done in the most efficient way in terms of number of operations.

这篇关于ALS模型-如何生成full_u * v ^ t * v?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆