What is rank in the ALS machine learning algorithm in Apache Spark MLlib?

Question

I wanted to try an example of the ALS machine learning algorithm. My code works fine; however, I do not understand the rank parameter used in the algorithm.

I have the following code in Java:

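    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.recommendation.ALS;
    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;

    // Build the recommendation model using ALS (ratings: a JavaRDD<Rating> built earlier)
    int rank = 10;          // number of latent factors per user and per product
    int numIterations = 10; // number of ALS iterations
    MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings),
            rank, numIterations, 0.01); // 0.01 = regularization parameter (lambda)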

I have read somewhere that it is the number of latent factors in the model.

Suppose I have a dataset of (user, product, rating) tuples with 100 rows. What should the value of rank (the number of latent factors) be?

Answer

As you said, the rank refers to the number of presumed latent or hidden factors: ALS represents every user and every product as a vector of length rank and predicts a rating as the dot product of the two vectors. For example, if you were measuring how much different people liked movies and tried to cross-predict them, you might have three fields: person, movie, and number of stars. Now, let's say that you were omniscient: you knew the absolute truth, and you knew that in fact all the movie ratings could be perfectly predicted by just 3 hidden factors: sex, age and income. In that case, the rank of your run should be 3.
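
To make the idea concrete, here is a minimal plain-Java sketch (no Spark needed) of how a rank-3 model turns factors into a prediction: the predicted rating is the dot product of a user's factor vector and a product's factor vector, each of length rank. The class name and the factor values are hypothetical; in a real model the values are learned by ALS and usually do not map cleanly onto named traits such as age or income.

```java
// Hypothetical illustration: predicting one rating from rank-3 factor vectors.
// The factor values are made up; ALS would learn them from the ratings.
public class LatentFactorDemo {

    // Predicted rating = dot product of the user's and the product's factor vectors.
    // Both vectors must have length equal to the rank.
    public static double predict(double[] userFactors, double[] productFactors) {
        if (userFactors.length != productFactors.length) {
            throw new IllegalArgumentException("factor vectors must both have length = rank");
        }
        double sum = 0.0;
        for (int k = 0; k < userFactors.length; k++) {
            sum += userFactors[k] * productFactors[k];
        }
        return sum;
    }

    public static void main(String[] args) {
        int rank = 3;
        double[] alice = {1.0, 0.5, 2.0}; // hypothetical user factors
        double[] movie = {2.0, 2.0, 0.5}; // hypothetical product factors
        System.out.println("rank = " + rank + ", predicted rating = " + predict(alice, movie));
    }
}
```

Running main prints rank = 3, predicted rating = 4.0 (1.0*2.0 + 0.5*2.0 + 2.0*0.5).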

Of course, you don't know how many underlying factors, if any, drive your data, so you have to guess. Using more factors improves the results up to a point (beyond that, the model tends to overfit, particularly on a small dataset), but each extra factor also costs more memory and computation time.
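
A rough way to see the memory side of this trade-off: ALS keeps one factor vector of length rank for every user and every product, so the number of learned values is (numUsers + numProducts) * rank. The sketch below assumes hypothetical counts of 20 users and 10 products for the question's 100-row dataset.

```java
// Back-of-the-envelope sketch: model size of an ALS factorization grows linearly with rank.
// The user and product counts in main are hypothetical.
public class AlsModelSize {

    // One factor vector of length `rank` per user and per product.
    public static long parameterCount(long numUsers, long numProducts, int rank) {
        return (numUsers + numProducts) * (long) rank;
    }

    public static void main(String[] args) {
        long numUsers = 20;    // hypothetical
        long numProducts = 10; // hypothetical
        long numRatings = 100; // the 100 rows from the question
        for (int rank = 5; rank <= 20; rank += 5) {
            long params = parameterCount(numUsers, numProducts, rank);
            System.out.printf("rank=%d -> %d parameters for %d observed ratings%n",
                    rank, params, numRatings);
        }
    }
}
```

With those assumed counts it prints 150, 300, 450 and 600 parameters for ranks 5 to 20, so even rank 5 gives the model more free values than there are observed ratings; this is one reason the regularization parameter (0.01 in the question's code) matters on small datasets.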

One way to approach it is to start with a rank of 5-10, then increase it, say 5 at a time, until your results (for example, the prediction error on held-out ratings) stop improving. That way you determine the best rank for your dataset by experimentation.
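
The search procedure described above can be sketched in plain Java as follows. The error function is a hypothetical stand-in: in a real Spark job it would train ALS with the given rank on a training split of the ratings and return an error measure such as RMSE on a held-out split. The stopping threshold minImprovement is also an assumption, added so that tiny fluctuations do not keep the search going.

```java
import java.util.function.IntToDoubleFunction;

// Sketch of "start at 5-10, step by 5, stop when results stop improving".
// validationError is hypothetical: it maps a rank to an error on held-out data.
public class RankSearch {

    // Returns the best rank seen before the error stopped improving by at least
    // minImprovement, or before maxRank was exceeded.
    public static int findRank(IntToDoubleFunction validationError,
                               int startRank, int step, int maxRank, double minImprovement) {
        int bestRank = startRank;
        double bestError = validationError.applyAsDouble(startRank);
        for (int rank = startRank + step; rank <= maxRank; rank += step) {
            double error = validationError.applyAsDouble(rank);
            if (bestError - error < minImprovement) {
                break; // no meaningful improvement: stop increasing the rank
            }
            bestRank = rank;
            bestError = error;
        }
        return bestRank;
    }

    public static void main(String[] args) {
        // Made-up error curve for illustration only: lowest at rank 15.
        IntToDoubleFunction fakeRmse = r -> 1.0 + Math.abs(r - 15) * 0.02;
        System.out.println("chosen rank = " + findRank(fakeRmse, 5, 5, 50, 0.001));
    }
}
```

With the made-up error curve in main, the search stops once rank 20 gets worse and prints chosen rank = 15. In Spark, one way to obtain the training and held-out splits is JavaRDD.randomSplit on the ratings.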
