Apache Spark Mllib中的ALS机器学习算法中的等级是多少 [英] What is rank in ALS machine Learning Algorithm in Apache Spark Mllib

查看:81
本文介绍了Apache Spark Mllib中的ALS机器学习算法中的等级是多少的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想尝试一个ALS机器学习算法的示例.而且我的代码工作正常,但是我不理解算法中使用的参数rank.

I Wanted to try an example of ALS machine learning algorithm. And my code works fine, However I do not understand parameter rank used in algorithm.

我在Java中有以下代码

I have following code in java

    // Build the recommendation model using ALS
    int rank = 10;
    int numIterations = 10;
    MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings),
            rank, numIterations, 0.01);

我已经读过一些模型中潜在因素的数量.

I have read some where that it is the number of latent factors in the model.

假设我有一个(user,product,rating)的数据集,其中有100行. rank(潜在因素)应该是什么值.

Suppose I have a dataset of (user,product,rating) that has 100 rows. What value should be of rank (latent factors).

推荐答案

正如您所说,排名是指假定的潜在或隐藏因素.例如,如果您要测量有多少不同的人喜欢电影并试图对其进行交叉预测,那么您可能具有三个字段:人物,电影,星数.现在,让我们说您无所不知,并且您知道绝对的事实,并且您知道实际上所有电影的收视率都可以通过性别,年龄和收入这三个隐藏因素来完美预测.在这种情况下,您的跑步成绩排名"应该为3.

As you said the rank refers the presumed latent or hidden factors. For example, if you were measuring how much different people liked movies and tried to cross-predict them then you might have three fields: person, movie, number of stars. Now, lets say that you were omniscient and you knew the absolute truth and you knew that in fact all the movie ratings could be perfectly predicted by just 3 hidden factors, sex, age and income. In that case the "rank" of your run should be 3.

当然,您不知道有多少潜在因素(如果有的话)驱动数据,因此您必须进行猜测.您使用的次数越多,得到的结果越好,但是您将需要更多的内存和计算时间.

Of course, you don't know how many underlying factors, if any, drive your data so you have to guess. The more you use, the better the results up to a point, but the more memory and computation time you will need.

一种工作方法是从5-10开始,然后逐渐提高,例如每次5,直到结果不再改善.这样,您可以通过实验确定最佳数据集排名.

One way to work it is to start with a rank of 5-10, then increase it, say 5 at a time until your results stop improving. That way you determine the best rank for your dataset by experimentation.

这篇关于Apache Spark Mllib中的ALS机器学习算法中的等级是多少的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆