Why does a spark-ml ALS model return NaN and negative predictions?


Question

I'm trying to use ALS from spark-ml with implicit ratings.

I noticed that some of the predictions given by my trained model are negative or NaN. Why is that?

Recommended answer

Apache Spark provides an option to enforce non-negativity constraints on ALS.

Thus, to get rid of these negative values, you just need to set:

Python:

nonnegative=True

Scala:

setNonnegative(true)

when creating your ALS model, i.e.:

>>> from pyspark.ml.recommendation import ALS
>>> als = ALS(rank=10, maxIter=5, seed=0, nonnegative=True)

Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements [Ref. Wikipedia].
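
To see why constraining the factors removes negative predictions, here is a minimal pure-Python sketch. It is not Spark's implementation, and the matrices W and H are made-up values. Each prediction is the dot product of a user row of W with an item column of H, so non-negative factors can only yield non-negative predictions, while a single negative factor entry can push a prediction below zero.

```python
# Toy illustration only: W and H are hypothetical factor matrices,
# not the output of a real ALS run.

def predict(W, H):
    """Return the full prediction matrix W x H as nested lists."""
    n_items = len(H[0])
    return [
        [sum(w_k * H[k][j] for k, w_k in enumerate(row)) for j in range(n_items)]
        for row in W
    ]

# 2 users x 2 latent factors, all entries non-negative.
W = [[0.5, 1.0],
     [2.0, 0.0]]
# 2 latent factors x 3 items, all entries non-negative.
H = [[1.0, 0.0, 0.5],
     [0.0, 2.0, 1.0]]

nonneg_predictions = predict(W, H)

# Same shapes, but one user factor is negative (unconstrained factorization).
W_unconstrained = [[0.5, -1.0],
                   [2.0, 0.0]]
unconstrained_predictions = predict(W_unconstrained, H)

print(nonneg_predictions)
print(unconstrained_predictions)
```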

If you want to read more about NMF, I'd recommend reading the following paper:

As for the NaN values, they usually come from splitting your dataset: a user or item that appears only in the test set and not in the training set is unseen by the model, so it has no learned factors and its prediction is NaN. The same can happen when you cross-validate your training. On that point, there are a couple of JIRAs marked as resolved for 2.2:

  • https://issues.apache.org/jira/browse/SPARK-14489
  • https://issues.apache.org/jira/browse/SPARK-19345
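
The cold-start effect described above can be reproduced without Spark. The sketch below uses made-up (user, item, rating) tuples and a hypothetical helper, `split_seen_unseen`, to show which test rows would have no learned factors. Filtering such rows out before evaluation is a common workaround, not something taken from the original answer.

```python
# Hypothetical data: (user, item, rating) tuples.
train = [(1, 10, 1.0), (1, 11, 1.0), (2, 10, 1.0)]
test = [(1, 10, 1.0), (2, 11, 1.0), (3, 10, 1.0), (2, 12, 1.0)]

def split_seen_unseen(train_rows, test_rows):
    """Separate test rows whose user and item both occur in training
    from rows that reference an unseen user or item."""
    seen_users = {user for user, _, _ in train_rows}
    seen_items = {item for _, item, _ in train_rows}
    seen, unseen = [], []
    for row in test_rows:
        user, item, _ = row
        if user in seen_users and item in seen_items:
            seen.append(row)
        else:
            unseen.append(row)
    return seen, unseen

scorable, cold_start = split_seen_unseen(train, test)
print(scorable)    # rows the model could score
print(cold_start)  # rows that would get NaN predictions
```

Here user 3 and item 12 never occur in `train`, so the two rows that mention them end up in `cold_start`.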

As of Spark 2.2, you can set the cold start strategy to use when creating your model.
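
As a sketch of how that option might be used from Python: the `coldStartStrategy` parameter of `pyspark.ml.recommendation.ALS` accepts "nan" (the default) or "drop", where "drop" removes rows with NaN predictions from the output of `transform`. The helper name `build_als`, the column names, and the hyperparameter values below are assumptions for illustration. The import is deferred into the function so the settings can be inspected without a Spark installation.

```python
# Illustrative settings; column names and hyperparameters are assumptions.
ALS_PARAMS = {
    "rank": 10,
    "maxIter": 5,
    "seed": 0,
    "implicitPrefs": True,        # implicit ratings, as in the question
    "nonnegative": True,          # constrain factors to be non-negative
    "coldStartStrategy": "drop",  # requires Spark 2.2+; the default is "nan"
    "userCol": "user",
    "itemCol": "item",
    "ratingCol": "rating",
}

def build_als():
    """Create an ALS estimator; needs pyspark and an active SparkSession."""
    from pyspark.ml.recommendation import ALS
    return ALS(**ALS_PARAMS)

# Usage, assuming `train_df` and `test_df` are existing Spark DataFrames:
# model = build_als().fit(train_df)
# predictions = model.transform(test_df)  # unseen users/items are dropped
```

With "drop", evaluation metrics such as RMSE are computed only on rows the model can actually score, so they no longer come back as NaN.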
