MLLib spark - ALS trainImplicit value more than 1


Problem description

I have been experimenting with Spark MLlib's ALS trainImplicit for a while now, and I would like to understand:

1. Why am I getting rating values greater than 1 in the predictions?

2. Is there any need to normalize the user-product input?

Sample result:

[Rating(user=316017, product=114019, rating=3.1923),
 Rating(user=316017, product=41930, rating=2.0146997092620897)]

In the documentation, it is mentioned that the predicted rating values will be somewhere around 0-1. I know the rating values can still be used in recommendations, but it would be great to know the reason.
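For reference, here is a minimal PySpark sketch of the setup being described. The interaction data is made up, and the rank/iteration/alpha values are arbitrary placeholders:

```python
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="als-implicit-demo")

# Hypothetical implicit-feedback data: (user, product, strength),
# e.g. view or purchase counts rather than explicit 1-5 ratings.
interactions = sc.parallelize([
    Rating(316017, 114019, 5.0),
    Rating(316017, 41930,  2.0),
    Rating(12345,  114019, 1.0),
])

model = ALS.trainImplicit(interactions, rank=10, iterations=10, alpha=0.01)

# Predict for every observed (user, product) pair; the returned
# "rating" is a preference score and is not bounded to [0, 1].
pairs = interactions.map(lambda r: (r.user, r.product))
print(model.predictAll(pairs).collect())
```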

Answer

The cost function in ALS trainImplicit() doesn't impose any constraint on the predicted rating values, since it only penalizes the magnitude of the difference between the prediction and the 0/1 preference. So you may also find some negative values there. That is why it says the predicted values are around [0, 1], but not necessarily within it.
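For intuition, this is the implicit-feedback objective from Hu, Koren and Volinsky (2008) that trainImplicit is based on, where r_ui is the raw interaction strength:

```latex
\min_{x_*,\, y_*} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\quad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad c_{ui} = 1 + \alpha\, r_{ui}.
```

The loss only pulls each prediction x_u^T y_i toward the binary preference p_ui, weighted by the confidence c_ui; since the factor vectors themselves are unconstrained, individual predictions are free to overshoot 1 or dip below 0.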

There is an option to use non-negative factorization only, so that you never get a negative value in the predicted ratings or the feature matrices, but that seemed to hurt performance in our case.
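As a sketch of that option, continuing the hypothetical data from the earlier snippet: in PySpark the flag is nonnegative (assuming a Spark version that exposes it, 1.4+). It constrains both factor matrices to be elementwise non-negative, so every predicted score, being a dot product of non-negative vectors, is >= 0:

```python
# Same hypothetical `interactions` and `pairs` as in the earlier sketch.
model_nn = ALS.trainImplicit(
    interactions,
    rank=10,
    iterations=10,
    alpha=0.01,
    nonnegative=True,  # user/product factors constrained to >= 0
)

# Dot products of non-negative factors can never be negative.
print(model_nn.predictAll(pairs).collect())
```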

