MLLib Spark - ALS trainImplicit values greater than 1
Question
I've been experimenting with Spark MLlib ALS (trainImplicit) for a while now, and would like to understand:

1. Why am I getting rating values greater than 1 in the predictions?
2. Is there any need to normalize the user-product input?
Sample result:
[Rating(user=316017, product=114019, rating=3.1923),
 Rating(user=316017, product=41930, rating=2.0146997092620897)]
The documentation mentions that the predicted rating values will be somewhere around 0-1. I know the rating values can still be used in recommendations, but it would be great to know the reason.
Answer
The cost function in ALS trainImplicit() doesn't impose any constraint on the predicted rating values themselves; it only penalizes the (confidence-weighted) squared difference from the 0/1 preference targets. So you may also find some negative values there. That is why the documentation says the predicted values are around [0, 1], not necessarily within it.
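To see why nothing bounds the predictions, here is a minimal NumPy sketch of the implicit-feedback ALS objective (in the style of the Hu/Koren/Volinsky paper behind trainImplicit) — not Spark's implementation; the matrix `R` and the values of `alpha`, `reg`, and `k` are made-up for illustration. Preferences are binarized to 0/1, confidences are `c = 1 + alpha * r`, and each half-step fits `x_u · y_i` to the preferences weighted by confidence. The predictions are plain dot products of latent factors, with no clamping to [0, 1]:

```python
import numpy as np

# Toy implicit-feedback ALS sketch (illustrative data, not Spark's code).
rng = np.random.default_rng(0)
R = np.array([[5.0, 0.0, 3.0],
              [0.0, 2.0, 0.0],
              [4.0, 0.0, 0.0],
              [0.0, 1.0, 2.0]])            # raw implicit counts (users x items)
P = (R > 0).astype(float)                  # 0/1 preference targets
alpha, reg, k = 40.0, 0.1, 2
C = 1.0 + alpha * R                        # confidence weights

X = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
Y = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors

def cost(X, Y):
    # Weighted squared error against 0/1 preferences, plus L2 regularization.
    E = P - X @ Y.T
    return float((C * E ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum()))

def solve_rows(F, C_rows, P_rows):
    # One ALS half-step: weighted ridge regression per row,
    #   min_w  sum_j c_j (p_j - w . f_j)^2 + reg * ||w||^2
    rows = []
    for c, p in zip(C_rows, P_rows):
        A = F.T @ (c[:, None] * F) + reg * np.eye(k)
        rows.append(np.linalg.solve(A, F.T @ (c * p)))
    return np.array(rows)

cost_before = cost(X, Y)
for _ in range(10):                        # alternate user/item solves
    X = solve_rows(Y, C, P)
    Y = solve_rows(X, C.T, P.T)
cost_after = cost(X, Y)

pred = X @ Y.T                             # predicted "ratings": plain dot products
print(pred.round(3))
```

Since the objective only pushes observed entries toward 1 and unobserved ones toward 0, individual dot products can land above 1 or below 0.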
There is an option to request a non-negative factorization (setNonnegative on Spark's ALS), so that you never get a negative value in the predicted ratings or the feature matrices, but that seemed to hurt performance in our case.