如何在Apache Spark中评估隐式反馈ALS算法的建议? [英] How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?

查看:320
本文介绍了如何在Apache Spark中评估隐式反馈ALS算法的建议?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

鉴于隐式评级" 可以从零到任意变化,您如何评估Apache Spark的隐式反馈协作过滤算法,因此,简单的MSE或RMSE没有什么意义?

解决方案

要回答此问题,您需要返回原始文档,该论文定义了隐式反馈和ALS算法 https://github.com/apache/spark/pull/16618 关于为spark-ml添加RankingEvaluator.

尽管如此,实现并不复杂.如果您有兴趣,可以参考代码此处尽早获得它.

我希望这能回答您的问题.

How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have much meaning?

解决方案

To answer this question, you'll need to go back to the original paper that defined what is implicit feedback and the ALS algorithm Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren and Chris Volinsky.

What is implicit feedback ?

In the absence of explicit ratings, recommender systems can infer user preferences from the more abundant implicit feedback , which indirectly reflect opinion through observing user behavior.

Implicit feedback can include purchase history, browsing history, search patterns, or even mouse movements.

Do same evaluating techniques apply here? Such as RMSE, MSE.

It is important to realize that we do not have a reliable feedback regarding which items are disliked. The absence of a click or purchase can be related to multiple reasons. We also can't track user reactions to our recommendations.

Thus, precision based metrics, such as RMSE and MSE, are not very appropriate, as they require knowing which items users dislike for it to make sense.

However, purchasing or clicking on an item is an indication of having an interest in it. I wouldn't say like because a click or a purchase might have different meaning depending on the context of the recommender.

So making recall-oriented measures applicable in this case. So under this scenario, several metrics have been introduced, the most important being the Mean Percentage Ranking (MPR), also known as Percentile Ranking.

Lower values of MPR are more desirable. The expected value of MPR for random predictions is 50%, and thus MPR > 50% indicates an algorithm no better than random.

Of course, it's not the only way to evaluate recommender systems with implicit ratings but it's the most common one used in practice.

For more information about this metric, I advise you to read the paper stated above.

Ok, now we know what we are going to use but what about Apache Spark?

Apache Spark still doesn't provide an out-of-the-box implementation for this metric but hopefully not for long. There is a PR waiting to be validated https://github.com/apache/spark/pull/16618 concerning adding RankingEvaluator for spark-ml.

The implementation nevertheless isn't complicated. You can refer to the code here if you are interested in getting it sooner.

I hope this answers your question.

这篇关于如何在Apache Spark中评估隐式反馈ALS算法的建议?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆