How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?

Problem description

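How can you evaluate Apache Spark's implicit feedback collaborative filtering algorithm, given that the implicit "ratings" can range from zero to anything, so a simple MSE or RMSE does not mean much?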

Solution

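To answer this question, you need to go back to the original paper that defined both implicit feedback and the ALS algorithm for it: Collaborative Filtering for Implicit Feedback Datasets, by Yifan Hu, Yehuda Koren and Chris Volinsky.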

What is implicit feedback?

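In the absence of explicit ratings, recommender systems can infer user preferences from the more abundant implicit feedback, which indirectly reflects opinion through observed user behavior.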

Implicit feedback can include purchase history, browsing history, search patterns, or even mouse movements.

Do the same evaluation techniques, such as RMSE and MSE, apply here?

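It is important to realize that we have no reliable feedback about which items are disliked. The absence of a click or a purchase can have many causes. We also cannot track how users react to our recommendations.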

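Thus, precision-based metrics, and error measures such as RMSE and MSE, are not very appropriate, because they only make sense when we know which items users dislike.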

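However, purchasing or clicking on an item does indicate an interest in it. I wouldn't say "like", because a click or a purchase can mean different things depending on the context of the recommender.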

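This makes recall-oriented measures applicable here. Under this scenario, several metrics have been introduced, the most important being the Mean Percentage Ranking (MPR), also known as Percentile Ranking.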

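Lower values of MPR are more desirable. The expected value of MPR for random predictions is 50%, so an MPR at or above 50% indicates an algorithm no better than random.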
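For reference, the metric can be written as below. This is the definition as I recall it from the paper cited above, so check the notation against the paper: r^t_{ui} is the feedback observed for user u and item i in the held-out test period, and rank_{ui} is the percentile position of item i in the ranked list produced for user u (0% for the top of the list, 100% for the bottom).

```latex
\overline{\mathrm{rank}} \;=\; \frac{\sum_{u,i} r^{t}_{ui}\,\mathrm{rank}_{ui}}{\sum_{u,i} r^{t}_{ui}}
```

Because each percentile rank is weighted by the observed feedback, items a user interacted with heavily count more, and a model that places them near the top of the list drives the value toward 0%.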

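Of course, this is not the only way to evaluate recommender systems with implicit ratings, but it is the one most commonly used in practice.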

For more information about this metric, I advise you to read the paper mentioned above.

OK, now we know what we are going to use, but what about Apache Spark?

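Apache Spark still does not provide an out-of-the-box implementation of this metric, though hopefully not for long. A pull request adding a RankingEvaluator to spark-ml is waiting to be validated: https://github.com/apache/spark/pull/16618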

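The implementation isn't complicated, though. You can refer to the linked code if you are interested in getting it sooner.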
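To give an idea of how little is involved, here is a minimal sketch of the MPR computation in plain Python. It is not Spark's API and not the code from the pull request; the function names `percentile_ranks` and `mean_percentile_rank`, the input layout, and the toy data are all assumptions made for illustration. It assumes the model has scored every candidate item for each user and that every held-out item appears among those scores.

```python
def percentile_ranks(user_scores):
    """Map each item to its percentile rank: 0.0 = top of the list, 100.0 = bottom."""
    # Order items from highest to lowest predicted preference.
    ordered = sorted(user_scores, key=user_scores.get, reverse=True)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {item: 100.0 * pos / (n - 1) for pos, item in enumerate(ordered)}


def mean_percentile_rank(scores, held_out):
    """Feedback-weighted mean percentile rank, in percent (lower is better).

    scores:   {user: {item: predicted preference}} over all candidate items
    held_out: iterable of (user, item, r) with r > 0 observed in the test period
    """
    # Rank each user's items once, then reuse the ranks for every test row.
    ranks = {user: percentile_ranks(s) for user, s in scores.items()}
    weighted_sum = 0.0
    total_feedback = 0.0
    for user, item, r in held_out:
        weighted_sum += r * ranks[user][item]
        total_feedback += r
    return weighted_sum / total_feedback


# Hypothetical toy data: two users, three items.
scores = {
    "u1": {"a": 0.9, "b": 0.5, "c": 0.1},
    "u2": {"a": 0.2, "b": 0.8, "c": 0.4},
}
held_out = [("u1", "a", 3.0), ("u1", "c", 1.0), ("u2", "b", 2.0)]
mpr = mean_percentile_rank(scores, held_out)
```

On Spark DataFrames the same per-user ranking could plausibly be expressed with the `percent_rank` window function partitioned by user and ordered by descending prediction, followed by a weighted average; treat that as a direction to explore rather than a tested recipe.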

I hope this answers your question.
