Similarity function for Mahout boolean user-based recommender

Question

I am using Mahout to build a user-based recommendation system which operates with boolean data.

I use GenericBooleanPrefUserBasedRecommender and NearestNUserNeighborhood, and am now trying to decide on the most suitable user similarity function.
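
To make the moving parts concrete, here is a minimal, dependency-free Java sketch of what a nearest-N user neighborhood does on boolean data: each user is just a set of item IDs, and the N other users with the highest similarity are kept. This is not Mahout's NearestNUserNeighborhood. The class NearestNSketch, its nearestN method, and the overlap-count similarity used in main are hypothetical, chosen only so that the similarity function is a pluggable parameter, which is exactly the choice the question is about.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical illustration; not Mahout's NearestNUserNeighborhood.
class NearestNSketch {

    /**
     * Returns up to n users other than userId, most similar first.
     * Boolean data: each user is represented only by the set of item IDs they have.
     */
    static List<String> nearestN(String userId,
                                 Map<String, Set<Long>> userItems,
                                 BiFunction<Set<Long>, Set<Long>, Double> similarity,
                                 int n) {
        Set<Long> mine = userItems.get(userId);
        List<String> others = new ArrayList<>();
        for (String other : userItems.keySet()) {
            if (!other.equals(userId)) {
                others.add(other);
            }
        }
        // Highest similarity first; ties broken by user ID so the order is deterministic.
        others.sort(Comparator.<String>comparingDouble(o -> -similarity.apply(mine, userItems.get(o)))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(others.subList(0, Math.min(n, others.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> data = Map.of(
                "alice", Set.of(1L, 2L, 3L),
                "bob", Set.of(2L, 3L, 4L),
                "carol", Set.of(7L, 8L));
        // Crude stand-in similarity (raw overlap count); Tanimoto or log-likelihood would go here.
        BiFunction<Set<Long>, Set<Long>, Double> overlap = (a, b) -> {
            Set<Long> common = new HashSet<>(a);
            common.retainAll(b);
            return (double) common.size();
        };
        System.out.println(nearestN("alice", data, overlap, 1)); // [bob]
    }
}
```

Swapping the similarity changes which users end up in the neighborhood, and therefore which items can be recommended at all.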

It was suggested to use either LogLikelihoodSimilarity or TanimotoCoefficientSimilarity. I tried both and am getting [subjectively evaluated] meaningful results in both cases. However, the RMSE for the same data set is better with LogLikelihoodSimilarity. The number of "no recommendation" results is similar in both cases.
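
For reference, the Tanimoto coefficient on boolean data is the size of the intersection of two users' item sets divided by the size of their union. A minimal sketch; the class name TanimotoSketch and the 0.0 result for two empty sets are assumptions of this example, not Mahout behavior:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the Tanimoto (Jaccard) coefficient on boolean data.
class TanimotoSketch {

    /** |A intersect B| / |A union B|; returns 0.0 when both sets are empty (a convention assumed here). */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Two users sharing items 2 and 3 out of four distinct items in total.
        System.out.println(tanimoto(Set.of(1L, 2L, 3L), Set.of(2L, 3L, 4L))); // 0.5
    }
}
```

Nothing in this formula depends on how many items exist in total or on how common the shared items are.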

Can anyone recommend which of these similarity functions is most suitable for this case?

Answer

(I'm the developer.) If I were stranded on a desert island with just one similarity metric for data without ratings/prefs, it would be log-likelihood. I would generally expect it to be the better similarity metric.
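
Log-likelihood similarity rests on Dunning's log-likelihood ratio (G²) over a 2×2 table: items both users have, items only one of them has, and items neither has. The sketch below is modeled on the unnormalized-entropy formulation used in Mahout's LogLikelihood helper and on the 1 - 1/(1 + LLR) mapping that, as I understand it, LogLikelihoodSimilarity applies; treat both as assumptions rather than a copy of the library code. The class and method names are hypothetical.

```java
// Hypothetical illustration of log-likelihood similarity on boolean data.
class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a list of counts: N log N - sum(c log c). */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /** Dunning's G^2 log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]]. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // guards against round-off producing a tiny negative value
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    /**
     * Similarity of two users from counts alone: both = items both users have,
     * sizeA/sizeB = each user's item count, numItems = size of the whole catalogue.
     */
    static double similarity(long both, long sizeA, long sizeB, long numItems) {
        long onlyB = sizeB - both;
        long onlyA = sizeA - both;
        long neither = numItems - sizeA - sizeB + both;
        double llr = logLikelihoodRatio(both, onlyB, onlyA, neither);
        return 1.0 - 1.0 / (1.0 + llr); // maps [0, infinity) into [0, 1)
    }

    public static void main(String[] args) {
        // Same overlap (2 shared items, 3 items per user) in a 10-item vs a 10,000-item catalogue.
        System.out.println(similarity(2, 3, 3, 10));
        System.out.println(similarity(2, 3, 3, 10_000));
    }
}
```

Unlike Tanimoto, the result depends on the catalogue size: the same 2-item overlap scores higher in the 10,000-item catalogue, because such an overlap is far less likely to happen by chance there.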

The problem with the test you're doing is that, perhaps not at all obviously, it's not meaningful for this kind of recommender/data. RMSE is root-mean-square error, and it compares the actual vs. predicted rating for held-out test data. But you have no ratings. They're all "1.0". Really, RMSE is always 0!
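
A minimal RMSE sketch makes the point concrete: when every held-out value is 1.0, RMSE only measures how far the predictions sit from the constant 1.0, not whether the right items were recommended. The class RmseSketch is hypothetical, not Mahout's evaluator.

```java
// Hypothetical illustration of root-mean-square error over held-out values.
class RmseSketch {

    /** Root-mean-square error between parallel arrays of actual and predicted values. */
    static double rmse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / actual.length);
    }

    public static void main(String[] args) {
        // Boolean data: every held-out "rating" is 1.0.
        double[] actual = {1.0, 1.0, 1.0};
        // A predictor that only ever outputs 1.0 scores perfectly, whatever it recommends.
        double[] predicted = {1.0, 1.0, 1.0};
        System.out.println(rmse(actual, predicted)); // 0.0
    }
}
```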

It isn't, though: to have anything to rank on, these recommenders score items by some meaningful function of the similarities, so the scores are generally not 1.0. But they are not estimating ratings/prefs at all. So RMSE means squat here.
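
A minimal sketch of that kind of scoring, assuming (as I understand GenericBooleanPrefUserBasedRecommender to do) that an item's score is the sum of the similarities of the neighbors who have it. The class, method, and data are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of sum-of-similarities scoring for boolean data.
class BooleanScoreSketch {

    /**
     * Scores each item the user does not already have as the sum of the similarities
     * of the neighbors who do have it. The result ranks items; it is not a rating.
     */
    static Map<Long, Double> score(Set<Long> userItems,
                                   List<Set<Long>> neighborItems,
                                   List<Double> neighborSimilarity) {
        Map<Long, Double> scores = new TreeMap<>();
        for (int i = 0; i < neighborItems.size(); i++) {
            double sim = neighborSimilarity.get(i);
            for (long item : neighborItems.get(i)) {
                if (!userItems.contains(item)) {
                    scores.merge(item, sim, Double::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Set<Long> user = Set.of(1L, 2L);
        List<Set<Long>> neighbors = List.of(Set.of(1L, 3L), Set.of(2L, 3L, 4L));
        List<Double> sims = List.of(0.75, 0.5);
        System.out.println(score(user, neighbors, sims)); // {3=1.25, 4=0.5}
    }
}
```

Item 3 scores 1.25, more than any "rating" in the data. That is fine for ranking, but comparing such scores to 1.0 with RMSE says nothing about whether the ranking is good.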

The only metric you can really use is a precision/recall test in this case, I think. Even that is problematic. This and more fun topics are covered in a book which I will shamelessly promote: Mahout in Action
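
A minimal precision/recall-at-k sketch, assuming the usual setup of holding out some of each user's items, asking for k recommendations, and counting how many held-out items come back. The class and method names, and dividing precision by the number of recommendations actually returned (at most k), are assumptions of this example, not Mahout's evaluator API:

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of precision/recall at k; not Mahout's IR-stats evaluator API.
class PrecisionRecallSketch {

    /** How many of the first min(k, ranked.size()) recommendations are in the relevant set. */
    private static int hits(List<Long> ranked, Set<Long> relevant, int k) {
        int n = Math.min(k, ranked.size());
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (relevant.contains(ranked.get(i))) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of the returned top-k recommendations that were held-out items. */
    static double precisionAtK(List<Long> ranked, Set<Long> relevant, int k) {
        int n = Math.min(k, ranked.size());
        return n <= 0 ? 0.0 : (double) hits(ranked, relevant, k) / n;
    }

    /** Fraction of the held-out items that appear in the top-k recommendations. */
    static double recallAtK(List<Long> ranked, Set<Long> relevant, int k) {
        return relevant.isEmpty() ? 0.0 : (double) hits(ranked, relevant, k) / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> recommended = List.of(10L, 20L, 30L, 40L); // ranked best-first
        Set<Long> heldOut = Set.of(20L, 40L, 50L);            // items the user really had
        System.out.println(precisionAtK(recommended, heldOut, 4)); // 0.5
        System.out.println(recallAtK(recommended, heldOut, 4));
    }
}
```

Unlike RMSE, this only looks at which items are recommended and in what order, so it does not care that the scores themselves are not ratings.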