How mahout's recommendation evaluator works

Question

Can anyone tell me how mahout's RecommenderIRStatsEvaluator works? More specifically, how does it randomly split the data into training and testing sets, and what data is the result compared against? Based on my understanding, you need some sort of ideal/expected result to compare against the actual result of the recommendation algorithm in order to find the TPs and FPs and thus compute precision or recall. But it looks like mahout provides a precision/recall score without that ideal result.

Answer

The data is split into training and test sets using a relevance threshold value which you supply in the evaluate method of the RecommenderIRStatsEvaluator class. If this value is null, there is a method that computes it (computeThreshold). The class that splits the data into training and test sets is GenericRelevantItemsDataSplitter. If you take a look at the code, you can see that first the preferences for each user are sorted by value in descending order, and then only those whose value is at least the relevanceThreshold are taken as relevant. Also notice that at most at items are put into this set.

@Override
public FastIDSet getRelevantItemsIDs(long userID,
                                     int at,
                                     double relevanceThreshold,
                                     DataModel dataModel) throws TasteException {
  PreferenceArray prefs = dataModel.getPreferencesFromUser(userID);
  FastIDSet relevantItemIDs = new FastIDSet(at);
  // Sort this user's preferences by value, highest first.
  prefs.sortByValueReversed();
  // Keep at most 'at' items whose value is >= relevanceThreshold.
  for (int i = 0; i < prefs.length() && relevantItemIDs.size() < at; i++) {
    if (prefs.getValue(i) >= relevanceThreshold) {
      relevantItemIDs.add(prefs.getItemID(i));
    }
  }
  return relevantItemIDs;
}
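For context, here is a minimal sketch of how the evaluator is typically invoked. The data file name, the user-based recommender, and the similarity/neighborhood choices below are illustrative assumptions (they are not part of the original question); the evaluate signature, the at parameter, and GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD (which makes the evaluator call computeThreshold itself) come from Mahout.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class IRStatsExample {
  public static void main(String[] args) throws Exception {
    // "ratings.csv" is a placeholder for your own userID,itemID,value file.
    DataModel model = new FileDataModel(new File("ratings.csv"));

    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

    // The builder is called for each evaluated user with that user's training data.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
        PearsonCorrelationSimilarity similarity = new PearsonCorrelationSimilarity(trainingModel);
        NearestNUserNeighborhood neighborhood =
            new NearestNUserNeighborhood(10, similarity, trainingModel);
        return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
      }
    };

    // at = 5: evaluate precision@5 / recall@5.
    // CHOOSE_THRESHOLD lets the evaluator compute the relevance threshold per user;
    // pass an explicit value (e.g. 4.0) to fix the cut-off yourself.
    IRStatistics stats = evaluator.evaluate(
        builder,
        null,                                              // default DataModelBuilder
        model,
        null,                                              // no IDRescorer
        5,                                                 // at
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
        1.0);                                              // evaluate 100% of the users

    System.out.println("precision@5 = " + stats.getPrecision());
    System.out.println("recall@5    = " + stats.getRecall());
  }
}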

How the precision and the recall are computed can be seen in the RecommenderIRStatsEvaluator.evaluate method. In short, it works like this: only one user is evaluated at a time. That user's preferences are split into relevant ones (as described above) and the rest. The relevant ones are used as the test set, and the rest, together with all other users' data, are used for training. Then top-at recommendations are produced for this user. Next, the method checks how many of the items that were set aside as the test set appear among the recommendations:

int intersectionSize = 0;
List<RecommendedItem> recommendedItems = recommender.recommend(userID, at, rescorer);
for (RecommendedItem recommendedItem : recommendedItems) {
  if (relevantItemIDs.contains(recommendedItem.getItemID())) {
    intersectionSize++;
  }
}

The precision is then computed as follows:

(double) intersectionSize / (double) numRecommendedItems

where numRecommendedItems is usually your at, provided the recommender produces at least at recommendations; otherwise it is smaller.

Similarly, the recall is computed as follows:

(double) intersectionSize / (double) numRelevantItems

where numRelevantItems is the number of items in the test set for this user.
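
As a quick worked example (the numbers below are made up purely for illustration, not taken from the answer): suppose at = 10, the user has 4 relevant held-out items, and 2 of them appear in the top-10 list.

public class PrecisionRecallExample {
  public static void main(String[] args) {
    int intersectionSize = 2;       // relevant items that showed up in the top-10 list
    int numRecommendedItems = 10;   // the recommender returned a full top-10 list
    int numRelevantItems = 4;       // size of this user's test set

    double precision = (double) intersectionSize / (double) numRecommendedItems; // 0.2
    double recall = (double) intersectionSize / (double) numRelevantItems;       // 0.5

    System.out.println("precision@10 = " + precision);
    System.out.println("recall@10    = " + recall);
  }
}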

The final precision and recall are the macro-averages of the per-user precisions and recalls, i.e., the per-user values averaged over all evaluated users.
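
A minimal sketch of that averaging step, with invented per-user values; Mahout keeps running averages internally rather than arrays, so this only mirrors the idea:

public class MacroAverageExample {
  public static void main(String[] args) {
    // Invented per-user scores for three evaluated users.
    double[] userPrecisions = {0.2, 0.4, 0.0};
    double[] userRecalls = {0.5, 1.0, 0.0};

    double precisionSum = 0.0;
    double recallSum = 0.0;
    for (int i = 0; i < userPrecisions.length; i++) {
      precisionSum += userPrecisions[i];
      recallSum += userRecalls[i];
    }

    // Macro average: each user contributes equally, regardless of how many
    // preferences or relevant items that user has.
    System.out.println("precision = " + precisionSum / userPrecisions.length); // 0.2
    System.out.println("recall    = " + recallSum / userRecalls.length);       // 0.5
  }
}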

Hope this answers your question.

EDIT: To continue with your question, evaluating IR statistics (precision and recall) for recommender systems is very tricky, especially if you have a small number of preferences per user. In this book you can find very useful insights. It says that

it is typically assumed that the not-liked items would not have been liked even if they had been recommended, i.e., they are uninteresting or useless for the user. However, this may not be true, because the set of not-liked items may contain some interesting items that the user did not select. For example, a user may not have liked an item because he was unaware of its existence, but after the recommendation exposed that item, the user could decide to select it. In any case, when using IR statistics, the number of FPs is overestimated.
