Mahout: adjusted cosine similarity for item-based recommender

Question

For an assignment I'm supposed to test different types of recommenders, which I have to implement first. I've been looking around for a good library to do that (I had thought about Weka at first) and stumbled upon Mahout.

I should therefore say up front that: a) I'm completely new to Mahout, b) I don't have a strong background in recommenders or their algorithms (otherwise I wouldn't be taking this class...), and c) sorry, but I'm far from being the best developer in the world ==> I'd appreciate it if you could use layman's terms (as far as possible...) :)

I've been following some tutorials (e.g. this, as well as part2) and got some preliminary results on item-based and user-based recommenders.

However, I'm not very happy with the item-based predictions. So far, I've only found similarity functions that do not take the users' rating biases into consideration. I was wondering if there is something like adjusted cosine similarity. Any hints?

Recommended answer

Here is a sample of the AdjustedCosineSimilarity I created. Keep in mind that this will be slower than PearsonCorrelationSimilarity because of the sqrt computations, but it can produce better results. At least for my dataset the results were much better. You have to trade quality off against performance and, depending on your needs, pick the implementation that suits you.
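For reference, this is the usual textbook form of adjusted cosine similarity between two items, and it matches what the class below accumulates. The notation is an assumption introduced here and is not part of the original answer: U_ij is the set of users who rated both item i and item j, r_{u,i} is user u's rating of item i, and \bar{r}_u is user u's mean over all of their ratings.

```latex
\mathrm{sim}(i, j) =
  \frac{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_u)\,(r_{u,j} - \bar{r}_u)}
       {\sqrt{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_u)^2}\;
        \sqrt{\sum_{u \in U_{ij}} (r_{u,j} - \bar{r}_u)^2}}
```

Subtracting \bar{r}_u is what removes the per-user rating bias the question asks about: a user who rates everything high and one who rates everything low contribute deviations on the same scale.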

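// Must live in the same package as AbstractSimilarity, because the
// computeResult() hook overridden below is package-private.
package org.apache.mahout.cf.taste.impl.similarity;

import com.google.common.base.Preconditions;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.common.Weighting;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

/**
 * Custom item similarity based on adjusted cosine: each rating is centered
 * on the mean rating of the user who gave it before the cosine is computed.
 *
 * @author dmilchevski
 */
public class AdjustedCosineSimilarity extends AbstractSimilarity {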

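    /**
     * Creates a new {@link AdjustedCosineSimilarity} with unweighted results.
     *
     * @param dataModel data model holding the user/item preferences
     * @throws TasteException if the data model cannot be accessed
     */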
    public AdjustedCosineSimilarity(DataModel dataModel)
            throws TasteException {
        this(dataModel, Weighting.UNWEIGHTED);
    }

    /**
     * Creates new {@link AdjustedCosineSimilarity}
     * 
     * @param dataModel
     * @param weighting
     * @throws TasteException
     */
    public AdjustedCosineSimilarity(DataModel dataModel, Weighting weighting)
            throws TasteException {
        super(dataModel, weighting, true);
        Preconditions.checkArgument(dataModel.hasPreferenceValues(),
                "DataModel doesn't have preference values");
    }

    /**
     * Computes the cosine of the two mean-centered rating vectors.
     * Only n, sumXY, sumX2 and sumY2 are used; sumXYdiff2 is ignored.
     */
    @Override
    double computeResult(int n, double sumXY, double sumX2, double sumY2, double sumXYdiff2) {
        if (n == 0) {
            // No user rated both items
            return Double.NaN;
        }
        // The sums passed in are already centered on each user's mean rating
        // (see itemSimilarity), so no further centering is done here.
        double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
        if (denominator == 0.0) {
            // For at least one item, every co-rating equals the rater's own mean,
            // so the similarity is undefined under this measure
            return Double.NaN;
        }
        return sumXY / denominator;
    }

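    /**
     * Computes the mean of all preference values in the given array.
     *
     * @param prefs preferences to average (here: everything one user has rated)
     * @return the mean preference value, or 0.0 if the array is empty
     */
    private double averagePreference(PreferenceArray prefs) {
        int n = prefs.length();
        if (n == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += prefs.getValue(i);
        }
        return sum / n;
    }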

    /**
     * Compute the item similarity between two items
     */
    @Override
    public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
        DataModel dataModel = getDataModel();
        PreferenceArray xPrefs = dataModel.getPreferencesForItem(itemID1);
        PreferenceArray yPrefs = dataModel.getPreferencesForItem(itemID2);
        int xLength = xPrefs.length();
        int yLength = yPrefs.length();

        if (xLength == 0 || yLength == 0) {
            return Double.NaN;
        }

        long xIndex = xPrefs.getUserID(0);
        long yIndex = yPrefs.getUserID(0);
        int xPrefIndex = 0;
        int yPrefIndex = 0;

        double sumX = 0.0;
        double sumX2 = 0.0;
        double sumY = 0.0;
        double sumY2 = 0.0;
        double sumXY = 0.0;
        double sumXYdiff2 = 0.0;
        int count = 0;

        // No, pref inferrers and transforms don't apply here. I think.

        while (true) {
            int compare = xIndex < yIndex ? -1 : xIndex > yIndex ? 1 : 0;
            if (compare == 0) {
                // The same user rated both items
                double x = xPrefs.getValue(xPrefIndex);
                double y = yPrefs.getValue(yPrefIndex);

                // Adjusted cosine centers both ratings on this user's mean rating.
                // xIndex == yIndex here, so a single lookup is enough.
                double userMean = averagePreference(dataModel.getPreferencesFromUser(xIndex));

                sumXY += (x - userMean) * (y - userMean);
                sumX += x;
                sumX2 += (x - userMean) * (x - userMean);
                sumY += y;
                sumY2 += (y - userMean) * (y - userMean);
                double diff = x - y;
                sumXYdiff2 += diff * diff;
                count++;
            }
            if (compare <= 0) {
                if (++xPrefIndex == xLength) {
                    break;
                }
                xIndex = xPrefs.getUserID(xPrefIndex);
            }
            if (compare >= 0) {
                if (++yPrefIndex == yLength) {
                    break;
                }
                yIndex = yPrefs.getUserID(yPrefIndex);
            }
        }

        // The sums are already centered on each user's mean rating,
        // so they are passed on without any further per-item centering.
        double result = computeResult(count, sumXY, sumX2, sumY2, sumXYdiff2);

        if (!Double.isNaN(result)) {
            result = normalizeWeightResult(result, count,
                    dataModel.getNumUsers());
        }
        return result;
    }

}
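To see what the class computes without setting up Mahout, here is a minimal, Mahout-free sketch of the same calculation on a tiny in-memory rating matrix. Everything in it is hypothetical and only for illustration: the class name AdjustedCosineDemo, the helpers userMean and adjustedCosine, the use of NaN to mark a missing rating, and the made-up ratings themselves.

```java
/** Toy illustration of adjusted cosine similarity; not part of Mahout. */
final class AdjustedCosineDemo {

    /** Marker for "this user did not rate this item" (an assumption of this sketch). */
    static final double MISSING = Double.NaN;

    /** Mean of all ratings one user has given, skipping missing entries; NaN if there are none. */
    static double userMean(double[] userRatings) {
        double sum = 0.0;
        int n = 0;
        for (double r : userRatings) {
            if (!Double.isNaN(r)) {
                sum += r;
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    /**
     * Adjusted cosine similarity between two item columns.
     * ratings[u][i] is user u's rating of item i, or MISSING.
     * Returns NaN when no user rated both items or a centered vector is all zeros.
     */
    static double adjustedCosine(double[][] ratings, int itemA, int itemB) {
        double sumAB = 0.0;
        double sumA2 = 0.0;
        double sumB2 = 0.0;
        for (double[] user : ratings) {
            double a = user[itemA];
            double b = user[itemB];
            if (Double.isNaN(a) || Double.isNaN(b)) {
                continue; // only users who rated both items count
            }
            // Center both ratings on this user's mean over everything they rated
            double mean = userMean(user);
            double da = a - mean;
            double db = b - mean;
            sumAB += da * db;
            sumA2 += da * da;
            sumB2 += db * db;
        }
        double denominator = Math.sqrt(sumA2) * Math.sqrt(sumB2);
        return denominator == 0.0 ? Double.NaN : sumAB / denominator;
    }

    public static void main(String[] args) {
        // Rows are users, columns are items; every user here has a mean rating of 3
        double[][] ratings = {
            {5, 3, 1},
            {4, 2, 3},
            {1, MISSING, 5},
        };
        System.out.println("sim(item0, item1) = " + adjustedCosine(ratings, 0, 1));
        System.out.println("sim(item0, item2) = " + adjustedCosine(ratings, 0, 2));
    }
}
```

In Mahout itself, the class from the answer would typically be handed to an item-based recommender, for example via the GenericItemBasedRecommender(DataModel, ItemSimilarity) constructor. Because it recomputes a user's mean for every co-rating, caching those means is one option if it turns out to be too slow.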
