Apache Mahout +欧几里德距离:意外结果 [英] Apache Mahout + Euclidean Distance: Unexpected Results

查看：139 发布时间：2020/5/5 11:16:24 mahout euclidean-distance

本文介绍了Apache Mahout +欧几里德距离:意外结果的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

鉴于以下用户偏好数据集，我正在使用Mahout的EuclideanDistanceSimilarity类对多个用户的相似性进行排名.首选项的范围目前是1到5之间的所有整数.但是，我可以控制秤，如果有帮助，可以更改.

I'm using Mahout's EuclideanDistanceSimilarity class to rank the similarity of several users given the following data set of user preferences. The range for preferences is currently all integers from 1 to 5 inclusive. However I have control over the scale, so that can change if it would help.

User    Preferences:
        Item 1    Item 2    Item 3    Item 4    Item 5    Item 6
 1       2         4         3         5         1         2
 2       5         1         5         1         5         1
 3       1         5         1         5         1         5
 4       2         4         3         5         1         2
 5       3         3         4         5         2         2

运行以下测试代码时，我得到了意外的结果，该代码已添加到此处的Test类中:

I'm getting unexpected results when I run the following test code, which I added to the Test class found here: http://www.massapi.com/source/mahout-distribution-0.4/core/src/test/java/org/apache/mahout/cf/taste/impl/similarity/EuclideanDistanceSimilarityTest.java.html

@Test
public void testSimple2() throws Exception {
    DataModel dataModel = getDataModel(
            new long[]{1, 2, 3, 4, 5},
            new Double[][]{
                {2.0, 4.0, 3.0, 5.0, 1.0, 2.0},
                {5.0, 1.0, 5.0, 1.0, 5.0, 1.0},
                {1.0, 5.0, 1.0, 5.0, 1.0, 5.0},
                {2.0, 4.0, 3.0, 5.0, 1.0, 2.0},
                {3.0, 3.0, 4.0, 5.0, 2.0, 2.0},});
    for (int i = 1; i <= 5; i++) {
        for (int j = 1; j <= 5; j++) {
            System.out.println( i + "," + j + ": " + new EuclideanDistanceSimilarity(dataModel).userSimilarity(i, j));
        }
    }
}

它产生以下结果:

1,1: 1.0
1,2: 0.7129109430106292
1,3: 1.0
1,4: 1.0
1,5: 1.0
2,1: 0.7129109430106292
2,2: 1.0
2,3: 0.5556605665978556
2,4: 0.7129109430106292
2,5: 0.8675434911352263
3,1: 1.0
3,2: 0.5556605665978556
3,3: 1.0
3,4: 1.0
3,5: 0.9683428667784535
4,1: 1.0
4,2: 0.7129109430106292
4,3: 1.0
4,4: 1.0
4,5: 1.0
5,1: 1.0
5,2: 0.8675434911352263
5,3: 0.9683428667784535
5,4: 1.0
5,5: 1.0

有人可以帮助我了解我在这里做错了什么吗?显然，用户1的首选项与用户3& 2不同. 5，为什么我的相似度为1.0?

Would someone please help me understand what I'm doing wrong here? Clearly, user 1's preferences are not identical to users 3 & 5, so why do I get 1.0 for the similarity?

如果欧几里得无法使用，我愿意使用其他算法，但是Pearson不适用于我，因为我需要处理对每个项目都提交相同首选项的用户，并且我不希望对成绩"进行更正通货膨胀."

I'm open to using a different algorithm if Euclidean won't work, however Pearson doesn't work for me because I need to handle users that submit identical preferences for each item and I do not want to correct for "grade inflation."

Apache Mahout +欧几里德距离:意外结果 [英] Apache Mahout + Euclidean Distance: Unexpected Results

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Apache Mahout +欧几里德距离:意外结果 [英] Apache Mahout + Euclidean Distance: Unexpected Results

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭