What is wrong with the Pearson algorithm from "Programming Collective Intelligence"?


Problem description

This function is from the book "Programming Collective Intelligence", and is supposed to calculate the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1.

If two critics rate items very similarly, the function should return 1, or close to 1.

With real user data I sometimes get weird results. In the following example the dataset critics2 should return 1 - instead it returns 0.

Does anyone spot a mistake?

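(This is not a duplicate of "What is wrong with this python function from 'Programming Collective Intelligence'": http://stackoverflow.com/questions/1423525/what-is-wrong-with-this-python-function-from-programming-collective-intelligence)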

from __future__ import division
from math import sqrt

def sim_pearson(prefs,p1,p2):
    si={}
    for item in prefs[p1]: 
        if item in prefs[p2]: si[item]=1
    if len(si)==0: return 0
    n=len(si)
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si]) 
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])
    num=pSum-(sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
    if den==0: return 0
    r=num/den
    return r

critics = {
    'user1':{
        'item1': 3,
        'item2': 5,
        'item3': 5,
        },
    'user2':{
        'item1': 4,
        'item2': 5,
        'item3': 5,
        }
}
critics2 = {
    'user1':{
        'item1': 5,
        'item2': 5,
        'item3': 5,
        },
    'user2':{
        'item1': 5,
        'item2': 5,
        'item3': 5,
        }
}
critics3 = {
    'user1':{
        'item1': 1,
        'item2': 3,
        'item3': 5,
        },
    'user2':{
        'item1': 5,
        'item2': 3,
        'item3': 1,
        }
}

print sim_pearson(critics, 'user1', 'user2')   # result: 1.0 (expected)
print sim_pearson(critics2, 'user1', 'user2')  # result: 0 (unexpected)
print sim_pearson(critics3, 'user1', 'user2')  # result: -1 (expected)

Solution

There is nothing wrong with your result. You are trying to fit a line through three points. In the second case all three points have the same coordinates, so they are effectively a single point. You can't say whether these points correlate or anti-correlate, because you can draw an infinite number of lines through one point (den in your code equals zero). The coefficient is undefined in that case, and the function's "if den==0: return 0" guard simply reports it as 0.
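
To make the zero-denominator case visible, here is a minimal Python 3 sketch (not the book's code). The helper name pearson_or_none and the choice to return None instead of 0 are assumptions made for illustration; it takes two plain rating lists rather than the prefs dictionary.

```python
from math import sqrt

def pearson_or_none(xs, ys):
    """Return Pearson's r for two equal-length rating lists,
    or None when the coefficient is undefined (hypothetical helper)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations from the mean, and of cross products.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(ss_x * ss_y)
    if den == 0:
        # At least one user gave every item the same rating,
        # so r would be a division by zero: report it as undefined.
        return None
    return cross / den

print(pearson_or_none([3, 5, 5], [4, 5, 5]))  # ratings from critics
print(pearson_or_none([5, 5, 5], [5, 5, 5]))  # ratings from critics2: None
print(pearson_or_none([1, 3, 5], [5, 3, 1]))  # ratings from critics3: -1.0
```

Returning None rather than 0 lets the caller tell "no linear relationship" apart from "cannot be computed". Whether identical constant ratings should count as maximally similar is a design decision for the recommender, not something the Pearson coefficient can answer.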
