What is wrong with the Pearson algorithm from "Programming Collective Intelligence"?

Views: 159 | Published: 2015/11/30 20:49:43 | Tags: python, algorithm, pearson

Problem description

This function is from the book "Programming Collective Intelligence", and is supposed to calculate the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1.

If two critics rate items very similarly the function should return 1, or close to 1.

With real user data I sometimes get weird results. In the following example the dataset critics2 should return 1 - instead it returns 0.

Does anyone spot a mistake?

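(This is not a duplicate of "What is wrong with this python function from 'Programming Collective Intelligence'": http://stackoverflow.com/questions/1423525/what-is-wrong-with-this-python-function-from-programming-collective-intelligence)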

from __future__ import division
from math import sqrt

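def sim_pearson(prefs, p1, p2):
    # Collect the items rated by both p1 and p2.
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # No items in common: return 0.
    if len(si) == 0:
        return 0
    n = len(si)
    # Sums of the ratings over the shared items.
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # Sums of the squared ratings.
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # Sum of the products of the paired ratings.
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # Numerator and denominator of the Pearson score.
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    # A zero denominator is reported as a score of 0.
    if den == 0:
        return 0
    r = num / den
    return r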

critics = {
    'user1':{
        'item1': 3,
        'item2': 5,
        'item3': 5,
        },
    'user2':{
        'item1': 4,
        'item2': 5,
        'item3': 5,
        }
}
critics2 = {
    'user1':{
        'item1': 5,
        'item2': 5,
        'item3': 5,
        },
    'user2':{
        'item1': 5,
        'item2': 5,
        'item3': 5,
        }
}
critics3 = {
    'user1':{
        'item1': 1,
        'item2': 3,
        'item3': 5,
        },
    'user2':{
        'item1': 5,
        'item2': 3,
        'item3': 1,
        }
}

print(sim_pearson(critics, 'user1', 'user2'))   # result: 1.0 (expected)
print(sim_pearson(critics2, 'user1', 'user2'))  # result: 0 (unexpected)
print(sim_pearson(critics3, 'user1', 'user2'))  # result: -1 (expected)

Solution

There is nothing wrong with your result. You are trying to fit a line through three points. In the second case all three points have the same coordinates, so they are effectively one point. You can't say whether these points correlate or anti-correlate, because you can draw an infinite number of lines through a single point. In terms of the formula, neither user's ratings vary, so den in your code equals zero and the coefficient is 0/0, which is undefined; the function simply reports that case as 0.
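To make the degenerate case visible instead of folding it into 0, one option is to return None whenever either rating list is constant. The sketch below is only an illustration under that assumption: pearson_or_none is a hypothetical helper, not code from the book, and it takes two plain lists of ratings for the shared items rather than the prefs dictionary.

```python
from math import sqrt


def pearson_or_none(xs, ys):
    """Return Pearson's r for two equal-length rating lists, or None if undefined.

    Hypothetical helper for illustration; not from the book.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        return None
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Sums of squared deviations (n times the variance of each list).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Sum of cross products (n times the covariance).
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if ss_x == 0 or ss_y == 0:
        # At least one list is constant: r would be 0/0, so it is undefined.
        return None
    return sp_xy / sqrt(ss_x * ss_y)


# Ratings from critics2: both users gave 5 to every item.
print(pearson_or_none([5, 5, 5], [5, 5, 5]))  # None: undefined, not 0 or 1
# Ratings from critics3: perfectly opposite trends.
print(pearson_or_none([1, 3, 5], [5, 3, 1]))  # -1.0
```

Whether an undefined correlation should be treated as 0, as 1, or skipped entirely is a design decision for the recommender; returning None only forces the caller to decide explicitly.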
