Pandas DataFrame matrix based calculation

Question

I have a Pandas DataFrame as follows. It shows which of the pages p1 to p4 users accessed in each session.

import pandas as pd

df = pd.DataFrame([[1,1,1,0,1],[2,1,1,0,1],[3,1,1,1,1],[4,0,1,0,1]])
df.columns = ['session','p1','p2','p3','p4']

The following matrix, computed with session as the index, shows the number of pages accessed in common by each pair of sessions.

In [20]: df.set_index('session').dot(df.set_index('session').T)
Out[20]: 
session  1  2  3  4
session            
1        3  3  3  2
2        3  3  3  2
3        3  3  4  2
4        2  2  2  2
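
As a sanity check on the matrix above: each entry is the dot product of two 0/1 rows, which counts the pages where both sessions have a 1. The following is a minimal sketch that compares those entries against explicit set intersections, assuming the same data as in the question. The names `pages` and `visited` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=['session', 'p1', 'p2', 'p3', 'p4'],
)
pages = df.set_index('session')  # illustrative name

# Pages visited in each session, as a set of column names.
visited = {s: set(row[row == 1].index) for s, row in pages.iterrows()}

# Dot product of the 0/1 rows counts the pages both sessions visited.
common = pages.dot(pages.T)

# Every matrix entry should equal the size of the matching set intersection.
for si in pages.index:
    for sj in pages.index:
        assert common.loc[si, sj] == len(visited[si] & visited[sj])
```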

I need to calculate a value for each pair of sessions si and sj according to the following formula.

value = (no. of pages accessed in common) / (total no. of pages in si * total no. of pages in sj)^(1/2)

For sessions 1 and 2:

No. of pages accessed in common = 3
total no. of pages in s1 * total no. of pages in s2 = 3*3
value = 3/9^(1/2) = 1

For sessions 2 and 4:

No. of pages accessed in common = 2
total no. of pages in s2 * total no. of pages in s4 = 3*2
value = 2/6^(1/2) = 0.8165
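
The two worked examples above can be reproduced with plain arithmetic. This is only a small sketch of the formula as stated in the question; the helper name `similarity` is hypothetical and chosen here for illustration.

```python
import math

def similarity(n_common, n_i, n_j):
    """Common page count divided by the square root of the product of page totals."""
    return n_common / math.sqrt(n_i * n_j)

s12 = similarity(3, 3, 3)  # sessions 1 and 2: 3 common pages, 3 and 3 pages in total
s24 = similarity(2, 3, 2)  # sessions 2 and 4: 2 common pages, 3 and 2 pages in total
print(s12, round(s24, 4))  # prints: 1.0 0.8165
```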

I have not been able to achieve this.

Recommended answer

I think you are looking for numpy.outer:

In [10]: import numpy as np
         df1 = df.set_index('session')
         common = df1.dot(df1.T)

In [11]: df1.sum(1)
Out[11]: 
session
1          3
2          3
3          4
4          2
dtype: int64

In [12]: np.outer(*[df1.sum(1)] * 2)  # same as np.outer(df1.sum(1), df1.sum(1))
Out[12]: 
array([[ 9,  9, 12,  6],
       [ 9,  9, 12,  6],
       [12, 12, 16,  8],
       [ 6,  6,  8,  4]])

In [13]: np.sqrt(np.outer(*[df1.sum(1)] * 2))
Out[13]: 
array([[ 3.        ,  3.        ,  3.46410162,  2.44948974],
       [ 3.        ,  3.        ,  3.46410162,  2.44948974],
       [ 3.46410162,  3.46410162,  4.        ,  2.82842712],
       [ 2.44948974,  2.44948974,  2.82842712,  2.        ]])

In [14]: common / np.sqrt(np.outer(*[df1.sum(1)] * 2))
Out[14]: 
session         1         2         3         4
session                                        
1        1.000000  1.000000  0.866025  0.816497
2        1.000000  1.000000  0.866025  0.816497
3        0.866025  0.866025  1.000000  0.707107
4        0.816497  0.816497  0.707107  1.000000
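
For reuse, the steps above could be collected into one function. The following is a sketch under the assumption that the table holds only 0/1 values plus a key column; the function name `session_similarity` and its `key` parameter are illustrative, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def session_similarity(df, key='session'):
    """Pairwise similarity of sessions from a 0/1 page-visit table (hypothetical helper)."""
    visits = df.set_index(key)
    common = visits.dot(visits.T)               # pages visited by both sessions
    totals = visits.sum(axis=1)                 # pages visited per session
    denom = np.sqrt(np.outer(totals, totals))   # sqrt(total_i * total_j) for every pair
    return common / denom

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=['session', 'p1', 'p2', 'p3', 'p4'],
)
result = session_similarity(df)
print(result.round(6))
```

For 0/1 rows this quantity coincides with the cosine similarity between the two rows, since each row's total equals its squared length. A session with no page visits at all would make the denominator zero, so such rows would need separate handling.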
