Pandas DataFrame matrix based calculation
Problem description
I have a pandas DataFrame like the one below. It shows which of the pages p1 to p4 were accessed in each session (1 = accessed, 0 = not accessed).
import pandas as pd

df = pd.DataFrame([[1,1,1,0,1],[2,1,1,0,1],[3,1,1,1,1],[4,0,1,0,1]])
df.columns = ['session','p1','p2','p3','p4']
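The following matrix shows how many pages each pair of sessions accessed in common. The session column is used as the index so that it is not included in the dot product.
In [20]: df1 = df.set_index('session')
    ...: df1.dot(df1.T)
Out[20]:
session  1  2  3  4
session
1        3  3  3  2
2        3  3  3  2
3        3  3  4  2
4        2  2  2  2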
I need to calculate a value for each pair of sessions si and sj according to the following formula.
sim(si, sj) = (no. of pages accessed in common) / (total no. of pages in si * total no. of pages in sj)^(1/2)
For sessions 1 and 2:
No. of pages accessed in common = 3
total no. of pages in s1 * total no. of pages in s2 = 3*3
sim(s1, s2) = 3/(3*3)^(1/2) = 1
For sessions 2 and 4:
No. of pages accessed in common = 2
total no. of pages in s2 * total no. of pages in s4 = 3*2
sim(s2, s4) = 2/(3*2)^(1/2) = 0.8165
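The two worked examples above can be checked by hand with a short pure-Python sketch. The helper name session_similarity and the sessions dictionary are made up for illustration; the 0/1 flags are copied from the DataFrame in the question.

```python
import math

# Page-visit flags (p1..p4) copied from the DataFrame in the question.
sessions = {
    1: [1, 1, 0, 1],
    2: [1, 1, 0, 1],
    3: [1, 1, 1, 1],
    4: [0, 1, 0, 1],
}

def session_similarity(a, b):
    """Hypothetical helper: common pages / sqrt(pages in a * pages in b)."""
    # For 0/1 flags, x * y is 1 only when both sessions visited the page.
    common = sum(x * y for x, y in zip(a, b))
    return common / math.sqrt(sum(a) * sum(b))

print(session_similarity(sessions[1], sessions[2]))            # sessions 1 and 2
print(round(session_similarity(sessions[2], sessions[4]), 4))  # sessions 2 and 4
```

This only evaluates the formula one pair at a time; it is not a vectorised solution for the whole matrix.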
I have not been able to work out how to compute this.
Recommended answer
I think you are looking for numpy.outer (numpy is imported as np below):
In [10]: df1 = df.set_index('session')
    ...: common = df1.dot(df1.T)
In [11]: df1.sum(1)
Out[11]:
session
1 3
2 3
3 4
4 2
dtype: int64
In [12]: np.outer(*[df1.sum(1)] * 2) # same as np.outer(df1.sum(1), df1.sum(1))
Out[12]:
array([[ 9,  9, 12,  6],
       [ 9,  9, 12,  6],
       [12, 12, 16,  8],
       [ 6,  6,  8,  4]])
In [13]: np.sqrt(np.outer(*[df1.sum(1)] * 2))
Out[13]:
array([[ 3.        ,  3.        ,  3.46410162,  2.44948974],
       [ 3.        ,  3.        ,  3.46410162,  2.44948974],
       [ 3.46410162,  3.46410162,  4.        ,  2.82842712],
       [ 2.44948974,  2.44948974,  2.82842712,  2.        ]])
In [14]: common / np.sqrt(np.outer(*[df1.sum(1)] * 2))
Out[14]:
session         1         2         3         4
session
1        1.000000  1.000000  0.866025  0.816497
2        1.000000  1.000000  0.866025  0.816497
3        0.866025  0.866025  1.000000  0.707107
4        0.816497  0.816497  0.707107  1.000000
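The steps above can be gathered into one function. This is a minimal sketch rather than a definitive implementation: the name session_similarity_matrix is hypothetical, and it assumes the input holds only 0/1 flags and is indexed by session.

```python
import numpy as np
import pandas as pd

def session_similarity_matrix(visits: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper; visits is a 0/1 DataFrame indexed by session."""
    common = visits.dot(visits.T)               # pages shared by each pair of sessions
    totals = visits.sum(axis=1).to_numpy()      # pages accessed per session
    denom = np.sqrt(np.outer(totals, totals))   # sqrt(total_i * total_j) for every pair
    return common / denom

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=['session', 'p1', 'p2', 'p3', 'p4'],
)
sim = session_similarity_matrix(df.set_index('session'))
print(sim.round(6))
```

For 0/1 data this value is the cosine similarity between session rows, because each row sum equals the squared length of that row. A session with no pages at all would make the denominator zero, so such rows may need to be handled separately.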