Lift value calculation
Problem description
I have a (symmetric) adjacency matrix, which has been created based on the co-occurrence of names (e.g. Greg, Mary, Sam, Tom) in newspaper articles (e.g. a, b, c, d). See below.
How can I calculate the lift value for the non-zero matrix elements? I would be interested in an efficient implementation that could also be used for very large matrices (e.g. a million non-zero elements).
Thanks for your help.
# Load package
library(Matrix)
# Data
A <- new("dgTMatrix"
, i = c(2L, 2L, 2L, 0L, 3L, 3L, 3L, 1L, 1L)
, j = c(0L, 1L, 2L, 0L, 1L, 2L, 3L, 1L, 3L)
, Dim = c(4L, 4L)
, Dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
, x = c(1, 1, 1, 1, 1, 1, 1, 1, 1)
, factors = list()
)
# > A
# 4 x 4 sparse Matrix of class "dgTMatrix"
# a b c d
# Greg 1 . . .
# Mary . 1 . 1
# Sam 1 1 1 .
# Tom . 1 1 1
# One mode projection of the data
# (i.e. final adjacency matrix, which is the basis for the lift value calculation)
A.final <- tcrossprod(A)
# > A.final
# 4 x 4 sparse Matrix of class "dsCMatrix"
# Greg Mary Sam Tom
# Greg 1 . 1 .
# Mary . 2 1 2
# Sam 1 1 3 2
# Tom . 2 2 3
Recommended answer
Here is something that might help you, although it is certainly not the most efficient implementation.
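For reference, the quantity computed by the function below can be written out as a formula. The symbols are notation introduced here for illustration and do not appear in the original post: $N$ is the total number of articles, $n_A$ and $n_B$ are the numbers of articles mentioning each name, and $n_{AB}$ is the number of articles mentioning both.

```latex
\mathrm{lift}(A, B)
  = \frac{P(A \cap B)}{P(A)\,P(B)}
  = \frac{n_{AB} / N}{(n_A / N)\,(n_B / N)}
  = \frac{N \, n_{AB}}{n_A \, n_B}
```

With the example data, Mary appears in 2 articles, Tom in 3, both together in 2, and there are 4 articles in total, so the lift for (Mary, Tom) is $4 \cdot 2 / (2 \cdot 3) = 4/3$.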
ComputeLift <- function(data, projection){
  # Initialize a matrix to store the results.
  lift <- matrix(NA, nrow=nrow(projection), ncol=ncol(projection))
  # Select all pairs in the projection matrix
  for(i in seq_len(nrow(projection))){
    for(j in seq_len(ncol(projection))){
      # The probability to observe both names in the same article is the
      # number of articles where the names appear together divided by the
      # total number of articles
      pAB <- projection[i,j]/ncol(data)
      # The probability for a name to appear in an article is the number of
      # articles where the name appears divided by the total number of articles.
      # Use the `data` argument here rather than the global `A`.
      pA <- sum(data[i,])/ncol(data)
      pB <- sum(data[j,])/ncol(data)
      # The lift is computed as the probability to observe both names in an
      # article divided by the product of the probabilities to observe each name.
      lift[i,j] <- pAB/(pA*pB)
    }
  }
  lift
}
ComputeLift(data=A, projection=A.final)
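Since the question asks about very large matrices, one possible way to avoid the double loop is to express the same calculation as a sparse matrix product. The following is only a sketch, not part of the original answer: the name `ComputeLiftSparse` is hypothetical, and it assumes that `data` contains only 0/1 entries and that every name appears in at least one article (otherwise the reciprocal counts are infinite). It relies on the identity lift = N * D^-1 %*% projection %*% D^-1, where D is the diagonal matrix of per-name article counts.

```r
library(Matrix)

# Example data rebuilt with sparseMatrix() so the block is self-contained
# (rows are names, columns are articles).
A <- sparseMatrix(
  i = c(1, 2, 2, 3, 3, 3, 4, 4, 4),
  j = c(1, 2, 4, 1, 2, 3, 2, 3, 4),
  x = rep(1, 9),
  dims = c(4, 4),
  dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
)

# Hypothetical helper (an illustration, not the original author's method).
ComputeLiftSparse <- function(data, projection) {
  n.articles <- ncol(data)
  # Number of articles per name; assumes 0/1 entries and no all-zero rows.
  counts <- rowSums(data)
  inv.counts <- Diagonal(x = 1 / counts)
  # lift[i, j] = n.articles * projection[i, j] / (counts[i] * counts[j]);
  # scaling rows and columns by a diagonal matrix keeps the result sparse.
  lift <- n.articles * (inv.counts %*% projection %*% inv.counts)
  dimnames(lift) <- dimnames(projection)
  lift
}

lift.sparse <- ComputeLiftSparse(data = A, projection = tcrossprod(A))
```

The result has the same sparsity pattern as `projection`, so only the non-zero co-occurrences are stored, and the values should agree with the loop-based `ComputeLift` wherever that function returns a non-zero lift.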