Creating co-occurrence matrix

Problem description

I'm trying to build a co-occurrence matrix. I have a data file of transactions and items, and I want a matrix giving, for each pair of items, the number of transactions in which they appear together.

I'm a newbie in R programming and I'm having some fun finding out all the shortcuts that R has, rather than writing explicit loops (I used C years ago and now stick to Excel macros and SPSS). I have checked the solutions here but haven't found one that works. The closest is the solution given in "Co-occurrence matrix using SAC?", but it produced an error message when I used projecting_tm; I suspect the cbind wasn't successful in my case.

Essentially I have a table containing the following:

TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1, etc.

I want to create something like:

   A B C D E F
A  0 1 1 0 1 1
B  1 0 3 1 1 0
C  1 3 0 1 0 0
D  1 1 1 0 1 1
E  1 1 0 1 0 0
F  0 1 1 1 0 0

What I did was (and you'd probably laugh at my rookie R approach):

library(igraph)
library(tnet)

trx <- read.table("FileName.txt", header=TRUE) 
transID <- t(trx[1])
items <- t(trx[2])

id_item <- cbind(items,transID)
item_item <- projecting_tm(id_item, method="sum")
item_item <- tnet_igraph(item_item,type="weighted one-mode tnet")
item_matrix <-get.adjacency(item_item,attr="weight")
item_matrix

As mentioned above, the cbind was probably unsuccessful, so projecting_tm couldn't give me any result.

Any alternative approach or a correction to my method?

Your help would be much appreciated!

Solution

I'd use a combination of the reshape2 package and matrix algebra:

#read in your data
dat <- read.table(text="TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1", header=TRUE)

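#making the boolean (0/1) item-by-transaction matrix
library(reshape2)
dat2 <- melt(dat, id.vars = c("TrxID", "Items"))    #long format, Quant ends up in "value"
w <- dcast(dat2, Items ~ TrxID, value.var = "value") #one row per item, one column per transaction
x <- as.matrix(w[,-1])                            #drop the Items column
x[is.na(x)] <- 0                                  #item absent from a transaction
x <- apply(x, 2, function(col) as.numeric(col > 0))  #recode as 0/1
v <- x %*% t(x)                                   #the magic matrix: shared transactions per item pair
diag(v) <- 0                                      #replace diagonal
dimnames(v) <- list(w[, 1], w[,1])                #name the dimensions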
v
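
Why the product works: entry (i, j) of x %*% t(x) sums x[i, k] * x[j, k] over all transactions k, and that term is 1 only when items i and j are both in transaction k. For what it's worth, the same counts can probably be had in base R alone. This is only a minimal sketch, assuming the same sample data as above; the names `tab`, `inc` and `co` are mine and purely illustrative:

```r
# same sample data as above
dat <- read.table(text = "TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1", header = TRUE)

# transactions in rows, items in columns
tab <- table(dat$TrxID, dat$Items)

# recode to 0/1 so a repeated TrxID/Items row would not be counted twice
inc <- (unclass(tab) > 0) * 1

# crossprod(inc) is t(inc) %*% inc: items x items, counting shared transactions
co <- crossprod(inc)
diag(co) <- 0        # an item co-occurring with itself is not of interest here

co["B", "C"]         # 3 (Trx1, Trx3 and Trx5)
```

Here the matrix is built transactions-by-items rather than items-by-transactions, which is why it is crossprod() and not x %*% t(x); the result should match v.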

For the graphing, maybe something like this:

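library(igraph)
# graph_from_adjacency_matrix() is the current name for graph.adjacency()
g <- graph_from_adjacency_matrix(v, weighted=TRUE, mode ='undirected')
g <- simplify(g)
# set labels and degrees of vertices
V(g)$label <- V(g)$name
V(g)$degree <- degree(g)
plot(g)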
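
As for the cbind in your own attempt, one thing that does look off is the shape. t(trx[1]) turns a one-column data frame into a one-row matrix, so binding two of those side by side gives a single long row rather than one row per record. A small sketch with made-up data (the three-row `trx` and the names `wide` and `long` are just for illustration; I haven't verified what else projecting_tm expects of its input, so check the tnet documentation for that):

```r
# tiny stand-in for the file read by read.table() in the question
trx <- data.frame(TrxID = c("Trx1", "Trx1", "Trx2"),
                  Items = c("A", "B", "E"))

# what the question does: each t() call yields a 1 x 3 matrix
transID <- t(trx[1])
items   <- t(trx[2])
wide <- cbind(items, transID)
dim(wide)   # 1 6

# binding the columns themselves keeps one row per record
long <- cbind(Items = trx$Items, TrxID = trx$TrxID)
dim(long)   # 3 2
```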
