Build a square adjacency matrix from data.frame or data.table

Views: 144 Published: 2020/10/15 19:27:02 Tags: r data.table adjacency-matrix

Question

I am trying to build a square adjacency matrix from a data.table. Here is a reproducible example of what I already have:

require(data.table)
require(plyr)
require(reshape2)
# Build a mock data.table
dt <- data.table(Source=as.character(rep(letters[1:3],2)),Target=as.character(rep(letters[4:2],2)))
dt
#   Source Target
#1:      a      d
#2:      b      c
#3:      c      b
#4:      a      d
#5:      b      c
#6:      c      b
sry <- ddply(dt, .(Source,Target), summarize, Frequency=length(Source))
sry
#  Source Target Frequency
#1      a      d         2
#2      b      c         2
#3      c      b         2
mtx <- as.matrix(dcast(sry, Source ~ Target, value.var="Frequency", fill=0))
rownames(mtx) <- mtx[,1]
mtx <- mtx[,2:ncol(mtx)]
mtx
#  b   c   d
#a "0" "0" "2"
#b "0" "2" "0"
#c "2" "0" "0"

Now, this is very close to what I want, except that I would like all the nodes (a, b, c and d) represented in both dimensions, giving a 4 x 4 matrix.
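For the mock data above, the requested layout can be sketched in base R by giving both columns the same factor levels before tabulating. This only illustrates the target shape and is not necessarily the most efficient route for large data; the names `src`, `tgt`, `nodes` and `adj` are made up for this sketch.

```r
# Mock edge list, same values as the data.table above
src <- rep(letters[1:3], 2)   # a b c a b c
tgt <- rep(letters[4:2], 2)   # d c b d c b

# Every node that appears on either side of an edge
nodes <- sort(unique(c(src, tgt)))

# Sharing one set of levels makes table() keep the empty rows and columns
adj <- table(Source = factor(src, levels = nodes),
             Target = factor(tgt, levels = nodes))
adj
#       Target
# Source a b c d
#      a 0 0 0 2
#      b 0 0 2 0
#      c 0 2 0 0
#      d 0 0 0 0
```

Node d never appears as a source and node a never appears as a target, so row d and column a are all zeros instead of being dropped.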

Note that I am working with quite large data, so I would like to find an efficient solution for this.

Thanks for your help.

Solution (edit):

Given the quality of the solutions offered and the size of my dataset, I benchmarked all of them.

# The benchmark was run on a 1-million-row sample of my original dataset
library(data.table)
aa <- fread("small2.csv",sep="^")
# Keep only the two columns that hold the edge endpoints
dt <- aa[,c(8,9),with=FALSE]
colnames(dt) <- c("Source","Target")
dim(dt)
#[1] 1000001       2
# All distinct node names, from either column
levs <- unique(unlist(dt, use.names=FALSE))
length(levs)
#[1] 2222

Given this data, the desired output is a 2222 x 2222 matrix. Solutions that return 2222 x 2223, where the first column contains the row names, are obviously also acceptable.

# Ananda Mahto's first solution
am1 <- function() {
    table(dt[, lapply(.SD, factor, levs)])
}
dim(am1())
#[1] 2222 2222

# Ananda Mahto's second solution
am2 <- function() {
    as.matrix(dcast(dt[, lapply(.SD, factor, levs)], Source~Target, drop=FALSE, value.var="Target", fun.aggregate=length))
}
dim(am2())
#[1] 2222 2223

library(dplyr)
library(tidyr)
# Akrun's solution
akr <- function() {
    dt %>%
       mutate_each(funs(factor(., levs))) %>%
       group_by(Source, Target) %>%
       tally() %>%
       spread(Target, n, drop=FALSE, fill=0)
}
dim(akr())
#[1] 2222 2223

library(igraph)
# Carlos Cinelli's solution
cc <- function() {
    g <- graph_from_data_frame(dt)
    as_adjacency_matrix(g)
}
dim(cc())
#[1] 2222 2222

And the results of the benchmark are...

library(rbenchmark)
benchmark(am1(), am2(), akr(), cc(), replications=75)
#    test replications elapsed relative user.self sys.self user.child sys.child
# 1 am1()           75  15.939    1.000    15.636    0.280          0         0
# 2 am2()           75 111.558    6.999   109.345    1.616          0         0
# 3 akr()           75  43.786    2.747    42.463    1.134          0         0
# 4  cc()           75  46.193    2.898    45.532    0.563          0         0
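
As a reading aid for the table, the `relative` column appears to be each `elapsed` time divided by the fastest one, which can be checked from the printed numbers. The `elapsed` vector below is copied from the output above; `rel` and `per_call` are hypothetical names introduced for this sketch.

```r
# Elapsed seconds copied from the benchmark output (75 replications each)
elapsed <- c(am1 = 15.939, am2 = 111.558, akr = 43.786, cc = 46.193)

# Ratio to the fastest entry; compare with the `relative` column
rel <- round(elapsed / min(elapsed), 3)
rel

# Average seconds per single call
per_call <- elapsed / 75
per_call
```

Read this way, the `table()` approach in `am1()` is the fastest of the four on this sample, and the `dcast()` approach in `am2()` takes about seven times as long.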
