R: creating an adjacency matrix from columns in a data frame
Problem description
I am interested in testing some network visualization techniques, but before trying those functions I want to build an adjacency matrix (from, to) from the following data frame:
```
Id  Gender  Col_Cold_1  Col_Cold_2  Col_Cold_3  Col_Hot_1  Col_Hot_2   Col_Hot_3
10  F       pain        sleep       NA          infection  medication  walking
14  F       Bump        NA          muscle      NA         twitching   flutter
17  M                   pain        hemaloma    Callus     infection
18  F       muscle                  pain                   twitching   medication
```
My goal is to create an adjacency matrix as follows:
1) All values in columns whose names contain the keyword Cold become the rows.
2) All values in columns whose names contain the keyword Hot become the columns.
For example, pain, sleep, Bump, muscle and hemaloma are cell values under the columns with the keyword Cold, so they form the rows. Cell values such as infection, medication, Callus, walking, twitching and flutter are under the columns with the keyword Hot, so they form the columns of the association matrix.
The final desired output should appear like this:
```
          infection  medication  walking  twitching  flutter  Callus
pain      2          2           1        1                   1
sleep     1          1           1
Bump                                      1          1
muscle                                    1          1
hemaloma  1                                                   1
```
[pain, infection] = 2 because the association between pain and infection occurs twice in the original data frame: once in row 1 and again in row 3.
[pain, medication] = 2 because the association between pain and medication also occurs twice: once in row 1 and again in row 4.
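One way to read this counting rule as a formula (the notation is ours, not the asker's): let $N^{\text{cold}}_{r,c}$ be the number of Cold columns in row $r$ that hold term $c$, and $N^{\text{hot}}_{r,h}$ the number of Hot columns in row $r$ that hold term $h$. Each cell of the desired matrix then sums, over the $n$ rows, the number of (Cold cell, Hot cell) pairings in that row:

$$
A_{c,h} = \sum_{r=1}^{n} N^{\text{cold}}_{r,c}\, N^{\text{hot}}_{r,h},
\qquad\text{i.e.}\qquad
A = \left(N^{\text{cold}}\right)^{\top} N^{\text{hot}}
$$

$$
A_{\text{pain},\,\text{infection}} = \underbrace{1\cdot 1}_{\text{row 1}} + \underbrace{1\cdot 1}_{\text{row 3}} = 2,
\qquad
A_{\text{pain},\,\text{medication}} = \underbrace{1\cdot 1}_{\text{row 1}} + \underbrace{1\cdot 1}_{\text{row 4}} = 2
$$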
Any suggestions or advice on producing such an association matrix would be much appreciated. Thanks.
Reproducible dataset
```r
df = structure(list(
  id = c(10, 14, 17, 18),
  Gender = structure(c(1L, 1L, 2L, 1L), .Label = c("F", "M"), class = "factor"),
  Col_Cold_1 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "Bump", "muscle", "pain"), class = "factor"),
  Col_Cold_2 = structure(c(4L, 2L, 3L, 1L), .Label = c("", "NA", "pain", "sleep"), class = "factor"),
  Col_Cold_3 = structure(c(1L, 3L, 2L, 4L), .Label = c("NA", "hemaloma", "muscle", "pain"), class = "factor"),
  Col_Hot_1 = structure(c(4L, 3L, 2L, 1L), .Label = c("", "Callus", "NA", "infection"), class = "factor"),
  Col_Hot_2 = structure(c(2L, 3L, 1L, 3L), .Label = c("infection", "medication", "twitching"), class = "factor"),
  Col_Hot_3 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "flutter", "medication", "walking"), class = "factor")),
  .Names = c("id", "Gender", "Col_Cold_1", "Col_Cold_2", "Col_Cold_3", "Col_Hot_1", "Col_Hot_2", "Col_Hot_3"),
  row.names = c(NA, -4L), class = "data.frame")
```
Recommended answer
One way is to put the dataset into a "tidy" (long) form, then use `xtabs`. First, some cleaning up:

```r
df[] <- lapply(df, as.character)              # Convert factors to characters
df[df == "NA" | df == "" | is.na(df)] <- NA   # Turn blanks and "NA" strings into real NAs
```
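Both comparisons in the second line matter because, in the reproducible dataset, empty cells are stored as the factor level `""` and some cells hold the literal string `"NA"` (also a factor level), not a real missing value. A minimal sketch on a hypothetical two-column frame `toy` (our own example, not from the question) shows the two cleaning lines turning both kinds of placeholder into real `NA`s:

```r
# Hypothetical toy frame: factor columns holding real values, "" and the string "NA"
toy <- data.frame(a = factor(c("pain", "", "NA")),
                  b = factor(c("NA", "sleep", "")))

toy[] <- lapply(toy, as.character)               # factors -> character, still a data.frame
toy[toy == "NA" | toy == "" | is.na(toy)] <- NA  # "" and literal "NA" -> real NA

print(is.na(toy))
```

Without the conversion to character first, the comparisons would be made against factor columns, which is why the answer runs `lapply(df, as.character)` before masking.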
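Now, tidy the dataset:

```r
library(tidyr)
library(dplyr)

out <- do.call(rbind, sapply(grep("^Col_Cold", names(df), value = TRUE), function(x) {
  # One "cold" column plus all of the "hot" columns
  vars <- c(x, grep("^Col_Hot", names(df), value = TRUE))
  # Stack the hot columns into key ("hot_col") / value pairs next to the cold column
  # (gather() stands in for the deprecated gather_()), then keep
  # column 1 (the cold term) and column 3 (the hot term)
  setNames(gather(select(df, all_of(vars)), key = "hot_col", value = "value",
                  all_of(vars[-1]))[, c(1, 3)],
           c("cold", "hot"))
}, simplify = FALSE))
```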
The idea is to "pair" each of the "cold" columns with each of the "hot" columns to make a long dataset. `out` looks like this:

```r
out
#     cold        hot
# 1   pain  infection
# 2   Bump       <NA>
# 3   <NA>     Callus
# 4 muscle       <NA>
# 5   pain medication
# ...
```
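The same pairing can also be done without `gather`, as a swapped-in technique: stack the Cold columns and the Hot columns separately into (row, term) pairs, then join the two on the row index, which matches every Cold term with every Hot term from the same row. This base-R sketch assumes the already-cleaned data (typed in by hand here, with `Gender` omitted), and `stack_side()` is a hypothetical helper, not part of any package:

```r
# Cleaned data typed in by hand (Gender omitted); NA marks an empty cell
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack the given columns into (row, <name>) pairs, dropping NAs
stack_side <- function(d, cols, name) {
  long <- data.frame(
    row   = rep(seq_len(nrow(d)), times = length(cols)),  # unlist() below is column-major
    value = unlist(d[cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  long <- long[!is.na(long$value), ]
  names(long)[names(long) == "value"] <- name
  long
}

cold_long <- stack_side(df, grep("^Col_Cold", names(df), value = TRUE), "cold")
hot_long  <- stack_side(df, grep("^Col_Hot",  names(df), value = TRUE), "hot")

# Joining on the row index pairs every cold term with every hot term of that row
pairs <- merge(cold_long, hot_long, by = "row")
tab <- xtabs(~ cold + hot, data = pairs)
print(tab)
```

Because `merge()` on a non-unique key returns every matching combination, a row with two Cold terms and three Hot terms (row 1 here) contributes six pairs, and the missing values are already gone before the join, so no `na.omit()` is needed.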
Finally, use `xtabs` to make the desired output:

```r
xtabs(~ cold + hot, na.omit(out))
#           hot
# cold       Callus flutter infection medication twitching walking
#   Bump          0       1         0          0         1       0
#   hemaloma      1       0         1          0         0       0
#   muscle        0       1         0          1         2       0
#   pain          1       0         2          2         1       1
#   sleep         0       0         1          1         0       1
```
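For network tools it can help to have the counts as a plain numeric matrix or as an edge list. The following base-R sketch takes a different route to the same counts: it builds a rows-by-terms count table for each side with `table()` and multiplies them with `crossprod()` (that is, `t(cold_counts) %*% hot_counts`), which should agree with the `xtabs()` table above. It assumes the cleaned data typed in by hand, and `term_counts()` is a hypothetical helper. An edge list in the from/to/weight shape at the end is the kind of input network packages such as igraph (`graph_from_data_frame()`) accept; that call is not shown or tested here.

```r
# Cleaned data typed in by hand (Gender omitted); NA marks an empty cell
df <- data.frame(
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows-by-terms matrix counting each term in the given columns per row
term_counts <- function(d, cols) {
  row_id <- rep(seq_len(nrow(d)), times = length(cols))  # unlist() below is column-major
  term   <- unlist(d[cols], use.names = FALSE)
  keep   <- !is.na(term)
  # Fixing the row levels keeps both count matrices aligned on the same rows
  unclass(table(factor(row_id[keep], levels = seq_len(nrow(d))), term[keep]))
}

cold_counts <- term_counts(df, grep("^Col_Cold", names(df), value = TRUE))
hot_counts  <- term_counts(df, grep("^Col_Hot",  names(df), value = TRUE))

# adjacency[c, h] = sum over rows of (count of c in Cold) * (count of h in Hot)
adjacency <- crossprod(cold_counts, hot_counts)

# Edge list of the non-zero cells: from (cold term), to (hot term), weight (count)
edges <- as.data.frame(as.table(adjacency), stringsAsFactors = FALSE)
names(edges) <- c("from", "to", "weight")
edges <- edges[edges$weight > 0, ]

print(adjacency)
print(edges)
```

The matrix-product form makes the counting rule explicit: a term pair that never shares a row gets 0, and a row in which a term appears in two Cold columns would contribute twice, just as the column-by-column pairing does.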