R: creating an adjacency matrix from columns in a dataframe


Question

I am interested in testing some network visualization techniques but before trying those functions I want to build an adjacency matrix (from, to) using the dataframe which is as follows.

 Id   Gender   Col_Cold_1  Col_Cold_2  Col_Cold_3  Col_Hot_1  Col_Hot_2   Col_Hot_3  
 10   F         pain       sleep        NA         infection  medication  walking
 14   F         Bump       NA           muscle     NA         twitching   flutter
 17   M                    pain         hemaloma   Callus     infection   
 18   F         muscle                  pain                  twitching   medication

My goal is to create an adjacency matrix as follows

1) All values in columns with keyword Cold will contribute to the rows  
2) All values in columns with keyword Hot will contribute to the columns

For example, pain, sleep, Bump, muscle, hemaloma are cell values under the columns with keyword Cold and they will form the rows and cell values such as infection, medication, Callus, walking, twitching, flutter are under columns with keywords Hot and this will form the columns of the association matrix.
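
As a minimal sketch of the two rules above, the row and column labels could be collected by matching column names on the keywords. The data frame below is a hand-typed copy of the example with blanks already stored as NA, and the names `cold_cols`, `hot_cols`, `row_labels` and `col_labels` are only illustrative.

```r
# Hand-typed copy of the example data, with blanks and "NA" strings as real NAs
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Gender     = c("F", "F", "M", "F"),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

# Columns whose names contain each keyword
cold_cols <- grep("Cold", names(df), value = TRUE)
hot_cols  <- grep("Hot",  names(df), value = TRUE)

# Distinct non-missing values under each group of columns
cold_vals  <- unlist(df[cold_cols], use.names = FALSE)
hot_vals   <- unlist(df[hot_cols],  use.names = FALSE)
row_labels <- unique(cold_vals[!is.na(cold_vals)])
col_labels <- unique(hot_vals[!is.na(hot_vals)])
```

Here `row_labels` holds the five "Cold" terms and `col_labels` the six "Hot" terms that make up the matrix dimensions.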

The final desired output should appear like this:

           infection  medication  walking  twitching  flutter  Callus
     pain  2          2           1        1                   1
    sleep  1          1           1
     Bump                                  1          1
   muscle             1                    1
 hemaloma  1                                                   1

    • [pain, infection] = 2 because the association between pain and infection occurs twice in the original dataframe: once in row 1 and again in row 3.

    • [pain, medication] = 2 because the association between pain and medication occurs twice: once in row 1 and again in row 4.
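
One way to check individual cells like these by hand is to count the rows in which a term appears under any "Cold" column and another term appears under any "Hot" column. This is only a sketch: `count_pair` is a hypothetical helper, the data frame is a hand-typed copy of the example, and counting rows matches the cell value only when a term appears at most once per row within each group, as it does here.

```r
# Hand-typed copy of the example data, with blanks and "NA" strings as real NAs
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Gender     = c("F", "F", "M", "F"),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of rows containing both terms
count_pair <- function(data, cold, hot) {
  cold_cols <- grep("^Col_Cold", names(data), value = TRUE)
  hot_cols  <- grep("^Col_Hot",  names(data), value = TRUE)
  # TRUE for each row where the term occurs in at least one column of the group
  has_cold <- rowSums(data[cold_cols] == cold, na.rm = TRUE) > 0
  has_hot  <- rowSums(data[hot_cols]  == hot,  na.rm = TRUE) > 0
  sum(has_cold & has_hot)
}

count_pair(df, "pain", "infection")   # rows 1 and 3
count_pair(df, "pain", "medication")  # rows 1 and 4
```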

      Any suggestions or advice on producing such an association matrix are much appreciated, thanks.

      Reproducible dataset

      df = structure(list(id = c(10, 14, 17, 18), Gender = structure(c(1L, 1L, 2L, 1L), .Label = c("F", "M"), class = "factor"), Col_Cold_1 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "Bump", "muscle", "pain"), class = "factor"), Col_Cold_2 = structure(c(4L, 2L, 3L, 1L), .Label = c("", "NA", "pain", "sleep"), class = "factor"), Col_Cold_3 = structure(c(1L, 3L, 2L, 4L), .Label = c("NA", "hemaloma", "muscle", "pain" ), class = "factor"), Col_Hot_1 = structure(c(4L, 3L, 2L, 1L), .Label = c("", "Callus", "NA", "infection"), class = "factor"), Col_Hot_2 = structure(c(2L, 3L, 1L, 3L), .Label = c("infection", "medication", "twitching"), class = "factor"), Col_Hot_3 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "flutter", "medication", "walking" ), class = "factor")), .Names = c("id", "Gender", "Col_Cold_1", "Col_Cold_2", "Col_Cold_3", "Col_Hot_1", "Col_Hot_2", "Col_Hot_3" ), row.names = c(NA, -4L), class = "data.frame")
      

      Recommended answer

      One way is to make the dataset into a "tidy" form, then use xtabs. First, some cleaning up:

      df[] <- lapply(df, as.character)  # Convert factors to characters
      df[df == "NA" | df == "" | is.na(df)] <- NA  # Make all blanks NAs
      

      Now, tidy the dataset:

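      library(tidyr)
      library(dplyr)
      # For each "cold" column, stack it against every "hot" column.
      # Note: gather_() is the older form of gather() and is deprecated in recent tidyr versions.
      out <- do.call(rbind, sapply(grep("^Col_Cold", names(df), value = TRUE), function(x){
        vars <- c(x, grep("^Col_Hot", names(df), value = TRUE))
        # Keep the cold value (column 1) and the gathered hot value (column 3)
        setNames(gather_(select(df, one_of(vars)), 
          key_col = x,
          value_col = "value",
          gather_cols = vars[-1])[, c(1, 3)], c("cold", "hot"))
      }, simplify = FALSE))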
      

      The idea is to "pair" each of the "cold" columns with each of the "hot" columns to make a long dataset. out looks like this:

      out
      #        cold        hot
      # 1      pain  infection
      # 2      Bump       <NA>
      # 3      <NA>     Callus
      # 4    muscle       <NA>
      # 5      pain medication
      # ...
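
The same pairing can also be sketched in base R, without tidyr or dplyr. This is an alternative under stated assumptions, not the code used above: the data frame is a hand-typed copy of the cleaned example, and `pairs` is just an illustrative name for the grid of column combinations.

```r
# Hand-typed copy of the example data after the cleaning step (blanks as NA)
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Gender     = c("F", "F", "M", "F"),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

cold_cols <- grep("^Col_Cold", names(df), value = TRUE)
hot_cols  <- grep("^Col_Hot",  names(df), value = TRUE)

# Every (cold column, hot column) combination: 3 x 3 = 9 pairs
pairs <- expand.grid(cold = cold_cols, hot = hot_cols, stringsAsFactors = FALSE)

# For each combination, line the two columns up side by side, then stack them all
out <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  data.frame(cold = df[[pairs$cold[i]]],
             hot  = df[[pairs$hot[i]]],
             stringsAsFactors = FALSE)
}))

# Drop pairs where either side is missing
out <- out[complete.cases(out), ]
```

Each remaining row of `out` is one observed (cold, hot) association, so tabulating it gives the counts.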
      

      Finally, use xtabs to make the desired output:

      xtabs(~ cold + hot, na.omit(out))
      #           hot
      # cold       Callus flutter infection medication twitching walking
      #   Bump          0       1         0          0         1       0
      #   hemaloma      1       0         1          0         0       0
      #   muscle        0       1         0          1         2       0
      #   pain          1       0         2          2         1       1
      #   sleep         0       0         1          1         0       1
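
If the next step is a network visualization, it may help to put the rows and columns in the order shown in the question and to turn the non-zero cells into an edge list. The sketch below re-types the counts printed above into a plain matrix so that it runs on its own; `m` and `edges`, and the column names `from`, `to` and `weight`, are illustrative choices rather than anything a particular package requires.

```r
# Counts copied from the xtabs output above
m <- matrix(
  c(0, 1, 0, 0, 1, 0,   # Bump
    1, 0, 1, 0, 0, 0,   # hemaloma
    0, 1, 0, 1, 2, 0,   # muscle
    1, 0, 2, 2, 1, 1,   # pain
    0, 0, 1, 1, 0, 1),  # sleep
  nrow = 5, byrow = TRUE,
  dimnames = list(
    cold = c("Bump", "hemaloma", "muscle", "pain", "sleep"),
    hot  = c("Callus", "flutter", "infection", "medication", "twitching", "walking")
  )
)

# Reorder rows and columns to match the layout requested in the question
m <- m[c("pain", "sleep", "Bump", "muscle", "hemaloma"),
       c("infection", "medication", "walking", "twitching", "flutter", "Callus")]

# Long format: one row per (cold, hot) cell, then keep only observed associations
edges <- as.data.frame(as.table(m))
names(edges) <- c("from", "to", "weight")
edges <- edges[edges$weight > 0, ]
```

With the real result, `as.data.frame()` can be applied to the xtabs object directly, since it is already a table.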
      
