How to do network analysis on three fields simultaneously in R


Question

How can I perform network analysis on three fields simultaneously in R?

Below is sample data, with the desired output in the last column.

df <- data.frame(
  stringsAsFactors = FALSE,
                    id_1 = c("ABC","ABC","BCD",
                             "CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
                             "GHI","KLM","LMN","MNO","NOP"),
                    id_2 = c("1A","2A","3A",
                             "1A","4A","5A","6A","8A","9A","10A","7A",
                             "12A","13A","14A","15A"),
                    id_3 = c("Z3","Z2","Z1",
                             "Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
                             "Z9","Z9","Z1"),
                    Name = c("StackOverflow1",
                             "StackOverflow2","StackOverflow3","StackOverflow4",
                             "StackOverflow5","StackOverflow6",
                             "StackOverflow7","StackOverflow8","StackOverflow9",
                             "StackOverflow10","StackOverflow11","StackOverflow12",
                             "StackOverflow13","StackOverflow14","StackOverflow15"),
          desired_output = c(1L,1L,2L,1L,2L,
                             3L,3L,3L,4L,5L,3L,5L,6L,6L,2L)
      )
df
#>    id_1 id_2 id_3            Name desired_output
#> 1   ABC   1A   Z3  StackOverflow1              1
#> 2   ABC   2A   Z2  StackOverflow2              1
#> 3   BCD   3A   Z1  StackOverflow3              2
#> 4   CDE   1A   Z4  StackOverflow4              1
#> 5   DEF   4A   Z1  StackOverflow5              2
#> 6   EFG   5A   Z5  StackOverflow6              3
#> 7   GHI   6A   Z5  StackOverflow7              3
#> 8   HIJ   8A   Z6  StackOverflow8              3
#> 9   IJK   9A   Z7  StackOverflow9              4
#> 10  JKL  10A   Z8 StackOverflow10              5
#> 11  GHI   7A   Z6 StackOverflow11              3
#> 12  KLM  12A   Z8 StackOverflow12              5
#> 13  LMN  13A   Z9 StackOverflow13              6
#> 14  MNO  14A   Z9 StackOverflow14              6
#> 15  NOP  15A   Z1 StackOverflow15              2

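I can already perform network analysis on two fields simultaneously using igraph, as in one of my own answers, but I cannot do it for three fields.

Please help.

I feel that my current approach (two iterations) can be optimized.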

library(igraph)
library(tidyverse)

graph.data.frame(df) %>%
  components() %>%
  pluck(membership) %>%
  stack() %>%
  set_names(c('GRP', 'id_1')) %>%
  right_join(df %>% mutate(id_1 = as.factor(id_1)), by = c('id_1')) %>%
  select(GRP, id_3) %>%
  graph.data.frame() %>% 
  components() %>%
  pluck(membership) %>%
  stack() %>%
  set_names(c('GRP', 'id_3')) %>%
  right_join(df %>% mutate(id_3 = as.factor(id_3)), by = c('id_3'))
#>    GRP id_3 id_1 id_2            Name desired_output
#> 1    1   Z3  ABC   1A  StackOverflow1              1
#> 2    1   Z2  ABC   2A  StackOverflow2              1
#> 3    2   Z1  BCD   3A  StackOverflow3              2
#> 4    2   Z1  DEF   4A  StackOverflow5              2
#> 5    2   Z1  NOP  15A StackOverflow15              2
#> 6    1   Z4  CDE   1A  StackOverflow4              1
#> 7    3   Z5  EFG   5A  StackOverflow6              3
#> 8    3   Z5  GHI   6A  StackOverflow7              3
#> 9    3   Z6  HIJ   8A  StackOverflow8              3
#> 10   3   Z6  GHI   7A StackOverflow11              3
#> 11   4   Z7  IJK   9A  StackOverflow9              4
#> 12   5   Z8  JKL  10A StackOverflow10              5
#> 13   5   Z8  KLM  12A StackOverflow12              5
#> 14   6   Z9  LMN  13A StackOverflow13              6
#> 15   6   Z9  MNO  14A StackOverflow14              6

Created on 2021-11-15 by the reprex package (v2.0.1)

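Recommended answer

Create a list of all connections between the vertices defined by the id columns and by the row numbers (function f). In the end, you are only interested in the connections between rows.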

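# Build the edge pairs for one row; vec holds the row's IDs followed by its row number
f <- function(vec){
  
  i <- last(vec)         # row number (last element)
  vec <- head(vec, -1)   # the IDs of this row
  
  c(
    # edges between consecutive IDs: id_1-id_2, id_2-id_3
    seq_len(length(vec) - 1) %>% map(~vec[.x:(.x+1)]),
    # edges between the row vertex and each ID
    vec %>% map(~c(i, .x))
  ) 
}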

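df$desired_output <- df %>%
  select(matches("^id_[0-9]+$")) %>%   # keep only the id columns
  mutate(row = row_number()) %>%       # add the row number as the last column
  pmap(~f(c(...))) %>%                 # edge pairs for every row
  flatten() %>%
  reduce(rbind) %>%                    # two-column edge matrix
  igraph::graph_from_edgelist() %>% 
  components() %>%
  membership() %>%
  .[as.character(seq_len(nrow(df)))]   # keep only the membership of the row vertices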

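Edit

Think of the connections between the IDs. What you are interested in is the connections between rows. To get those, you add one vertex per row. Each of these vertices is connected to all the IDs in its row.

Example for row 6: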

6  EFG   5A   Z5

We are interested in the connections between the IDs (the first part of the c() call inside f):

[[1]]
[1] "EFG" "5A" 

[[2]]
[1] "5A" "Z5"

and in the connections between the row and its IDs (the second part of the c() call inside f):

[[1]]
[1] "6"   "EFG"

[[2]]
[1] "6"  "5A"

[[3]]
[1] "6"  "Z5"
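
The pairs listed above can be reproduced by calling f on a single row. The snippet below is a minimal sketch that repeats the definition of f from the answer and passes it a hand-built vector for row 6. The vector and the variable names edges and edge_matrix are illustrative and not part of the original answer.

```r
library(purrr)   # map()
library(dplyr)   # last() and the %>% pipe

# Same definition as in the answer
f <- function(vec){
  i <- last(vec)
  vec <- head(vec, -1)
  c(
    seq_len(length(vec) - 1) %>% map(~vec[.x:(.x+1)]),
    vec %>% map(~c(i, .x))
  )
}

# Row 6 of the example data: three IDs followed by the row number
edges <- f(c(id_1 = "EFG", id_2 = "5A", id_3 = "Z5", row = "6"))

# Stack the five pairs into a two-column edge matrix
edge_matrix <- do.call(rbind, edges)
print(unname(edge_matrix))
```

The first two rows of the matrix are the ID-to-ID pairs and the last three are the row-to-ID pairs.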

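When you build the graph this way, every row vertex is linked to the IDs in its row, and what you are interested in is which row vertices end up connected to each other.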
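
The row-vertex idea can also be sketched with igraph alone. The example below is a simplified variant, not the answer's exact code: it uses a five-row subset of the sample data (rows 6, 7, 8, 11 and 9), links each row vertex only to its own IDs, and names the row vertices with a hypothetical "row_" prefix so that they cannot clash with an ID that happens to look like a number.

```r
library(igraph)

# Illustrative subset of the example data (rows 6, 7, 8, 11 and 9)
ids <- data.frame(
  id_1 = c("EFG", "GHI", "HIJ", "GHI", "IJK"),
  id_2 = c("5A", "6A", "8A", "7A", "9A"),
  id_3 = c("Z5", "Z5", "Z6", "Z6", "Z7"),
  stringsAsFactors = FALSE
)

# One vertex per row; the "row_" prefix is an assumption made for this sketch
row_vertex <- paste0("row_", seq_len(nrow(ids)))

# Edge list: each row vertex is linked to every ID in its row.
# unlist() walks the data frame column by column, so the row vertices
# are repeated once per column to stay aligned.
edges <- cbind(
  rep(row_vertex, times = ncol(ids)),
  unlist(ids, use.names = FALSE)
)

g <- graph_from_edgelist(edges, directed = FALSE)

# Component membership, restricted to the row vertices
grp <- components(g)$membership[row_vertex]

# Rows that share a group are connected through at least one common ID
print(split(row_vertex, grp))
```

Because a row vertex already ties together all the IDs in its row, the ID-to-ID edges are not strictly needed to find connected rows in an undirected graph. Here the first four rows fall into one group (they share Z5, GHI and Z6) and the last row stands alone.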

Note

You can use directed = FALSE when creating the graph for this result and, if that is of interest to you, mode = "strong" in components().
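
To see what these two options change, here is a small sketch on a made-up three-edge list (the vertex names are hypothetical). It compares the default weak components of a directed graph, the components of an undirected graph, and the strong components of the directed graph.

```r
library(igraph)

# Toy edge list: rows "1" and "2" share ID "A", row "3" has ID "B"
edges <- rbind(
  c("1", "A"),
  c("2", "A"),
  c("3", "B")
)

g_dir   <- graph_from_edgelist(edges, directed = TRUE)
g_undir <- graph_from_edgelist(edges, directed = FALSE)

weak   <- components(g_dir, mode = "weak")     # edge direction is ignored
undir  <- components(g_undir)                  # no direction to begin with
strong <- components(g_dir, mode = "strong")   # requires paths in both directions

print(c(weak = weak$no, undirected = undir$no, strong = strong$no))
```

With weak components the directed and undirected graphs give the same grouping, which is why the answer's code works with the default directed graph. Strong components split this toy graph into single vertices, because none of its edges can be followed back.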
