Transitive relations in R: finding all linked records of a value

Problem description

I have a data frame that shows linked records:

df <- data.frame(case = c(1,2,3,4,5,6), linked_to = c("2,4", "3", NA, NA, "6", NA), stringsAsFactors = FALSE)

# case linked_to
# 1          2,4
# 2            3
# 3         <NA>
# 4         <NA>
# 5            6
# 6         <NA>

In the example, case 1 is linked to cases 2 and 4. Since case 2 is also linked to case 3, it follows that case 1 is linked to cases 2, 3 and 4. I want to create a new column that specifies all linked cases:

# case linked_to all_linked
# 1          2,4    1,2,3,4
# 2            3    1,2,3,4
# 3         <NA>    1,2,3,4
# 4         <NA>    1,2,3,4
# 5            6        5,6
# 6         <NA>        5,6

I can do this using the decompose.graph function in igraph to find the isolated components, but the solution seems somewhat convoluted:

library(igraph)

# Transform to igraph format    

to <- sapply(df$linked_to, function(x) unlist(strsplit(x,",")) )

from <- rep(rownames(df), sapply(to, length) )

to <- unlist(to)

from <- from[!is.na(to)]
to <- to[!is.na(to)]

d <- data.frame(from,to)

gr <- graph.data.frame(d)

# Split into components
grs <- decompose.graph(gr)

comp <- sapply(grs, function(x) V(x)$name)

matches <-  sapply(df$case, function(case) {
  sapply(comp, function(comp) {
    case %in% comp
  })
})

matches <- as.data.frame(matches)

ind <- sapply(matches, which)

# Assign all members of the component they belong to to each vertex
df$all_linked <- sapply(ind, function(x) {
  paste(comp[[x]], collapse = ",")
})

Is there an easier and more efficient solution? It can, but doesn't need to rely on network analysis tools.
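
For the variant that does not rely on network analysis tools, the grouping can be sketched in base R by repeatedly passing the smallest group label along each link until nothing changes. This is only an illustrative sketch, not part of the original thread: the names ids, label and groups are made up here, and it assumes every id in linked_to also appears in the case column.

```r
df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", "3", NA, NA, "6", NA),
                 stringsAsFactors = FALSE)

ids <- as.character(df$case)

# Expand the comma-separated column into one (from, to) pair per link
targets <- strsplit(df$linked_to, ",")   # NA entries stay NA
from <- rep(ids, lengths(targets))
to <- trimws(unlist(targets))
keep <- !is.na(to)
from <- from[keep]
to <- to[keep]

# Every case starts in its own group, labelled by its position
label <- setNames(seq_along(ids), ids)

# Give both ends of each link the smaller of their two labels,
# and repeat until a full pass changes nothing
repeat {
  old <- label
  for (i in seq_along(from)) {
    m <- min(label[[from[i]]], label[[to[i]]])
    label[[from[i]]] <- m
    label[[to[i]]] <- m
  }
  if (identical(label, old)) break
}

# Collect the members of each group and write them back per case
groups <- split(ids, label)
df$all_linked <- unname(vapply(
  ids,
  function(id) paste(groups[[as.character(label[[id]])]], collapse = ","),
  character(1)
))
```

The loop is quadratic in the worst case (a long chain listed in an unlucky order), so for large data a graph package is likely the better choice.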

Recommended answer

This is a bit more efficient, using the kpath.census function in the sna package. (Alternatively, you could use distances in igraph to the same effect.)

library(sna)
df <- data.frame(case = c(1,2,3,4,5,6), 
                 linked_to = c("2,4", 3,NA,NA,6,NA), 
                 stringsAsFactors = FALSE)

# Edge list written out by hand: one row per link, so "2,4" becomes two rows
net <- data.frame(case = c(1,1,2,3,4,5,6), 
                  linked_to = c(2, 4, 3,NA,NA,6,NA), 
                  stringsAsFactors = FALSE)

g <- network(net[complete.cases(net),], directed = FALSE)

comemb <- kpath.census(g, maxlen = 10, mode = "digraph",  tabulate.by.vertex = TRUE, 
                       path.comembership = "sum")$path.comemb

comemb_names <- sapply(1:ncol(comemb), 
                       function(x) ifelse(comemb[x,] > 0 , 
                                          colnames(comemb)[x], 0))

comemb_names <- lapply(1:nrow(comemb_names), function(x) comemb_names[x,][comemb_names[x,] != "0"])

df$all_linked <- sapply(comemb_names, function(x) paste(x,collapse = ","))

Result:

> df
  case linked_to all_linked
1    1       2,4    1,2,3,4
2    2         3    1,2,3,4
3    3      <NA>    1,2,3,4
4    4      <NA>    1,2,3,4
5    5         6        5,6
6    6      <NA>        5,6
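
As a further igraph-based variant, the groups can also be read directly from the connected components of the link graph. The sketch below swaps in components() rather than the distances approach mentioned above, builds the edge list from df instead of typing it by hand, and uses illustrative names (edges, memb, members) that do not come from the original answer.

```r
library(igraph)

df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", "3", NA, NA, "6", NA),
                 stringsAsFactors = FALSE)

ids <- as.character(df$case)

# One row per link; rows coming from NA entries are dropped
targets <- strsplit(df$linked_to, ",")
edges <- data.frame(from = rep(ids, lengths(targets)),
                    to = trimws(unlist(targets)),
                    stringsAsFactors = FALSE)
edges <- edges[!is.na(edges$to), ]

# Passing `vertices` keeps cases that have no links at all in the graph
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = ids))

# Component id per vertex, named explicitly by vertex name
memb <- setNames(components(g)$membership, V(g)$name)

# Members of each component, then the member list for every case
members <- split(names(memb), memb)
df$all_linked <- unname(vapply(
  ids,
  function(id) paste(members[[as.character(memb[[id]])]], collapse = ","),
  character(1)
))
```

Because every case is registered as a vertex, a case without any links would simply end up in a component of its own.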
