Transitive relations in R: finding all linked records of a value
Problem description
I have a data frame that shows linked records:
df <- data.frame(case = c(1,2,3,4,5,6), linked_to = c("2,4", 3,NA,NA,6,NA), stringsAsFactors = FALSE)
# case linked_to
# 1 2,4
# 2 3
# 3 <NA>
# 4 <NA>
# 5 6
# 6 <NA>
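Graph tools generally expect one row per link rather than a comma-separated string. A minimal base R sketch of that conversion, assuming linked_to holds comma-separated case numbers and NA means "no outgoing link" (to_edge_list is a hypothetical helper name, not part of any package):

```r
df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", 3, NA, NA, 6, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: expand the comma-separated linked_to column into a
# two-column edge list (one row per link), dropping cases with no links
to_edge_list <- function(df) {
  # strsplit(NA, ",") yields list(NA_character_), so NA rows survive as NA
  targets <- strsplit(df$linked_to, ",", fixed = TRUE)
  edges <- data.frame(
    from = rep(df$case, lengths(targets)),
    to   = as.numeric(unlist(targets)),
    stringsAsFactors = FALSE
  )
  edges <- edges[!is.na(edges$to), ]
  rownames(edges) <- NULL
  edges
}

edges <- to_edge_list(df)
edges
```

For the example data this gives four rows: 1-2, 1-4, 2-3 and 5-6.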
In the example, case 1 is linked to cases 2 and 4. Since case 2 is also linked to case 3, it follows that case 1 is linked to cases 2, 3 and 4. I want to create a new column that specifies all linked cases:
# case linked_to all_linked
# 1 2,4 1,2,3,4
# 2 3 1,2,3,4
# 3 <NA> 1,2,3,4
# 4 <NA> 1,2,3,4
# 5 6 5,6
# 6 <NA> 5,6
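In other words, all_linked is the closure of linked_to under symmetry and transitivity, with each case also counted as linked to itself. A base R sketch that computes this directly from that definition, with no network packages, assuming cases are numbered 1 to n so that a case number doubles as a row index: build a symmetric 0/1 adjacency matrix, then keep following one more link until the set of reachable cases stops growing.

```r
df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", 3, NA, NA, 6, NA),
                 stringsAsFactors = FALSE)

n <- nrow(df)
# Start from the identity: every case is linked to itself
adj <- diag(n)
for (i in seq_len(n)) {
  if (is.na(df$linked_to[i])) next
  # Assumes case numbers equal row indices (1..n)
  j <- as.integer(strsplit(df$linked_to[i], ",", fixed = TRUE)[[1]])
  adj[i, j] <- 1
  adj[j, i] <- 1   # treat every link as undirected
}

# Follow one more link per pass until no new case becomes reachable
reach <- adj
repeat {
  nxt <- (reach %*% adj > 0) * 1
  if (all(nxt == reach)) break
  reach <- nxt
}

# Row i of reach marks every case connected to case i
df$all_linked <- apply(reach, 1, function(r) paste(which(r > 0), collapse = ","))
df
```

Each pass is a full n-by-n matrix product, so this only suits small data; it mainly makes the "since 2 is linked to 3, 1 is linked to 3" step concrete.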
I can do this using the decompose.graph function in igraph to find isolated components, but the solution seems somewhat convoluted:
library(igraph)
# Transform to igraph format: one (from, to) row per link
to <- sapply(df$linked_to, function(x) unlist(strsplit(x, ",")))
from <- rep(rownames(df), sapply(to, length))
to <- unlist(to)
from <- from[!is.na(to)]
to <- to[!is.na(to)]
d <- data.frame(from, to)
gr <- graph.data.frame(d)
# Split into components
grs <- decompose.graph(gr)
# lapply (not sapply) so equal-sized components are not simplified into a matrix
comp <- lapply(grs, function(x) V(x)$name)
matches <- sapply(df$case, function(case) {
  sapply(comp, function(comp) {
    case %in% comp
  })
})
matches <- as.data.frame(matches)
ind <- sapply(matches, which)
# Assign all members of the component they belong to to each vertex
df$all_linked <- sapply(ind, function(x) {
  paste(comp[[x]], collapse = ",")
})
Is there an easier and more efficient solution? It can, but doesn't need to rely on network analysis tools.
Recommended answer
This is a bit more efficient, using the kpath.census function in the sna package. (Alternatively, you could use distances in igraph to the same effect.)
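library(network)  # provides network()
library(sna)      # provides kpath.census()
df <- data.frame(case = c(1,2,3,4,5,6),
                 linked_to = c("2,4", 3,NA,NA,6,NA),
                 stringsAsFactors = FALSE)
# Edge list written out by hand: "2,4" is expanded into two rows
net <- data.frame(case = c(1,1,2,3,4,5,6),
                  linked_to = c(2, 4, 3,NA,NA,6,NA),
                  stringsAsFactors = FALSE)
g <- network(net[complete.cases(net),], directed = FALSE)
# Path co-membership counts over paths of length up to maxlen;
# pairs more than maxlen links apart are not counted as linked
comemb <- kpath.census(g, maxlen = 10, mode = "digraph", tabulate.by.vertex = TRUE,
                       path.comembership = "sum")$path.comemb
# Replace each positive co-membership count with the vertex name, others with 0
comemb_names <- sapply(1:ncol(comemb),
                       function(x) ifelse(comemb[x,] > 0,
                                          colnames(comemb)[x], 0))
# Keep only the names of the co-members of each vertex
comemb_names <- lapply(1:nrow(comemb_names), function(x) comemb_names[x,][comemb_names[x,] != "0"])
df$all_linked <- sapply(comemb_names, function(x) paste(x, collapse = ","))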
Result:
> df
case linked_to all_linked
1 1 2,4 1,2,3,4
2 2 3 1,2,3,4
3 3 <NA> 1,2,3,4
4 4 <NA> 1,2,3,4
5 5 6 5,6
6 6 <NA> 5,6
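The answer also mentions distances in igraph as an alternative. A minimal sketch of that route, assuming igraph 1.0 or later (which provides graph_from_data_frame() and distances()) and that every linked case also appears in the case column: two cases belong together exactly when the shortest path between them is finite.

```r
library(igraph)

df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", 3, NA, NA, 6, NA),
                 stringsAsFactors = FALSE)

# One row per link; cases without an outgoing link contribute no rows here
targets <- strsplit(df$linked_to, ",", fixed = TRUE)
edges <- data.frame(from = rep(df$case, lengths(targets)),
                    to = unlist(targets),
                    stringsAsFactors = FALSE)
edges <- edges[!is.na(edges$to), ]

# Listing every case as a vertex keeps the vertex order equal to df$case
# (and keeps cases that have no links at all)
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = df$case))

# d[i, j] is the number of links between cases i and j, or Inf if unconnected
d <- distances(g)
df$all_linked <- unname(apply(d, 1, function(row)
  paste(df$case[is.finite(row)], collapse = ",")))
df
```

distances() builds a full n-by-n matrix; for large data, components(g)$membership gives the same grouping as a single component id per vertex.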