确定链接在一起的链接情节的组 [英] identify groups of linked episodes which chain together

查看:98
本文介绍了确定链接在一起的链接情节的组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

采用链接ID的简单数据框:

Take this simple data frame of linked ids:

test <- data.frame(id1=c(10,10,1,1,24,8),id2=c(1,36,24,45,300,11))

> test
  id1 id2
1  10   1
2  10  36
3   1  24
4   1  45
5  24 300
6   8  11

我现在想将链接的所有ID组合在一起. 所谓链接",是指按照链接链进行操作,以使一组中的所有ID 一起贴上标签.一种分支结构.即:

I now want to group together all the ids which link. By 'link', I mean follow through the chain of links so that all ids in one group are labelled together. A kind of branching structure. i.e:

Group 1
10 --> 1,   1 --> (24,45)
                   24 --> 300
                          300 --> NULL
                   45 --> NULL
10 --> 36, 36 --> NULL,
Final group members: 10,1,24,36,45,300

Group 2
8 --> 11
      11 --> NULL
Final group members: 8,11

现在,我大致了解了我想要的逻辑,但是不知道如何优雅地实现它.我正在考虑递归地使用match%in%遍历每个分支,但这次确实很困惑.

Now I roughly know the logic I would want, but don't know how I would implement it elegantly. I am thinking of a recursive use of match or %in% to go down each branch, but am truly stumped this time.

我要追求的最终结果是:

The final result I would be chasing is:

result <- data.frame(group=c(1,1,1,1,1,1,2,2),id=c(10,1,24,36,45,300,8,11))

> result
  group  id
1     1  10
2     1   1
3     1  24
4     1  36
5     1  45
6     1 300
7     2   8
8     2  11

推荐答案

Bioconductor软件包 RBGL (BOOST图形库的R接口)包含 connectedComp()函数,用于识别图中的已连接组件- 正是您想要的.

The Bioconductor package RBGL (an R interface to the BOOST graph library) contains a function, connectedComp(), which identifies the connected components in a graph -- just what you are wanting.

(要使用该功能,您首先需要安装 RBGL 软件包,这些软件包可在此处.)

(To use the function, you will first need to install the graph and RBGL packages, available here and here.)

library(RBGL)
test <- data.frame(id1=c(10,10,1,1,24,8),id2=c(1,36,24,45,300,11))

## Convert your 'from-to' data to a 'node and edge-list' representation  
## used by the 'graph' & 'RBGL' packages 
g <- ftM2graphNEL(as.matrix(test))

## Extract the connected components
cc <- connectedComp(g)

## Massage results into the format you're after 
ld <- lapply(seq_along(cc), 
             function(i) data.frame(group = names(cc)[i], id = cc[[i]]))
do.call(rbind, ld)
#   group  id
# 1     1  10
# 2     1   1
# 3     1  24
# 4     1  36
# 5     1  45
# 6     1 300
# 7     2   8
# 8     2  11

这篇关于确定链接在一起的链接情节的组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆