Generating distinct groups of nodes in a network



The Issue

Given the following network of nodes and edges, I would like to derive all possible groupings of nodes where all nodes within a group are connected to all other nodes within that group via an edge.

In this network...

  • nodes 'B', 'C', and 'F' would be in a group as they are fully interconnected
  • 'A' would only belong in a group with itself.
  • 'D' and 'B' would be in a group together, but 'D' would not belong in the group with 'B', 'C', and 'F' because it is not connected directly to 'C' and 'F' via an edge.

In other words, the rules are as follows:

  1. All members of a group must be connected to all other members of that group directly via an edge.

  2. An object may be a member of multiple groups.

  3. No redundant groups. If a group fits within a larger group, it is not a valid group (e.g. 'B' and 'C' do not form a valid group on their own because both fit within the larger group of 'B', 'C', and 'F'). An object may be in a group by itself (e.g. A-A) only if it belongs to no other group.


I have represented the network above as a data frame where each row represents a pair of nodes (x1 and x2) joined by an edge:

x1 <- c("A", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "F", "F", "F")
x2 <- c("A", "B", "C", "D", "F", "B", "C", "F", "B", "D", "E", "D", "E", "B", "C", "F")

df <- data.frame(x1, x2)

Given this df, I would like to derive the following valid groups (provided in visual as well as data frame form):

     1    2    3    4   
1    A    B    B    D       
2   NULL  C    D    E 
3   NULL  F   NULL NULL 

**Note: the order of groups/group names is irrelevant.


What I have tried

I have attempted to loop through a list of each unique node name in column x1 of df to identify all nodes that each node is connected to. I then use this information to generate group rosters. However, these group rosters are sometimes invalidated by violating rule 1. Here is what I have thus far...

n <- nrow(as.data.frame(unique(df$x1)))

RosterGuide <- as.data.frame(matrix(nrow = n , ncol = 1)) 
RosterGuide$V1 <- seq.int(nrow(RosterGuide))
RosterGuide$Object <- (unique(df$x1))
colnames(RosterGuide) <- c("V1","Object")
groups_frame <- matrix(, ncol= length(n), nrow = length(n))

for (loopItem in 1:nrow(RosterGuide)) {

object <- subset(RosterGuide$Object, RosterGuide$V1 == loopItem)
group <- as.data.frame(subset(df$x2, df$x1 == object))

groups_frame <- cbind.fill(group, groups_frame, fill = "NULL")
}

Groups <- as.data.frame(groups_frame)
Groups <- subset(Groups, select = - c(object))
colnames(Groups) <- RosterGuide$V1

... this loop yields the data frame 'Groups'...

     1    2    3    4   5    6
1    B    D    B    B   B    A
2    C    E    D    C   C NULL
3    F NULL    E    F   D NULL
4 NULL NULL NULL NULL   F NULL

This is where I am. You can see that group 3 violates the first rule because 'B' and 'E' are not directly connected by an edge, group 5 violates the first rule because 'D' is not directly connected to either 'C' or 'F', and group 4 violates the third rule because it duplicates group 1 (I am less worried about third-rule violations; I can solve those easily).

I am at a loss as to how to get from the data frame 'Groups' to the valid output I suggested above in a way that works for any data frame like df (2 columns, any number of rows) describing the nodes and edges of a network of any size.
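The rule 1 violations described above can be confirmed with a small base R check. This is only a sketch: `is_fully_connected` is a hypothetical helper name, not something from the question, and it assumes the edge list has the columns x1 and x2 shown earlier.

```r
# Edge list from the question (self-loops such as A-A are included).
x1 <- c("A", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "F", "F", "F")
x2 <- c("A", "B", "C", "D", "F", "B", "C", "F", "B", "D", "E", "D", "E", "B", "C", "F")
df <- data.frame(x1, x2)

# Hypothetical helper: TRUE if every pair of distinct nodes in `nodes`
# appears as an edge (in either direction) in the edge list.
is_fully_connected <- function(nodes, edges) {
  if (length(nodes) < 2) return(TRUE)
  pairs <- t(combn(nodes, 2))  # one row per unordered pair of nodes
  keys <- c(paste(edges$x1, edges$x2), paste(edges$x2, edges$x1))
  all(paste(pairs[, 1], pairs[, 2]) %in% keys)
}

is_fully_connected(c("B", "C", "F"), df)       # roster that satisfies rule 1
is_fully_connected(c("B", "D", "E"), df)       # roster 3 from the loop output
is_fully_connected(c("B", "C", "D", "F"), df)  # roster 5 from the loop output
```

A check like this can reject invalid rosters, but it does not generate the valid groups; that still requires enumerating maximal fully connected sets.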

Solution

Convert your data frame representation of the network to an igraph object. Use max_cliques to find "all the maximal cliques in an undirected graph".

library(igraph)
g <- graph_from_data_frame(df, directed = FALSE)
mc <- max_cliques(g, min = 1)
mc
# [[1]]
# + 1/6 vertex, named, from eb2aa45:
# [1] A
# 
# [[2]]
# + 2/6 vertices, named, from eb2aa45:
# [1] D E
# 
# [[3]]
# + 2/6 vertices, named, from eb2aa45:
# [1] D B
# 
# [[4]]
# + 3/6 vertices, named, from eb2aa45:
# [1] B F C
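To make clear what a "maximal clique" is, the same four groups can be reproduced without igraph. The sketch below uses the basic Bron-Kerbosch recursion (no pivoting) in base R. It is an illustration only and is not claimed to be how max_cliques works internally; `adj` and `bron_kerbosch` are names chosen for this example.

```r
# Edge list from the question.
x1 <- c("A", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "F", "F", "F")
x2 <- c("A", "B", "C", "D", "F", "B", "C", "F", "B", "D", "E", "D", "E", "B", "C", "F")

# Neighbour sets per node, treating edges as undirected and dropping
# self-loops such as A-A.
nodes <- sort(unique(c(x1, x2)))
adj <- lapply(nodes, function(v) setdiff(unique(c(x2[x1 == v], x1[x2 == v])), v))
names(adj) <- nodes

# Basic Bron-Kerbosch recursion.
# R: clique built so far, P: candidates that could extend it,
# X: vertices already processed (used to avoid reporting non-maximal cliques).
bron_kerbosch <- function(R, P, X) {
  if (length(P) == 0 && length(X) == 0) return(list(R))
  found <- list()
  for (v in P) {
    found <- c(found, bron_kerbosch(c(R, v),
                                    intersect(P, adj[[v]]),
                                    intersect(X, adj[[v]])))
    P <- setdiff(P, v)
    X <- c(X, v)
  }
  found
}

cliques <- bron_kerbosch(character(0), nodes, character(0))
cliques
```

For this graph the recursion reports the same four vertex sets as the max_cliques output above, possibly in a different order. For large graphs the igraph function is the more practical choice.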

Grab the names of the vertices of the maximal cliques. Create corresponding group numbers and convert to data frame:

nm <- lapply(mc, attr, "names")
d <- data.frame(g = rep(seq_along(nm), lengths(nm)), vert = unlist(nm))
d
#   g vert
# 1 1    A
# 2 2    D
# 3 2    E
# 4 3    D
# 5 3    B
# 6 4    B
# 7 4    F
# 8 4    C
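If the wide layout from the question (one column per group, shorter groups padded) is preferred over the long format, the list of names can be padded and bound column-wise. This is a sketch under two assumptions: `nm` is written out by hand here so the block runs on its own, and NA stands in for the "NULL" filler used in the question.

```r
# Group membership as a list of character vectors; this stands in for the
# `nm` object built from the max_cliques() result.
nm <- list("A", c("D", "E"), c("D", "B"), c("B", "F", "C"))

# Pad every group to the size of the largest one. Indexing past the end of
# a vector yields NA, which plays the role of "NULL" in the desired table.
n_rows <- max(lengths(nm))
padded <- lapply(nm, function(v) v[seq_len(n_rows)])

# Bind the padded vectors into a character matrix, one column per group,
# then convert to a data frame.
wide <- do.call(cbind, padded)
colnames(wide) <- seq_along(nm)
wide <- as.data.frame(wide, stringsAsFactors = FALSE)
wide
```

The column order simply follows the order of the groups in `nm`, which the question states is irrelevant.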

Simplify the graph, plot it, and highlight the vertex groups by passing the list above to mark.groups. Prettify to taste (see ?plot.igraph).

plot(simplify(g), mark.groups = nm, mark.border = "red", mark.col = NA)
