Compare communities from graphs with different numbers of vertices

Problem description

I am calculating Louvain communities on graphs of communications data, where vertices represent performers on a big project. The graphs represent different communication methods (e.g., email, phone).

We want to try to identify teams of performers from their communication data. Since performers prefer different communication methods, the graphs are of different sizes, and each may contain vertices that are absent from the other. When I try to compare the community objects from the respective graphs, igraph::compare() throws an exception. See the toy reprex below.

I considered a dplyr::full_join() or inner_join() of the vertex lists before constructing the graph and community objects, to make them the same size, but I worry about the impact of doing so on the resulting cluster_louvain() solutions.

Any ideas on how I can compare the community objects to one another from these different communication methods? Thanks in advance!

library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)

nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
                       to = "sample2"))
net <- graph_from_data_frame(d = edge, vertices = nodes, directed = FALSE)
com <- cluster_louvain(net)

nodes2 <- as_tibble(list(id = c("sample1", "sample21", "sample22", "sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
                       to = c("sample21", "sample22")))
net2 <- graph_from_data_frame(d = edge2, vertices = nodes2, directed = FALSE)
com2 <- cluster_louvain(net2)

# uncomment to see graph plots
# plot.igraph(net, mark.groups = com)
# plot.igraph(net2, mark.groups = com2)

compare(com, com2)
#> Error in i_compare(comm1, comm2, method): At community.c:3106 : community membership vectors have different lengths, Invalid value

Created on 2019-02-22 by the reprex package (v0.2.1)

Solution

I don't believe you will be able to compare clusterings from two graphs that contain different sets of nodes. Practically, you can't do it in igraph, and conceptually it's hard, because clusterings are compared by considering all pairs of nodes in a graph and checking whether each pair is placed in the same cluster or in different clusters under each of the two clusterings. If both clusterings tend to put the same nodes together and the same nodes apart, they are considered more similar. [1]
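
To make the pair-counting idea concrete, here is a minimal sketch in base R. It computes the Rand index, which is one pair-based measure (igraph exposes it as compare(..., method = "rand")); it is not the default method of compare(). The function name rand_index and the two membership vectors are illustrative assumptions, chosen to mirror the six-node union of the toy graphs.

```r
# Rand index by hand: the share of node pairs on which two clusterings agree,
# i.e. both put the pair together or both put it apart.
# `rand_index` is a hypothetical helper, not an igraph function.
rand_index <- function(m1, m2) {
  stopifnot(length(m1) == length(m2), length(m1) >= 2)
  pairs <- combn(length(m1), 2)               # every unordered pair of nodes
  same1 <- m1[pairs[1, ]] == m1[pairs[2, ]]   # together in clustering 1?
  same2 <- m2[pairs[1, ]] == m2[pairs[2, ]]   # together in clustering 2?
  mean(same1 == same2)                        # fraction of agreeing pairs
}

# Assumed memberships, node order:
# sample1, sample2, sample3, sample21, sample22, sample23
phone <- c(1, 1, 2, 3, 4, 5)   # sample1 and sample2 together, rest alone
email <- c(1, 2, 3, 1, 1, 4)   # sample1, sample21, sample22 together

rand_index(phone, email)
#> [1] 0.7333333
```

Both vectors must index the same nodes in the same order, which is exactly why membership vectors of different lengths cannot be compared.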

I suppose another valid way to approach the problem would be to evaluate how similar the clusterings are when restricted to the nodes that appear in both graphs, i.e. the intersection. You'll have to decide what makes more sense in your setting. I'll show how to do it using the union of nodes rather than the intersection.
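
For the intersection variant, a rough sketch could look like the following. It assumes you have one named membership vector per graph (for graphs with vertex names, membership(com) should give you something of this shape) and that compare() accepts plain membership vectors as well as communities objects. The vectors m_phone and m_email and the node names are made up for illustration, and method = "nmi" is just one possible choice.

```r
library(igraph, warn.conflicts = FALSE)

# Hypothetical named membership vectors, one per communication method.
# In practice these would come from something like membership(com).
m_phone <- c(a = 1, b = 1, c = 2, d = 2, e = 3)
m_email <- c(b = 1, c = 1, d = 2, f = 2, a = 1)

# Keep only the nodes present in both graphs
shared <- intersect(names(m_phone), names(m_email))

# Indexing by name puts both vectors in the same node order and length,
# which is what compare() requires
compare(m_phone[shared], m_email[shared], method = "nmi")
```

Note that this discards every node seen in only one graph, and the clusters themselves were still computed on the full graphs, so the result answers a narrower question than the union approach.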

So you need all the same nodes in both graphs in order to make the comparison. In fact, I think the easier way to do it is to put all the same nodes in one graph and have different edge types. Then you can compute your clusters for each edge type separately and then make the comparison. The reprex below is hopefully clear:

# repeat your set-up
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)

nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
                       to = "sample2"))

nodes2 <- as_tibble(list(id = c("sample1","sample21", "sample22","sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
                        to = c("sample21", "sample22")))

# approach from a single graph
# concatenate edges
edges <- rbind(edge, edge2)
# create an edge attribute indicating network type
edges$type <- c("phone", "email", "email")
# the set of nodes (across both graphs)
nodes <- unique(rbind(nodes, nodes2))

g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)

# We cluster over the graph without the email edges
com_phone <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="email"]))
plot(g, mark.groups = com_phone)

# Now we can cluster over the graph without the phone edges
com_email <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="phone"]))
plot(g, mark.groups = com_email)

# Now we can compare
compare(com_phone, com_email)
#> [1] 0.7803552

As you can see from the plots, we pick out the same initial clustering structure you found in the separate graphs, with the extra nodes added as isolated vertices.

[1] Obviously this is a pretty vague explanation. The default method used in compare() is variation of information (method = "vi"), which is from this paper, which has a nice discussion.
