R: igraph, community detection, edge.betweenness method, count/list members of each community?

Problem description

I have a relatively large graph of real-world transactions, with 524 vertices and 1125 edges. The edges are directed and have a weight (including it is optional). I'm trying to investigate the various communities within the graph and essentially need a method which:

- Calculates all possible communities

- Calculates the optimum number of communities

- Returns the members/# of members of each (optimum) community

So far I've managed to pull together the following code, which plots a color-coded graph corresponding to the various communities. However, I have no idea how to control the number of communities (e.g. plot the 5 communities with the highest membership) or how to list the members of a particular community.

library(igraph)
edges <- read.csv('http://dl.dropbox.com/u/23776534/Facebook%20%5BEdges%5D.csv')
all<-graph.data.frame(edges)
summary(all)

all_eb <- edge.betweenness.community(all)
mods <- sapply(0:ecount(all), function(i) {
all2 <- delete.edges(all, all_eb$removed.edges[seq(length=i)])
cl <- clusters(all2)$membership
modularity(all, cl)
})


plot(mods, type="l")

all2<-delete.edges(all, all_eb$removed.edges[seq(length=which.max(mods)-1)])

V(all)$color=clusters(all2)$membership

# Edge weights are edge attributes; this assumes the CSV supplies a "weight" column.
all$layout <- layout.fruchterman.reingold(all, weights=E(all)$weight)

plot(all, vertex.size=4, vertex.label=NA, vertex.frame.color="black", edge.color="grey",
edge.arrow.size=0.1, rescale=TRUE, edge.width=.1, vertex.label.font=NA)


Because the edge betweenness method performed so poorly I tried again using the walktrap method:

all_wt<- walktrap.community(all, steps=6,modularity=TRUE,labels=TRUE)
all_wt_memb <- community.to.membership(all, all_wt$merges, steps=which.max(all_wt$modularity)-1)


colbar <- rainbow(20)
col_wt<- colbar[all_wt_memb$membership+1]

l <- layout.fruchterman.reingold(all, niter=100)
plot(all, layout=l, vertex.size=3, vertex.color=col_wt, vertex.label=NA,edge.arrow.size=0.01,
                    main="Walktrap Method")
all_wt_memb$csize
[1] 176  13 204  24   9 263  16   2   8   4  12   8   9  19  15   3   6   2   1

19 clusters - Much better!

Now say I had a "known cluster" with a list of its members and wanted to check each of the observed clusters for the presence of members from the "known cluster", returning the percentage of members found. I was unable to finish the following:

list<-read.csv("http://dl.dropbox.com/u/23776534/knownlist.csv")
length(all_wt_memb$csize) #19

for(i in 1:length(all_wt_memb$csize))
{

match((V(all)[all_wt_memb$membership== i]),list)

}  

Recommended answer

A couple of these questions can be answered by looking closely at the documentation of the functions you're using. For instance, the "Value" section of the documentation for clusters describes what the function returns, and a couple of those components answer your questions. Documentation aside, you can always use the str function to inspect the make-up of any particular object.
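
As a minimal sketch of that inspection step, the snippet below runs str on the result of a connected-components call. It assumes igraph 1.0 or later, where components() is the newer name for clusters(); the toy graph is made up for illustration and is not the data from the question.

```r
library(igraph)

# Toy graph (hypothetical data): two triangles with no edge between them.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4), directed = FALSE)

comp <- components(g)  # newer name for clusters(); assumes igraph >= 1.0

str(comp)    # prints the structure of the returned list
names(comp)  # the component names documented under "Value"
```

The names reported here (membership, csize, no) are the ones the rest of this answer relies on.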

That being said, to get the members or numbers of members in a particular community, you can look at the membership object returned by the clusters function (which you're already using to assign color). So something like:

summary(clusters(all2)$membership)

would describe the IDs of the clusters that are being used. In the case of your sample data, it looks like you have clusters with the IDs ranging from 0 to 585, for 586 clusters in total. (Note that you won't be able to display those very accurately using the coloring scheme you're currently using.)

To determine the number of vertices in each cluster, you can look at the csize component also returned by clusters. In this case, it's a vector of length 586, storing one size for each cluster calculated. So you can use

clusters(all2)$csize

to get the list of sizes of your clusters. Be warned that your cluster IDs, as previously mentioned, start from 0 ("zero-indexed") whereas R vectors start from 1 ("one-indexed"), so you'll need to shift these indices by one. For instance, clusters(all2)$csize[5] returns the size of the cluster with the ID of 4.
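
For readers on a newer igraph, here is a hedged sketch of the same size lookup using the community accessor functions. It assumes igraph 1.0 or later, where cluster_edge_betweenness() replaces edge.betweenness.community(); in igraph 0.6 and later, community IDs start from 1, so the index shift described above applies only to older releases. The toy graph is hypothetical.

```r
library(igraph)

# Toy graph (hypothetical data): two triangles joined by a single bridge edge.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = FALSE)

eb <- cluster_edge_betweenness(g)  # assumes igraph >= 1.0

memb <- membership(eb)   # one community id per vertex
comm_sizes <- sizes(eb)  # number of vertices in each community
n_comm <- length(eb)     # number of communities found

# The largest communities first; with real data, head(..., 5) keeps the top 5.
largest <- head(sort(comm_sizes, decreasing = TRUE), 5)
```

Sorting the sizes this way is one simple route to the "5 communities with the highest membership" asked about in the question.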

To list the vertices in any cluster, you just want to find which IDs in the membership component previously mentioned match up to the cluster in question. So if I want to find the vertices in cluster #128 (there are 21 of these, according to clusters(all2)$csize[129]), I could use:

which(clusters(all2)$membership == 128)
length(which(clusters(all2)$membership == 128)) #21

and to retrieve the vertices in that cluster, I can use the V function and pass in the indices I just computed, which identify the members of that cluster:

> V(all2)[clusters(all2)$membership == 128]
Vertex sequence:
 [1] "625591221 - Clare Clancy"           
 [2] "100000283016052 - Podge Mooney"     
 [3] "100000036003966 - Jennifer Cleary"  
 [4] "100000248002190 - Sarah Dowd"       
 [5] "100001269231766 - LirChild Surfwear"
 [6] "100000112732723 - Stephen Howard"   
 [7] "100000136545396 - Ciaran O Hanlon"  
 [8] "1666181940 - Evion Grizewald"       
 [9] "100000079324233 - Johanna Delaney"  
[10] "100000097126561 - Órlaith Murphy"   
[11] "100000130390840 - Julieann Evans"   
[12] "100000216769732 - Steffan Ashe"     
[13] "100000245018012 - Tom Feehan"       
[14] "100000004970313 - Rob Sheahan"      
[15] "1841747558 - Laura Comber"          
[16] "1846686377 - Karen Ni Fhailliun"    
[17] "100000312579635 - Anne Rutherford"  
[18] "100000572764945 - Lit Đ Jsociety"   
[19] "100003033618584 - Fall Ball"        
[20] "100000293776067 - James O'Sullivan" 
[21] "100000104657411 - David Conway"
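
The same membership lookup can be extended to the last part of the question, which asks what share of a "known cluster" appears in each detected community. The sketch below is one possible approach, not a tested solution for the original data: it assumes igraph 1.0 or later (cluster_walktrap() replaces walktrap.community()), and the vertex names and the known list are made up for illustration.

```r
library(igraph)

# Toy named graph (hypothetical names): two triangles joined by one bridge.
edges <- data.frame(
  from = c("ann", "bob", "cat", "dan", "eve", "fay", "cat"),
  to   = c("bob", "cat", "ann", "eve", "fay", "dan", "dan")
)
g <- graph_from_data_frame(edges, directed = FALSE)

wt <- cluster_walktrap(g)  # assumes igraph >= 1.0

# Vertex names grouped by community id: one list element per community.
members <- split(V(g)$name, as.vector(membership(wt)))

# Hypothetical "known cluster"; in the question it is read from a CSV file.
known <- c("ann", "bob", "zed")

# Percentage of the known members found in each detected community.
pct_found <- sapply(members, function(m) 100 * mean(known %in% m))
pct_found
```

Because every vertex belongs to exactly one community, the percentages add up to the share of known members that are present in the graph at all.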

That would cover the basic igraph questions you had. The other questions are more graph-theory related. I don't know of a way to supervise the number of clusters to be created using igraph, but someone may be able to point you to a package which is able to do that. You may have more success posting that as a separate question, either here or in another venue.
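
As an addition to the answer rather than part of it: for hierarchical methods such as edge betweenness or walktrap, newer igraph releases appear to offer a way to pick the number of communities by cutting the merge dendrogram. The sketch below assumes igraph 1.0 or later and its cut_at() function, on a connected toy graph that is purely hypothetical.

```r
library(igraph)

# Toy graph (hypothetical data): three triangles chained by two bridge edges.
g <- make_graph(
  c(1, 2, 2, 3, 3, 1,
    4, 5, 5, 6, 6, 4,
    7, 8, 8, 9, 9, 7,
    3, 4, 6, 7),
  directed = FALSE
)

eb <- cluster_edge_betweenness(g)  # assumes igraph >= 1.0

# Cut the dendrogram at a chosen number of communities instead of
# accepting the modularity-optimal cut.
memb2 <- cut_at(eb, no = 2)
memb3 <- cut_at(eb, no = 3)

table(memb3)  # community sizes for the 3-community cut
```

The returned vectors can be used like any other membership vector, for example as vertex colors in plot().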

Regarding your first point of wanting to iterate through all possible communities, I think you'll find that to be infeasible for a graph of significant size. The number of possible arrangements of the membership vector for 5 different clusters would be 5^n, where n is the size of the graph. If you want to find "all possible communities", that number will actually be O(n^n), if my mental math is correct. Essentially, it would be impossible to calculate that exhaustively over any reasonably sized network, even given massive computational resources. So I think you'll be better off using some sort of intelligence/optimization for determining the number of communities represented in your graph, as the clusters function does.
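
To put rough numbers on that argument, the sketch below evaluates both counts for the 524-vertex graph from the question. The variable names are illustrative. Note that n^n is an upper bound: the exact number of ways to partition n vertices is the Bell number, which is smaller but still far too large to enumerate.

```r
n <- 524  # number of vertices in the question's graph
k <- 5    # number of cluster labels

# k^n overflows a double, so work with base-10 logarithms instead.
log10_labelings <- n * log10(k)  # about 366, i.e. roughly 10^366 labelings
log10_upper     <- n * log10(n)  # exponent of the n^n upper bound

log10_labelings
log10_upper
```

Even the smaller of the two counts has several hundred digits, which is why exhaustive enumeration is out of reach.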
