Louvain community detection in R using igraph - format of edges and vertices


Question


I have a correlation matrix of scores that I would like to run community detection on using the Louvain method in igraph, in R. I converted the correlation matrix to a distance matrix using cor2dist, as below:

distancematrix <- cor2dist(correlationmatrix)

This gives a 400 x 400 matrix of distances from 0-2. I then made the list of edges (the distances) and vertices (each of the 400 individuals) using the below method from http://kateto.net/networks-r-igraph (section 3.1).

library(igraph)
test <- as.matrix(distancematrix)
mode(test) <- "numeric"
test2 <- graph.adjacency(test, mode = "undirected", weighted = TRUE, diag = TRUE)
E(test2)$weight
get.edgelist(test2)

From this I then wrote csv files of the 'from' and 'to' edge list, and corresponding weights:

edgeweights <-E(test2)$weight
write.csv(edgeweights, file = "edgeweights.csv")
fromtolist <- get.edgelist(test2)
write.csv(fromtolist, file = "fromtolist.csv")

From these two files I produced a .csv file called "nodes.csv" which simply had all the vertex IDs for the 400 individuals:

id
1
2
3
4
...
400

And a .csv file called "edges.csv", which detailed 'from' and 'to' between each node, and provided the weight (i.e. the distance measure) for each of these edges:

from    to   weight
1       2    0.99
1       3    1.20
1       4    1.48
...
399     400  0.70

I then tried to use this node and edge list to create an igraph object, and run louvain clustering in the following way:

nodes <- read.csv("nodes.csv", header = TRUE, as.is = TRUE)
edges <- read.csv("edges.csv", header = TRUE, as.is = TRUE)
clustergraph <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
clusterlouvain <- cluster_louvain(clustergraph)

Unfortunately this did not do the louvain community detection correctly. I expected this to return around 2-4 different communities, which could be plotted similarly to here, but sizes(clusterlouvain) returned:

Community sizes
 1 
 400

indicating that all individuals were sorted into the same community. The clustering also ran immediately (i.e. with almost no computation time), which also makes me think it was not working correctly.

My question is: Can anyone suggest why the cluster_louvain method did not work and identified just one community? I think I must be specifying the distance matrix or edges/nodes incorrectly, or in some other way not giving the correct input to the cluster_louvain method. I am relatively new to R so would be very grateful for any advice. I have successfully used other methods of community detection on the same distance matrix (i.e. k-means) which identified 2-3 communities, but would like to understand what I have done wrong here.

I'm aware there are multiple other queries about using igraph in R, but I have not found one which explicitly specifies the input format of the edges and nodes (from a correlation matrix) to get the louvain community detection working correctly.

Thank you for any advice! I can provide further information if helpful.

Answer

I believe that cluster_louvain did exactly what it should do with your data. The problem is your graph. Your code included the line get.edgelist(test2), which must produce a lot of output. Instead, try this:

vcount(test2)
ecount(test2)

Since you say that your correlation matrix is 400x400, I expect that you will get that vcount gives 400 and ecount gives 79800 = 400 * 399 / 2. As you have constructed it, every node is directly connected to all other nodes. Of course there is only one big community.
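The edge count can be checked on a small synthetic example. This is only a sketch: the random matrix x and the names r, d and g are made up for illustration, and the distance is computed as sqrt(2 * (1 - r)), which is assumed here to match what psych::cor2dist does.

```r
library(igraph)

set.seed(1)
x <- matrix(rnorm(20 * 10), nrow = 20)   # hypothetical data: 20 observations, 10 variables
r <- cor(x)

# Assumed equivalent of psych::cor2dist(r)
d <- sqrt(2 * (1 - r))
diag(d) <- 0                             # a variable is at distance 0 from itself

# Every non-zero entry becomes a weighted edge, so no pair is left unconnected
g <- graph_from_adjacency_matrix(d, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

vcount(g)        # number of variables
ecount(g)        # number of edges
choose(10, 2)    # number of distinct pairs, n * (n - 1) / 2
```

With 10 variables the graph has one edge per pair, just as the 400-variable graph has 400 * 399 / 2 edges.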

I suspect that what you are trying to do is group variables that are correlated. If the correlation is near zero, the variables should be unconnected. What seems less clear is what to do with variables with correlation near -1. Do you want them to be connected or not? We can do it either way.

You do not provide any data, so I will illustrate with the Ionosphere data from the mlbench package. I will try to mimic your code pretty closely, but will change a few variable names. Also, for my purposes, it makes no sense to write the edges to a file and then read them back again, so I will just directly use the edges that are constructed.

First, assume that you want variables with correlation near -1 to be connected:

library(igraph)
library(mlbench)    # for Ionosphere data
library(psych)      # for cor2dist
data(Ionosphere)

correlationmatrix = cor(Ionosphere[, which(sapply(Ionosphere, class) == 'numeric')])
distancematrix <- cor2dist(correlationmatrix)

DM1 <- as.matrix(distancematrix)
## Zero out connections where there is low (absolute) correlation
## Keeps connection for cor ~ -1
## You may wish to choose a different threshold
DM1[abs(correlationmatrix) < 0.33] = 0

G1 <- graph.adjacency(DM1, mode = "undirected", weighted = TRUE, diag = TRUE)
vcount(G1)
## [1] 32
ecount(G1)
## [1] 140

Not a fully connected graph! Now let's find the communities.

clusterlouvain <- cluster_louvain(G1)
plot(G1, vertex.color=rainbow(3, alpha=0.6)[clusterlouvain$membership])
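Since the 0.33 cutoff is a judgment call, it may help to see how the number of edges changes with the threshold before settling on one. The following is a rough sketch on synthetic data; the helper count_edges, the matrix x and the candidate thresholds are all hypothetical, and the distance is again assumed to be sqrt(2 * (1 - r)).

```r
library(igraph)

set.seed(7)
x <- matrix(rnorm(100 * 12), ncol = 12)  # hypothetical data: 12 variables
x[, 2:4] <- x[, 2:4] + x[, 1]            # make variables 1 to 4 correlated
r <- cor(x)
d <- sqrt(2 * (1 - r))                   # assumed equivalent of cor2dist
diag(d) <- 0

# Hypothetical helper: number of edges left after zeroing weak correlations
count_edges <- function(threshold) {
  a <- d
  a[abs(r) < threshold] <- 0
  diag(a) <- 0
  g <- graph_from_adjacency_matrix(a, mode = "undirected",
                                   weighted = TRUE, diag = FALSE)
  ecount(g)
}

thresholds <- c(0, 0.2, 0.33, 0.5)
edge_counts <- sapply(thresholds, count_edges)
data.frame(threshold = thresholds, edges = edge_counts)
```

A threshold of 0 keeps every pair, and raising it can only remove edges, so the table shows how quickly the graph thins out.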

If, instead, you do not want variables with negative correlation to be connected, just drop the absolute value above. The resulting graph should be much less connected:

DM2 <- as.matrix(distancematrix)
## Zero out connections where there is low correlation
DM2[correlationmatrix < 0.33] = 0

G2 <- graph.adjacency(DM2, mode = "undirected", weighted = TRUE, diag = TRUE)
clusterlouvain <- cluster_louvain(G2)
plot(G2, vertex.color=rainbow(4, alpha=0.6)[clusterlouvain$membership])
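One point worth checking against the igraph documentation: cluster_louvain treats a larger edge weight as a stronger connection, whereas the graphs above carry distances as weights, so the most strongly correlated pairs get the smallest weights. Below is a minimal sketch of an alternative, assuming you would rather weight the edges by the correlation itself. The synthetic data and the names f1, f2, x and W are made up for illustration and are not part of the original question.

```r
library(igraph)

set.seed(42)
n  <- 200
f1 <- rnorm(n)                           # hypothetical latent factor for block 1
f2 <- rnorm(n)                           # hypothetical latent factor for block 2

# Ten synthetic variables: five driven by f1, five driven by f2
x <- cbind(sapply(1:5, function(i) f1 + rnorm(n, sd = 0.5)),
           sapply(1:5, function(i) f2 + rnorm(n, sd = 0.5)))
colnames(x) <- paste0("v", 1:10)

r <- cor(x)

# Use the correlation as the edge weight and drop weak or negative pairs
W <- r
W[W < 0.33] <- 0
diag(W) <- 0

g <- graph_from_adjacency_matrix(W, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)
cl <- cluster_louvain(g)                 # uses the "weight" edge attribute
sizes(cl)
membership(cl)
```

In this toy setup the two blocks of variables end up in separate communities because no edge survives between them. Whether correlation or distance is the better weight for your scores depends on what you want an edge to mean.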
