计算顶点邻域中有多少个顶点在igraph中具有R的属性 [英] Count how many vertices in a vertex's neighbourhood have an attribute in igraph for R

查看:89
本文介绍了计算顶点邻域中有多少个顶点在igraph中具有R的属性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在igraph中有一个大图(实际上是几个),大约有100,000个顶点,每个顶点的属性为truefalse.对于每个顶点,我想计算有多少个直接连接到该顶点的具有该属性的顶点.我当前的解决方案是以下函数,该函数以图形为参数.

I have a large graph (several, actually) in igraph—on the order of 100,000 vertices—and each vertex has an attribute which is either true or false. For each vertex, I would like to count how many of the vertices directly connected to it have the attribute. My current solution is the following function, which takes as its argument a graph.

attrcount <- function(g) {
  nb <- neighborhood(g,order=1)
  return(sapply(nb,function(x) {sum(V(g)$attr[x]}))
}

对于具有该属性的顶点,它返回一个计数向量,该计数与1偏离,但是我可以轻松地对其进行调整.

This returns a vector of counts which is off by 1 for vertices which have the attribute, but I can adjust this easily.

问题是运行速度非常慢,并且似乎应该有一种快速的方法来进行此操作,例如,计算每个顶点的度实际上是使用degree(g)即时实现的.

The problem is that this runs incredibly slowly, and it seems like there should be a fast way to do this, since, for instance, computing the degree of each vertex is practically instantaneous with degree(g).

我这样做是不是很愚蠢?

Am I doing this a stupid way?

作为一个例子,假设这是我们的图形.

As an example, suppose this was our graph.

set.seed(42)
g <- erdos.renyi.game(169081, 178058, type="gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))

推荐答案

使用get.adjlist查询所有相邻的顶点,然后在此列表上的sapply(或tapply可能更快)以获取计数.将属性存储在向量中也是值得的,因为这样就不必一直提取它.

Use get.adjlist to query all adjacent vertices, and then sapply (or tapply might be even faster) on this list to get the counts. It is also worth storing the attribute in a vector, because then you don't need to extract it all the time.

system.time({
  al <- get.adjlist(g)
  att <- V(g)$att
  res <- sapply(al, function(x) sum(att[x]))
})
#   user  system elapsed 
#  0.571   0.005   0.576 

轻按

system.time({
  al <- get.adjlist(g)
  alv <- unlist(al)
  alf <- factor(rep(seq_along(al), sapply(al, length)),
                levels=seq_along(al))
  att <- V(g)$att
  res2 <- tapply(att[alv], alf, sum)
  res2[is.na(res2)] <- 0
})
#   user  system elapsed 
#  1.121   0.020   1.144 

all(res == res2)
# TRUE

我有些惊讶,但是tapply解决方案实际上要慢一些.

Somewhat a surprise to me, but the tapply solution is actually slower.

如果这还不够,那么我想您仍然可以通过用C/C ++编写它来使其更快.

If this is still not enough, then I guess you can still make it faster by writing it in C/C++.

这篇关于计算顶点邻域中有多少个顶点在igraph中具有R的属性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆