使用最短路径计算连接概率 [英] Use shortest path to calculate probability of connection

查看:241
本文介绍了使用最短路径计算连接概率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道igraph中是否有函数计算加权图中顶点之间的连接概率,其中边的权重是相邻顶点连接的概率。



我已经基于这样的邻接矩阵建立了一个图,其中相邻的连接概率形成权重(这是对于河网,因此图的每个节点仅连接到单个下游节点)。



我曾希望在igraph中使用类似于 shortest.paths 的函数,但是将权重相加而不是计算他们和我的产品无法改变这种状况。下面的例子展示了我如何从数据I构造邻接矩阵,即顶点连接到其下游顶点(ProbConn)的概率,然后是下游顶点(下游)的身份。最下游的顶点是河口,因此它不与其他任何连接(因此称为下游的向量以NA开头)。

  (0,1,0.945881098491627,0.997349787519144,0.891475447373691,
0.993221681072185,0.48071450525165,0.0292543433507856)(b)b
$ b#向下游顶点连接的概率向量
ProbConn < - c ,0.0248645581575872,
1,0.00540807765075205,0.661465657844344,0.108524549747512,
0.383311676351655,0.708853495942148,0.00150109592270933,0.463859846404347,
0.0011491165581467,2.87879700370202e-09,0.536140153595653,
0.00831752330277812,0.00185182893416988,0.0186237313262708,
0.398961560996748,0.582414707676981,0.3338534342155656,1,00137024127706289
0.291146504057852 1,0743301054564134,0.0514743607033332
1,1

#每个节点的下游顶点
下游<-c(NA,1,2,3,4,5,6,2,2,7,5,8,4,6,10,3,11,3,4,
11, 6,6,9,9,9,8,12,5,10,13,6,6,14,15)

#创建这些向量的邻接矩阵
adjacPI< ; - 矩阵(0,nrow =长度(下游),ncol =长度(下游))#设置邻接矩阵来构建距离矩阵

(i in 1:length(downstream)) {
adjacPI [i,downstream [i]]< --ProbConn [i]#填充邻接矩阵
}

#创建反映下游连接的图形
PIgraph< - graph.adjacency(adjacPI,weighted = T)
plot(PIgraph)#可视化图形

PIpath < - 最短路径(PIgraph,mode =out )
#根据对每条路径上每一步的距离求和来创建最短路径矩阵

为了从最短路径矩阵PIpath中提取一个例子,顶点10和34通过顶点15连接。如PI路径中计算的,顶点10和顶点34之间的路径距离(PIpath [34,10])是1.708,它是连接的可能性b在顶点34和15(PIpath [34,15] = 1)和顶点15和10(PIpath [15,10] = 0.708)中,我希望它是一个产品,所以路径距离在10到34之间是1 * 0.708。



我对命名法并不完全确定,但我正在寻找的矩阵的元素将是连接顶点之间每个步骤的转换概率的乘积。本质上用产品替换最短路径中的sum函数。

是否可以使用igraph中的函数来计算,还是需要单独编写一些代码来完成如果一个路径的链接的概率 p_1,p_2,...,p_n

(假设链接成功概率是独立的,我将在整个答案中做),整个路径成功的概率为 p_1 * p_2 * ... * p_n 。当你注意到这是一个产品,但最短路径最小化总和;将产品转换为和的常用技巧是取对数。路径成功概率的记录是 log(p_1)+ log(p_2)+ ... + log(p_n)。最大化(我们的目标)相当于最小化( - log(p_1))+(-log(p_2))+ ...(-log(p_n))。由于所有概率介于0和1之间,因此它们的日志是非正面的,因此它们的日志的负面是非负面的。

总之,您可以设置所有您的权重为 -log(p_i),其中 p_i 是连接成功的概率,最短路径在一对节点之间(由 shortest.paths > igraph 中的函数计算)将成为最大化连接的可能性。通过切换到 ProbConn 下游 > graph.data.frame :

  PIgraph<  -  graph.data.frame(na。省略(cbind(从= seq_along(下游)到=下游,
weight = -log(ProbConn))),
vertices = seq_along(下游))


I'm wondering if there is a function within igraph to calculate connection probabilities among vertices in a weighted graph, where the weights for the edges are probabilities of connection of the adjacent vertices.

I've built a graph based on such an adjacency matrix where adjacent connection probabilities form the weights (this is for a river network so each node of the graph is only connected to a single downstream node).

I had hoped to use something like the shortest.paths function in igraph but that sums the weights rather than calculates the product of them and I can't work out a way to change that.

The example below shows how I construct the adjacency matrix from the data I have, which is the probability that the vertex is connected to its downstream vertex (ProbConn) and then the identity of the downstream vertex (downstream). The most downstream vertex is the river mouth so it is connected to no other (hence the vector called downstream begins with NA).

library(igraph)

# vector of probability of connectivity to downstream vertex
ProbConn <- c(0, 1, 0.945881098491627, 0.997349787519144, 0.891475447373691,
0.993221681072185, 0.48071450525165, 0.0292543433507856, 0.0248645581575872,
1, 0.00540807765075205, 0.661465657844344, 0.108524549747512,
0.383311676351655, 0.708853495942148, 0.00150109592270933, 0.463859846404347,
0.0011491165581467, 2.87879700370202e-09, 0.536140153595653,
0.00831752330277812, 0.00185182893416988, 0.0186237313262708,
0.398961560996748, 0.582414707676981, 0.338534342155656, 1, 0.00137024127706289,
0.291146504057852, 1, 0.0743301054564134, 0.0514743607033332,
1, 1)

# the downstream vertex of each node
downstream <- c(NA, 1, 2, 3, 4, 5, 6, 2, 2, 7, 5, 8, 4, 6, 10, 3, 11, 3, 4,
11, 6, 6, 9, 9, 9, 8, 12, 5, 10, 13, 6, 6, 14, 15)

# Create the adjacency matrix from these vectors
adjacPI <- matrix(0, nrow=length(downstream), ncol=length(downstream)) # Set up the adjacency matrix to build the distance matrix

for (i in 1:length(downstream)) {
  adjacPI[i, downstream[i]] <- ProbConn[i]  # Fill the adjacency matrix
}

# create the graph reflecting the downstream connectivity
PIgraph <- graph.adjacency(adjacPI, weighted=T)
plot(PIgraph) # visualise the graph 

PIpath <- shortest.paths(PIgraph, mode="out") 
# creates  the shortest paths matrix based on summing the distances of each step along each path   

To extract an example from the shortest paths matrix PIpath, vertices 10 and 34 are connected via vertex 15. As calculated in PIpath, the path distance between vertices 10 and 34 (PIpath[34,10]) is 1.708 which is the sum of the probability of connection between vertices 34 and 15 (PIpath[34,15] = 1), and vertices 15 and 10 (PIpath[15, 10] = 0.708) I would like for that to be a product so the path "distance" between 10 and 34 is 1*0.708.

I'm not entirely sure of the nomenclature but the elements of the matrix I'm looking for would be the product of the transition probabilities of each step between the connected vertices. Essentially replacing the sum function in shortest.paths with a product.

Is it possible to calculate that with a function in igraph or do I need to write some code separately to do this?

解决方案

If a path's links have probabilities p_1, p_2, ..., p_n of succeeding, then (assuming independence of the link success probabilities, which I will do throughout this answer) the probability that the entire path succeeds is p_1 * p_2 * ... * p_n. As you note this is a product, but shortest path minimizes sums; a common trick to convert products to sums is taking the logarithm. The log of the probability that the path succeeds is log(p_1) + log(p_2) + ... + log(p_n). Maximizing that (our goal) is equivalent to minimizing (-log(p_1)) + (-log(p_2)) + ... (-log(p_n)). Since all the probabilities lie between 0 and 1, their logs are non-positive and therefore the negative of their logs are non-negative.

In conclusion, you can set all your weights to -log(p_i), where p_i is the probability that the connection will succeed, and the shortest path between a pair of nodes (as calculated by the shortest.paths function in igraph) will be the path that maximizes the probability of connection. You could build your graph as a one-liner given your vectors ProbConn and downstream by switching to graph.data.frame:

PIgraph <- graph.data.frame(na.omit(cbind(from=seq_along(downstream), to=downstream,
                                          weight=-log(ProbConn))),
                            vertices=seq_along(downstream))

这篇关于使用最短路径计算连接概率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆