存储/访问有向图的最佳方法 [英] Best Way to Store/Access a Directed Graph

查看:86
本文介绍了存储/访问有向图的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我拥有大约3500个防洪设施,我想将它们表示为确定水流路径的网络(本质上是有向图)。我目前正在使用SqlServer和CTE来递归检查所有节点及其上游组件,只要上游路径不会大量分叉,该方法就可以工作。但是,由于增加了上游的复杂性,即使某些查询在物理上沿路径走得不远,也比其他查询花费的时间长得多(即,两个或三个段下游)。在某些情况下,我让它运行了十分钟,然后终止了查询。我正在使用一个简单的两列表,一列是设施本身,另一列是第一列中列出的设施的上游。



我尝试使用当前工具添加索引以帮助加快处理速度,但这没什么区别。并且,对于图中的可能连接,任何节点都可以具有多个上游连接,并且可以从多个下游节点连接到该连接。



数据中有周期,但是我还没有找到一种很好的方法来验证这一点(除了CTE查询报告最大递归计数命中率;它们很容易解决)。



所以,我的问题是,我是否错误地存储了此信息?除了CTE,还有其他更好的方法来查询上游点吗?

解决方案

我对防洪设施一无所知。但是我会选择第一个设施。并使用一个临时表和一个while循环来生成路径。



 
-伪代码
TempTable(LastNode ,CurrentNode,N)



DECLARE @intN INT
SET @intN = 1



插入TempTable(LastNode,CurrentNode,N)
-在列表中插入没有上游项的列表...将此初始条件称为
从表$ SELECT中选择LastNode,CurrentNode,@ intN
b $ b WHERE节点没有上游



@intN< = 3500
BEGIN
SEt @intN = @intN + 1
将INSERT插入TempTable(LastNode,CurrentNode,N)
从表
中选择LastNode,CurrentNode

WHERE在LastNode中(从TempTable中选择CurrentNode的N = @ intN-1) / p>

 如果@@ ROWCOUNT = 0 
BREAK

END



如果我们假设每个节点都指向一个孩子。然后,此过程不应超过3500次迭代。如果多个节点具有相同的上游提供程序,则它将花费更少的时间。但更重要的是,这使您可以做到这一点...



选择LastNode,CurrentNode,N
从TempTable
中按N



这将使您查看提供者是否存在任何循环或任何其他问题。顺便说一下,即使在每个提供程序指向最不同的上游提供程序的最坏情况下,3500行也不是那么多,这应该花不了那么长时间。


I have around 3500 flood control facilities that I would like to represent as a network to determine flow paths (essentially a directed graph). I'm currently using SqlServer and a CTE to recursively examine all the nodes and their upstream components and this works as long as the upstream path doesn't fork alot. However, some queries take exponentially longer than others even when they are not much farther physically down the path (i.e. two or three segments "downstream") because of the added upstream complexity; in some cases I've let it go over ten minutes before killing the query. I'm using a simple two-column table, one column being the facility itself and the other being the facility that is upstream from the one listed in the first column.

I tried adding an index using the current facility to help speed things up but that made no difference. And, as for the possible connections in the graph, any nodes could have multiple upstream connections and could be connected to from multiple "downstream" nodes.

It is certainly possible that there are cycles in the data but I have not yet figured out a good way to verify this (other than when the CTE query reported a maximum recursive count hit; those were easy to fix).

So, my question is, am I storing this information wrong? Is there a better way other than a CTE to query the upstream points?

解决方案

I know nothing about flood control facilities. But I would take the first facility. And use a temp table and a while loop to generate the path.

-- Pseudo Code
TempTable (LastNode, CurrentNode, N)

DECLARE @intN INT SET @intN = 1

INSERT INTO TempTable(LastNode, CurrentNode, N) -- Insert first item in list with no up stream items...call this initial condition SELECT LastNode, CurrentNode, @intN FROM your table WHERE node has nothing upstream

WHILE @intN <= 3500 BEGIN SEt @intN = @intN + 1 INSERT INTO TempTable(LastNode, CurrentNode, N) SELECT LastNode, CurrentNode, @intN FROM your table WHERE LastNode IN (SELECT CurrentNode FROM TempTable WHERE N = @intN-1)

IF @@ROWCOUNT = 0
     BREAK

END

If we assume that every node points to one child. Then this should take no longer than 3500 iterations. If multiple nodes have the same upstream provider then it will take less. But more importantly, this lets you do this...

SELECT LastNode, CurrentNode, N FROM TempTable ORDER BY N

And that will let you see if there are any loops or any other issues with your provider. Incidentally 3500 rows is not that much so even in the worst case of each provider pointing to a different upstream provider, this should not take that long.

这篇关于存储/访问有向图的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆