所有线对最短路径问题的最快实现方式? [英] Fastest implementation for All-pairs shortest paths problem?

查看:119
本文介绍了所有线对最短路径问题的最快实现方式?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个加权图,有30k个节点,160k个边,没有负权重。
我想计算从所有节点到其他节点的所有最短路径。
我想我不能假设任何特殊的启发式方法可以简化问题。

I have a weighted graph 30k nodes 160k edges, no negative weights. I would like to compute all the shortest paths from all the nodes to the others. I think I cannot assume any particular heuristics to simplify the problem.

我尝试使用Dijkstra C实现 http://compprog.wordpress.com/2007/12/01/one-source- shortest-path-dijkstras-algorithm / ,即单个最短路径问题,为我的所有30个节点调用函数dijkstras()。您可以想象,它需要很长时间。目前,我没有时间自己编写和调试代码,我必须尽快计算出这些路径并将其存储在数据库中,因此我正在寻找另一种可供使用的更快解决方案,

I tried to use this Dijkstra C implementation http://compprog.wordpress.com/2007/12/01/one-source-shortest-path-dijkstras-algorithm/, that is for a single shortest path problem, calling the function dijkstras() for all my 30 nodes. As you can imagine, it takes ages. At the moment I don't have the time to write and debug the code by myself, I have to compute this paths as soon as possible and store them in a database so I am looking for another faster solution ready to use, do you have any tips?

我必须在具有8GB内存的最新Macbook pro上运行它,我想找到一个不超过24小时即可完成计算的解决方案

I have to run it on a recent macbook pro with 8GB ram and I would like to find a solution that takes not more than 24 hours to finish the computation.

多谢了!

Eugenio

推荐答案

我查看了您在评论中发布的Dijkstra的算法链接,我相信这是效率低下的根源。在内部Dijkstra循环中,它使用一种极其未经优化的方法来确定下一个要探索的节点(对每个步骤的每个节点进行线性扫描)。有问题的代码有两个地方。第一个是此代码,它试图找到下一个要操作的节点:

I looked over the Dijkstra's algorithm link that you posted in the comments and I believe that it's the source of your inefficiency. Inside the inner Dijkstra's loop, it's using an extremely unoptimized approach to determine which node to explore next (a linear scan over every node at each step). The problematic code is in two spots. The first is this code, which tries to find the next node to operate on:

mini = -1;
for (i = 1; i <= n; ++i)
    if (!visited[i] && ((mini == -1) || (d[i] < d[mini])))
         mini = i;

由于此代码嵌套在访问每个节点的循环中,因此复杂性(如链接)是O(| V | 2 ),其中| V |是节点数。在您的情况下,由于| V |是30,000,则此循环总共将进行9亿次迭代。这真是令人痛苦的缓慢(如您所见),但是没有必要做这么多工作。

Because this code is nested inside of a loop that visits every node, the complexity (as mentioned in the link) is O(|V|2), where |V| is the number of nodes. In your case, since |V| is 30,000, there will be nine hundred million iterations of this loop overall. This is painfully slow (as you've seen), but there's no reason to have to do this much work.

另一个麻烦点就在这里,它试图找出哪个图中的边缘应用于减少其他节点的成本:

Another trouble spot is here, which tries to find which edge in the graph should be used to reduce the cost of other nodes:

for (i = 1; i <= n; ++i)
   if (dist[mini][i])
       if (d[mini] + dist[mini][i] < d[i])
           d[i] = d[mini] + dist[mini][i];

这将扫描邻接矩阵的整行以寻找要考虑的节点,这需要时间O( n)不管离开节点的输出边缘有多少。

This scans over an entire row in the adjacency matrix looking for nodes to consider, which takes time O(n) irrespective of how many outgoing edges leave the node.

虽然您可以尝试将此版本的Dijkstra修复为更优化的实现,但我认为这里的正确选择是只是丢掉这段代码,并找到Dijkstra算法的更好实现。例如,如果您使用 Wikipedia文章中的伪代码,该伪代码通过二进制堆,您可以获取Dijkstra算法在O(| E | log | V |)中运行。在您的情况下,该值刚好超过200万,比您当前的方法快450倍。这是一个巨大的差异,我敢打赌,如果使用更好的Dijkstra实现,您最终将在比以前更短的时间内完成代码。

While you could try fixing up this version of Dijkstra's into a more optimized implementation, I think the correct option here is just to throw this code away and find a better implementation of Dijkstra's algorithm. For example, if you use the pseudocode from the Wikipedia article implemented with a binary heap, you can get Dijkstra's algorithm running in O(|E| log |V|). In your case, this value is just over two million, which is about 450 times faster than your current approach. That's a huge difference, and I'm willing to bet that with a better Dijkstra's implementation you'll end up getting the code completing in a substantially shorter time than before.

除此之外,正如Jacob Eggers指出的那样,您可能要考虑并行运行所有Dijkstra搜索。该凸轮可为您拥有的每个处理器提供额外的速度提升。结合以上(和更关键的)修复程序,这可能会为您带来巨大的性能提升。

On top of this, you might want to consider running all the Dijkstra searches in parallel, as Jacob Eggers has pointed out. This cam get you an extra speed boost for each processor that you have. Combined with the above (and more critical) fix, this should probably give you a huge performance increase.

如果您计划在密度更高的数据集上运行此算法,边数接近| V | 2 / log | V |)的情况,那么您可能要考虑切换到 Floyd-Warshall 算法。每个节点运行一次Dijkstra算法(有时称为 Johnson算法)需要时间O(| V || E | log | V |)时间,而使用Floyd-Warshall则需要O(| V | 3 )时间。但是,对于您提到的数据集,该图非常稀疏,因此运行多个Dijkstra实例应该可以。

If you plan on running this algorithm on a much denser data set (one where the number of edges approaches |V|2 / log |V|), then you may want to consider switching to the Floyd-Warshall algorithm. Running Dijkstra's algorithm once per node (sometimes called Johnson's algorithm) takes time O(|V||E| log |V|) time, while using Floyd-Warshall takes O(|V|3) time. However, for the data set you've mentioned, the graph is sufficiently sparse that running multiple Dijkstra's instances should be fine.

希望这会有所帮助!

这篇关于所有线对最短路径问题的最快实现方式?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆