修剪流浪节点的大图 [英] Pruning large graphs of stray nodes

查看：104 发布时间：2015/11/30 21:00:12 algorithm language-agnostic graph-theory graphviz

本文介绍了修剪流浪节点的大图的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有包括约35000个节点的曲线图重新psented以纯文本$ P $：

I have a graph consisting of about 35,000 nodes represented in plain text:

node1 -> node35000
node29420 -> node35000
node2334 -> node4116
...

我想通过去除不属于链上至少有三个长的部分节点修剪下来。所以，如果我只有

I'd like to trim it down by removing nodes that are not part of a chain at least three long. So if I had only

1 -> 2;
2 -> 3;
3 -> 4;
0 -> 4;

我想保持1，2，3，4（因为 1 - ＆GT; 2 - ＆GT; 3 - →4 是四个节点长）但放弃0，也就是删除 0 - ＆GT; 4 。

I'd like to keep 1, 2, 3, and 4 (since 1 -> 2 -> 3 -> 4 is four nodes long) but discard 0, that is, remove 0 -> 4.

一个很好的办法做到这一点任何想法？我尝试了Perl和shell函数的组合，但我想我需要一个更好的方法。除非也许有工具已经做到这一点？这些数据是在graphviz的格式，但我没有看到相关手头的任务套件的工具。

Any idea of a good way to do this? I tried a combination of Perl and shell functions but I think I need a better approach. Unless maybe there are tools to do this already? The data is in graphviz format but I didn't see any tools in that suite relevant to the task at hand.

哦，如果有一个简单的方法做这样的事情，我打开的建议 - 它并不需要是正是我所建议的任务。我只是在寻找一种方式来去除大部分围绕大团块（这是罕见的，大多只是一些交叉链）的噪音。

Oh, and if there's an easy way to do something like this I'm open to suggestions -- it doesn't need to be exactly the task I suggested. I'm just looking for a way to remove most of the noise surrounding the big clumps (which are rare and mostly just a few intersecting chains).

推荐答案

假设任何给定的节点可以有任意多个predecessors或接班人，然后入度和出度节点无关解决问题。

Supposing that any given node can have arbitrarily many predecessors or successors, then in-degree and out-degree of nodes is irrelevant to solving the problem.

下面是一个简单O口的路径长度-3标准（N + E）算法的N个节点和E边的所有图形。该算法可以在Perl或C很容易地实施的方法，是根据一个定义，并断言：定义一个取得节点为具有一个父和子（predecessor和后继）的任何节点。将要保持的每个节点是一个节点制成或者是制成节点的父或子

Following is a simple O(N+E) algorithm for all graphs of N nodes and E edges, under the path-length-3 criterion. This algorithm can be easily implemented in Perl or C. The method is based on a definition and an assertion: Define a "made node" as any node that has a parent and child (predecessor and successor). Every node that will be kept is a made node or is a parent or child of a made node.

初始化状态数组S [Nmax时]为全零。 Nmax为最大的节点号。如果n最大不知道在一开始，读取所有数据，并找到它。

Initialize a status array S[Nmax] to all zeroes. Nmax is the maximum node number. If Nmax is not known at outset, read all the data and find it out.

阅读边给定的列表中显示。每个输入项从节点p指定有向边（P，Q），以节点Q。对于每一个（P，Q），该读出在项目：设置S [P]到S [P] | 1表示是p有一个孩子，并设置S [Q]到S [Q] | 2来表示q的父。（在此步骤之后，每制成节点n具有S [N] == 3）。

Read in the given list of edges. Each input item specifies a directed edge (p, q) from node p to node q. For each (p, q) item that is read in: Set S[p] to S[p] | 1 to denote that p has a child, and set S[q] to S[q] | 2 to denote that q has a parent. (After this step, every made node n has S[n] == 3.)

阅读边的名单了。对于每一个（P，Q），该被读入项：如果（S [P] == 3）或（S [Q] == 3）输出的边缘（P，Q）

Read the list of edges again. For each (p, q) item that is read in: If (S[p]==3) or (S[q] == 3) output edge (p,q).

要扩展这个方法到其他路径长度K比3，在内存中保存的边列表，保持SP []和Sc []与父母链和子链，并执行K / 2个额外的通行证长度。也许可以做的时间为O（N + K * E）。该问题不会指定图形是否是DAG（有向无环图），但给出的例子是一个DAG。对于K> 3，它可以有所作为。

To extend this method to path length K other than 3, keep the edge list in memory, maintain Sp[] and Sc[] with lengths of parent chains and child chains, and perform K/2 extra passes. Might be possible to do in time O(N+K*E). The problem does not specify whether the graph is a DAG (directed acyclic graph) but the example given is a DAG. For K>3, it may make a difference.

更新1 下面是一个K> 3的算法更precise声明，与H [我] .P和H [我] .Q是边缘#i的终端和PC [J]，CC [J]是的predecessor和有关节点j的后继链长度。此外，令E =＃边缘; N =＃节点;和K =所需的最小链长度保持优势。

Update 1 Here's a more precise statement of a K>3 algorithm, with H[i].p and H[i].q being endpoints of edge #i, and pc[j], cc[j] being lengths of predecessor and successor chains about node j. Also, let E = # of edges; N = # of nodes; and K = desired min chain length for keeping an edge.

阅读电子边缘数据输入到H []数组。将所有PC [J]，CC [J]项为0。

Read E edge data entries into H[ ] array. Set all pc[j], cc[j] entries to 0.

对于i = 1到E，设置CC [H [我] .P] = 1和PC [H [我] .Q] = 1。

For i = 1 to E, set cc[H[i].p]=1 and pc[H[i].q]=1.

有关j = 1至K + 1，{对于i = 1至E，{设p = H [i]于的.p和q = H [i]于.Q。集立方厘米[P] =最大值（立方厘米[P]，1 +立方厘米[Q]）和pc [Q] =最大值（PC [Q]，1 + PC [P]）。 }}

For j = 1 to K+1, { for i = 1 to E, { Let p=H[i].p and q=H[i].q. Set cc[p] = max(cc[p], 1+cc[q]) and pc[q] = max(pc[q], 1+pc[p]). } }

对于i = 1到E，{令p = H [我] .P和q = H [我] .Q。输出边（P，Q）如果PC [P] + CC [P] + 1> = K和PC [Q] + CC [Q] + 1> = K}

For i = 1 to E, { Let p=H[i].p and q=H[i].q. Output edge (p,q) if pc[p]+cc[p]+1 >= K and pc[q]+cc[q]+1 >= K.}

这方法都可能犯错误，如果图形是不是DAG和含有短循环路径。例如，如果图中边包括（1,2）和（2,1）以及没有其他节点链接到节点1或2，则既没有这些边缘应输出;但我们最终与K + 2 CC []和PC [这些节点]，所以他们得到的输出呢。

This method can make mistakes if graph is not a DAG and contains short looped paths. For example, if graph edges include (1,2) and (2,1) and no other nodes link to nodes 1 or 2, then neither of those edges should be output; but we end up with K+2 for cc[] and pc[] of those nodes, so they get output anyway.

这篇关于修剪流浪节点的大图的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

修剪流浪节点的大图 [英] Pruning large graphs of stray nodes

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

修剪流浪节点的大图 [英] Pruning large graphs of stray nodes

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭