修剪流浪节点的大图 [英] Pruning large graphs of stray nodes

查看:104
本文介绍了修剪流浪节点的大图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有包括约35000个节点的曲线图重新psented以纯文本$ P $:

I have a graph consisting of about 35,000 nodes represented in plain text:

node1 -> node35000
node29420 -> node35000
node2334 -> node4116
...

我想通过去除不属于链上至少有三个长的部分节点修剪下来。所以,如果我只有

I'd like to trim it down by removing nodes that are not part of a chain at least three long. So if I had only

1 -> 2;
2 -> 3;
3 -> 4;
0 -> 4;

我想保持1,2,3,4(因为 1 - > 2 - > 3 - →4 是四个节点长)但放弃0,也就是删除 0 - > 4

I'd like to keep 1, 2, 3, and 4 (since 1 -> 2 -> 3 -> 4 is four nodes long) but discard 0, that is, remove 0 -> 4.

一个很好的办法做到这一点任何想法?我尝试了Perl和shell函数的组合,但我想我需要一个更好的方法。除非也许有工具已经做到这一点?这些数据是在graphviz的格式,但我没有看到相关手头的任务套件的工具。

Any idea of a good way to do this? I tried a combination of Perl and shell functions but I think I need a better approach. Unless maybe there are tools to do this already? The data is in graphviz format but I didn't see any tools in that suite relevant to the task at hand.

哦,如果有一个简单的方法做这样的事情,我打开的建议 - 它并不需要是正是我所建议的任务。我只是在寻找一种方式来去除大部分围绕大团块(这是罕见的,大多只是一些交叉链)的噪音。

Oh, and if there's an easy way to do something like this I'm open to suggestions -- it doesn't need to be exactly the task I suggested. I'm just looking for a way to remove most of the noise surrounding the big clumps (which are rare and mostly just a few intersecting chains).

推荐答案

假设任何给定的节点可以有任意多个predecessors或接班人,然后入度和出度节点无关解决问题。

Supposing that any given node can have arbitrarily many predecessors or successors, then in-degree and out-degree of nodes is irrelevant to solving the problem.

下面是一个简单O口的路径长度-3标准(N + E)算法的N个节点和E边的所有图形。该算法可以在Perl或C很容易地实施的方法,是根据一个定义,并断言:定义一个取得节点为具有一个父和子(predecessor和后继)的任何节点。将要保持的每个节点是一个节点制成或者是制成节点的父或子

Following is a simple O(N+E) algorithm for all graphs of N nodes and E edges, under the path-length-3 criterion. This algorithm can be easily implemented in Perl or C. The method is based on a definition and an assertion: Define a "made node" as any node that has a parent and child (predecessor and successor). Every node that will be kept is a made node or is a parent or child of a made node.

  1. 初始化状态数组S [Nmax时]为全零。 Nmax为最大的节点号。如果n最大不知道在一开始,读取所有数据,并找到它。

  1. Initialize a status array S[Nmax] to all zeroes. Nmax is the maximum node number. If Nmax is not known at outset, read all the data and find it out.

阅读边给定的列表中显示。每个输入项从节点p指定有向边(P,Q),以节点Q。对于每一个(P,Q),该读出在项目:设置S [P]到S [P] | 1表示是p有一个孩子,并设置S [Q]到S [Q] | 2来表示q的父。 (在此步骤之后,每制成节点n具有S [N] == 3)。

Read in the given list of edges. Each input item specifies a directed edge (p, q) from node p to node q. For each (p, q) item that is read in: Set S[p] to S[p] | 1 to denote that p has a child, and set S[q] to S[q] | 2 to denote that q has a parent. (After this step, every made node n has S[n] == 3.)

阅读边的名单了。对于每一个(P,Q),该被读入项:如果(S [P] == 3)或(S [Q] == 3)输出的边缘(P,Q)

Read the list of edges again. For each (p, q) item that is read in: If (S[p]==3) or (S[q] == 3) output edge (p,q).

要扩展这个方法到其他路径长度K比3,在内存中保存的边列表,保持SP []和Sc []与父母链和子链,并执行K / 2个额外的通行证长度。也许可以做的时间为O(N + K * E)。 该问题不会指定图形是否是DAG(有向无环图),但给出的例子是一个DAG。对于K> 3,它可以有所作为。

To extend this method to path length K other than 3, keep the edge list in memory, maintain Sp[] and Sc[] with lengths of parent chains and child chains, and perform K/2 extra passes. Might be possible to do in time O(N+K*E). The problem does not specify whether the graph is a DAG (directed acyclic graph) but the example given is a DAG. For K>3, it may make a difference.

更新1 下面是一个K> 3的算法更precise声明,与H [我] .P和H [我] .Q是边缘#i的终端和PC [J],CC [J]是的predecessor和有关节点j的后继链长度。此外,令E =#边缘; N =#节点;和K =所需的最小链长度保持优势。

Update 1 Here's a more precise statement of a K>3 algorithm, with H[i].p and H[i].q being endpoints of edge #i, and pc[j], cc[j] being lengths of predecessor and successor chains about node j. Also, let E = # of edges; N = # of nodes; and K = desired min chain length for keeping an edge.

  1. 阅读电子边缘数据输入到H []数组。将所有PC [J],CC [J]项为0。

  1. Read E edge data entries into H[ ] array. Set all pc[j], cc[j] entries to 0.

对于i = 1到E,设置CC [H [我] .P] = 1和PC [H [我] .Q] = 1。

For i = 1 to E, set cc[H[i].p]=1 and pc[H[i].q]=1.

有关j = 1至K + 1,{对于i = 1至E,{设p = H [i]于的.p和q = H [i]于.Q。集立方厘米[P] =最大值(立方厘米[P],1 +立方厘米[Q])和pc [Q] =最大值(PC [Q],1 + PC [P])。 }}

For j = 1 to K+1, { for i = 1 to E, { Let p=H[i].p and q=H[i].q. Set cc[p] = max(cc[p], 1+cc[q]) and pc[q] = max(pc[q], 1+pc[p]). } }

对于i = 1到E,{令p = H [我] .P和q = H [我] .Q。输出边(P,Q)如果PC [P] + CC [P] + 1> = K和PC [Q] + CC [Q] + 1> = K}

For i = 1 to E, { Let p=H[i].p and q=H[i].q. Output edge (p,q) if pc[p]+cc[p]+1 >= K and pc[q]+cc[q]+1 >= K.}

这方法都可能犯错误,如果图形是不是DAG和含有短循环路径。例如,如果图中边包括(1,2)和(2,1)以及没有其他节点链接到节点1或2,则既没有这些边缘应输出;但我们最终与K + 2 CC []和PC [这些节点],所以他们得到的输出呢。

This method can make mistakes if graph is not a DAG and contains short looped paths. For example, if graph edges include (1,2) and (2,1) and no other nodes link to nodes 1 or 2, then neither of those edges should be output; but we end up with K+2 for cc[] and pc[] of those nodes, so they get output anyway.

这篇关于修剪流浪节点的大图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆