ArangoDB:获取每个节点,该节点与所选节点有任何关系 [英] ArangoDB: Get every node, which is in any way related to a selected node

查看:482
本文介绍了ArangoDB:获取每个节点,该节点与所选节点有任何关系的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在ArangoDB中有一个简单的节点链接图.如何从1个预选节点中遍历并返回与其相关的所有节点?

I have a simple node-links graph in ArangoDB. How can I traverse from 1 preselected node and return all nodes which are related to it?

例如: A→B,B→C,C→D,C→E,F→B,F→E

For example: A→B, B→C, C→D, C→E, F→B, F→E

选择任何一个都应返回相同的结果(所有).

Selecting any of them should return the same result (all of them).

我对ArangoDB非常陌生.

I am very new to ArangoDB.

推荐答案

您需要的是

What you need is AQL graph traversal, available since ArangoDB 2.8. Older versions provided a set of graph-related functions, but native AQL traversal is faster, more flexible and the graph functions are no longer available starting with 3.0.

AQL遍历可让您跟随连接到起始顶点的边,直至可变深度.可以访问每个遇到的顶点,例如进行过滤或构造结果,以及导致您到达该顶点的边以及从头到尾的完整路径,包括顶点和边.

AQL traversal let's you follow edges connected to a start vertex, up to a variable depth. Each encountered vertex can be accessed, e.g. for filtering or to construct a result, as well as the edge that led you to this vertex and the full path from start to finish including both, vertices and edges.

在您的情况下,仅需要返回已访问顶点的名称.您可以运行以下AQL查询,假设有一个文档集合node和一个边缘集合links,并且其中包含该图的数据:

In your case, only the names of the visited vertices need to be returned. You can run the following AQL queries, assuming there's a document collection node and an edge collection links and they contain the data for this graph:

// follow edges ("links" collection) in outbound direction, starting at A
FOR v IN OUTBOUND "node/A" links
    // return the key (node name) for every vertex we see
    RETURN v._key

这只会返回[ "B" ],因为遍历深度隐式地为1..1(最小= 1,最大= 1).如果我们增加最大深度,那么我们也可以包括间接连接的节点:

This will only return [ "B" ], because the traversal depth is implicitly 1..1 (min=1, max=1). If we increase the max depth, then we can include nodes that are indirectly connected as well:

FOR v IN 1..10 OUTBOUND "node/A" links
    RETURN v._key

这将给我们[ "B", "C", "D", "E"].如果我们看一下图,这是正确的:我们仅遵循从我们指向的顶点到另一个顶点(箭头方向)的边.要进行相反的操作,我们可以使用INBOUND,但是在您的情况下,我们想忽略边缘的方向并始终遵循:

This will give us [ "B", "C", "D", "E"]. If we look at the graph, this is correct: we only follow edges that point from the vertex we come from to another vertex (direction of the arrow). To do the reverse, we could use INBOUND, but in your case, we want to ignore the direction of the edge and follow anyway:

FOR v IN 1..10 ANY "node/A" links
    RETURN v._key

一开始的结果可能有点令人惊讶:
[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]

The result might be a bit surprising at first:
[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]

我们看到返回了重复的节点.原因是例如有从A到C的多个路径(通过B以及通过B-F-E),并且查询返回每个路径的最后一个节点作为变量v. (实际上并不会处理最大深度为10的所有个可能路径,但是您可以设置遍历选项OPTIONS {uniqueEdges: "none"}来执行此操作.)

We see duplicate nodes returned. The reason is that there are multiple paths from A to C for instance (via B and also via B-F-E), and the query returns the last node of every path as variable v. (It doesn't actually process all possible paths up to the maximum depth of 10, but you could set the traversal option OPTIONS {uniqueEdges: "none"} to do so.)

它可以帮助返回格式化的遍历路径,以更好地了解正在发生的事情(即,如何到达节点):

It can help to return formatted traversal paths to better understand what is going on (i.e. how nodes are reached):

FOR v, e, p IN 1..10 ANY "node/A" links OPTIONS {uniqueEdges: "path"}
    RETURN CONCAT_SEPARATOR(" - ", p.vertices[*]._key)

结果:

[
  "A - B",
  "A - B - C",
  "A - B - C - D",
  "A - B - C - E",
  "A - B - C - E - F",
  "A - B - C - E - F - B",
  "A - B - F",
  "A - B - F - E",
  "A - B - F - E - C",
  "A - B - F - E - C - D",
  "A - B - F - E - C - B"
]

图中有一个循环,但是不会有无限循环,因为在10个跃点之后超过了最大深度.但是正如您在上面看到的那样,它甚至没有达到10的深度,而是停止了,因为(默认)选项是不沿每个路径两次跟踪边缘(uniqueEdges: "path").

There is a cycle in the graph, but there can't be an infinite loop because the maximum depth is exceeded after 10 hops. But as you can see above, it doesn't even reach the depth of 10, it rather stops because the (default) option is to not follow edges twice per path (uniqueEdges: "path").

无论如何,这不是理想的结果.一种便宜的技巧是使用RETURN DISTINCTCOLLECT或类似的方法来删除重复项.但是我们最好调整遍历选项,以免不必要地跟随边缘.

Anyway, this is not the desired result. A cheap trick would be to use RETURN DISTINCT, COLLECT or something like that to remove duplicates. But we are better off tweaking the traversal options, to not follow edges unnecessarily.

uniqueEdges: "global"仍将包含B节点两次,但uniqueVertices: "global"会提供所需的结果.另外,在这种情况下,可以使用bfs: true进行广度优先搜索.区别在于到F节点的路径更短(用A-B-F代替A-B-C-E-F).通常,您应该使用的确切选项在很大程度上取决于数据集和您所遇到的问题.

uniqueEdges: "global" would still include the B node twice, but uniqueVertices: "global" gives the desired result. In addition, bfs: true for breadth-first search can be used in this case. The difference is that the path to the F node is shorter (A-B-F instead of A-B-C-E-F). In general, the exact options you should use largely depend on the dataset and the questions you have.

还有一个要解决的问题:遍历不包括起始顶点(除了p.vertices[0]中的每条路径).通过将最小深度设置为0,可以使用ArangoDB 3.0或更高版本轻松解决此问题:

There's one more problem to solve: the traversal does not include the start vertex (other than in p.vertices[0] for every path). This can easily be solved using ArangoDB 3.0 or later by setting the minimum depth to 0:

FOR v IN 0..10 ANY "node/A" links OPTIONS {uniqueVertices: "global"}
    RETURN v._key

[ "A", "B", "C", "D", "E", "F" ]

要验证是否返回了从A到F的所有节点,无论起始顶点如何,我们可以发出以下测试查询:

To verify that all nodes from A through F are returned, regardless of the start vertex, we can issue the following test query:

FOR doc IN node
    RETURN (
        FOR v IN 0..10 ANY doc links OPTIONS {uniqueVertices: "global"}
            SORT v._key
            RETURN v._key
    )

所有子数组应该看起来相同.如果要按遍历顺序返回节点名称,请除去SORT操作.希望对您有帮助=)

All sub-arrays should look the same. Remove the SORT operation if you want the node names returned in traversal order. Hope this helps =)

这篇关于ArangoDB:获取每个节点,该节点与所选节点有任何关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆