How to efficiently find multiple relationship size
Question
We have a large graph (over 1 billion edges) that has multiple relationship types between nodes.
In order to check the number of nodes that have a single unique relationship between nodes (i.e. a single relationship between two nodes per type, which otherwise would not be connected) we are running the following query:
MATCH (n)-[:REL_TYPE]-(m)
WHERE size((n)-[]-(m))=1 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
To demonstrate a similar result, the below sample code can run on the movie graph
after running
:play movies
in an empty graph, which results in 4 nodes (in this case we are asking for nodes with 3 types of relationships)
MATCH (n)-[]-(m)
WHERE size((n)-[]-(m))=3 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
Is there a better/more efficient way to query the graph?
Recommended answer
The following query is more performant, since it only scans each relationship once [whereas size((n)--(m))
will cause relationships to be scanned multiple times]. It also specifies a relationship direction to filter out half of the relationship scans, and to avoid the need for comparing native IDs.
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
NOTE: It is not clear what you are using the COUNT(DISTINCT n) + COUNT(DISTINCT m)
result for, but be aware that it is possible for some nodes to be counted twice after the addition.
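One thing to keep in mind about the directed pattern: (n)-->(m) groups relationships by direction, so a pair connected by two relationships one way and one the other way yields counts of 2 and 1 rather than 3. If the intent is to count every relationship between a pair regardless of direction, as the original size((n)-[]-(m)) check does, a sketch along these lines may be closer. It assumes the id(n) > id(m) filter from the question is acceptable, and it has not been benchmarked on a graph of this size:

```cypher
// Sketch: count relationships per unordered node pair, ignoring direction.
// The undirected pattern matches each relationship once per orientation;
// id(n) > id(m) keeps one orientation and drops self-loops.
MATCH (n)-[r]-(m)
WHERE id(n) > id(m)
WITH n, m, count(r) AS cnt
WHERE cnt = 3
RETURN count(DISTINCT n) + count(DISTINCT m)
```

This gives up the direction filter in exchange for the ID comparison, so it will likely do more work than the directed version. On data where all relationships between a pair point the same way (as in the movie graph), the two forms should agree.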
[UPDATE]
If you want to get the actual number of distinct nodes that pass your filter, here is one way to do that:
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
WITH COLLECT(n) + COLLECT(m) AS nodes
UNWIND nodes AS node
RETURN COUNT(DISTINCT node)
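A possible variation, offered as a sketch rather than a measured improvement: unwinding a two-element list per row avoids building two large collected lists before deduplication. It should return the same count, though whether it is actually faster depends on the Neo4j version and the query plan:

```cypher
MATCH (n)-->(m)
WITH n, m, count(*) AS cnt
WHERE cnt = 3
// Emit both endpoints of each qualifying pair as separate rows.
UNWIND [n, m] AS node
RETURN count(DISTINCT node)
```

Running both forms under PROFILE is the simplest way to compare them on real data.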