How to efficiently find multiple relationship size

Question

We have a large graph (over 1 billion edges) that has multiple relationship types between nodes.

To count the nodes that are connected by a single unique relationship (i.e. two nodes joined by exactly one relationship of a given type, which otherwise would not be connected), we are running the following query:

MATCH (n)-[:REL_TYPE]-(m) 
WHERE size((n)-[]-(m))=1 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)

To demonstrate a similar result, the sample code below can be run on the movie graph after running :play movies in an empty graph. It returns 4 nodes (in this case we are asking for nodes with 3 types of relationships):

MATCH (n)-[]-(m) 
WHERE size((n)-[]-(m))=3 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)

Is there a better/more efficient way to query the graph?

Recommended answer

The following query is more performant, since it scans each relationship only once, whereas size((n)--(m)) causes relationships to be scanned multiple times. It also specifies a relationship direction, which filters out half of the relationship scans and avoids the need to compare native IDs.

MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
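
To make the grouping step concrete, here is a minimal Python sketch of what MATCH (n)-->(m) followed by WITH n, m, COUNT(*) AS cnt computes. It models the logic only and is not how Neo4j executes the query. The edge list and the function name pairs_with_count are hypothetical, introduced purely for illustration.

```python
from collections import Counter

# Hypothetical directed edge list: (start_node, relationship_type, end_node).
edges = [
    ("a", "ACTED_IN", "x"),
    ("a", "DIRECTED", "x"),
    ("a", "PRODUCED", "x"),
    ("b", "ACTED_IN", "x"),
    ("b", "ACTED_IN", "y"),
]

def pairs_with_count(edges, wanted):
    """Group directed edges by (start, end) in one pass, then keep pairs with `wanted` edges."""
    # Each edge is visited exactly once while building the per-pair counts.
    per_pair = Counter((start, end) for start, _rel_type, end in edges)
    return [pair for pair, cnt in per_pair.items() if cnt == wanted]

# Pairs joined by exactly three relationships, mirroring WHERE cnt = 3.
matching = pairs_with_count(edges, 3)
```

Because every edge is read from its start node only, each (start, end) pair is produced once, which is why no id(n) > id(m) comparison is needed in this model.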

NOTE: It is not clear what you are using the COUNT(DISTINCT n) + COUNT(DISTINCT m) result for, but be aware that the addition can count some nodes twice: a node that appears as n in one matching pair and as m in another is included in both counts.

[UPDATE]

If you want to get the actual number of distinct nodes that pass your filter, here is one way to do that:

MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
WITH COLLECT(n) + COLLECT(m) AS nodes
UNWIND nodes AS node
RETURN COUNT(DISTINCT node)
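
The difference between the two RETURN clauses can be illustrated with a small Python sketch. It assumes a hypothetical list of (n, m) pairs that already passed the cnt filter, and the function names are made up for this example.

```python
# Hypothetical (n, m) pairs that passed the filter; "b" is a start node in
# one pair and an end node in another.
pairs = [("a", "b"), ("b", "c")]

def sum_of_distinct_counts(pairs):
    """Mirrors COUNT(DISTINCT n) + COUNT(DISTINCT m)."""
    starts = {n for n, _m in pairs}
    ends = {m for _n, m in pairs}
    return len(starts) + len(ends)

def distinct_node_count(pairs):
    """Mirrors COLLECT(n) + COLLECT(m), UNWIND, then COUNT(DISTINCT node)."""
    return len({node for pair in pairs for node in pair})

double_counted = sum_of_distinct_counts(pairs)  # "b" is counted in both sets
actual = distinct_node_count(pairs)             # "b" is counted once
```

With these sample pairs the first function yields 4 and the second yields 3, because "b" falls into both the start set and the end set.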
