如果我在图形节点中有 2way 关系并且节点是相互关联的,则 Neo4j 查询最短路径卡住(不起作用) [英] Neo4j query for shortest path stuck (Do not work) if I have 2way relationship in graph nodes and nodes are interrelated

查看:16
本文介绍了如果我在图形节点中有 2way 关系并且节点是相互关联的,则 Neo4j 查询最短路径卡住(不起作用)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我制作了关系图两个关系,就像如果 A 知道 B 那么 B 知道 A,每个节点都有唯一的 Id 和 Name 以及其他属性......所以我的图看起来像

I made relation graph two relationship, like if A knows B then B knows A, Every node has unique Id and Name along with other properties.. So my graph looks like

如果我触发一个简单的查询MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}),path = (p1)-[:NAVIGATE_TO*]-(p2) RETURN path它没有任何响应并消耗了机器的 100% CPU 和 RAM.

if I trigger a simple query MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}),path = (p1)-[:NAVIGATE_TO*]-(p2) RETURN path it did not give any response and consumes 100% CPU and RAM of the machine.

更新当我阅读帖子和这篇帖子的评论时,我简​​化了模型和关系.现在它结束于

UPDATED As I read though posts and from comments on this post I simplified the model and relationship. Now it ends up to

每个关系都有不同的权重,为了简化考虑水平连接权重1,垂直权重1和对角关系权重1.5在我的数据库中有超过 85000 个节点和 30 万个关系

Each relationship has different weights, to simplify consider horizontal connections weight 1, vertical weights 1 and diagonal relations have weights 1.5 In my database there are more than 85000 nodes and 0.3 Million relationships

最短路径查询不会以某种结果结束.它卡在处理中,CPU 达到 100%

Query with shortest path is not ends up to some result. It stuck in the processing and CPU goes to 100%

推荐答案

恐怕你在这里做不了多少.您的图表非常具体,仅与最近的节点有关系.那太糟糕了,因为 Neo4j 可以在起点附近玩 +- 很少有关系,而不是每个查询都在整个图上

im afraid you wont be able to do much here. your graph is very specific, having a relation only to closest nodes. thats too bad cause neo4j is ok to play around the starting point +- few relations away, not over whole graph with each query

这意味着,一旦距离您 2 个节点,计算复杂度将上升到:

it means, once, you are 2 nodes away, the computational complexity raises up to:

8 relationships per node
distance 2
8 + 8^2

一般来说,距离 n 的最高复杂度是

in general, the top complexity for a distance n is

O(8 + 8^n) //in case all affected nodes have 8 connections

你说,你得到了 ~80 000 个节点.这意味着(如果我错了,请纠正我),~280 的最长距离(来自 √80000).让我们假设你的节点

you say, you got like ~80 000 of nodes.this means (correct me if im wrong), the longest distance of ~280 (from √80000). lets suppose your nodes

(p1:SearchableNode {name: "Ishaan"}), 
(p2:SearchableNode {name: "Garima"}),

距离只有 140 个希望.这将产生 8^140 = 10e126 的复杂性,我不确定世界上是否有任何计算机可以处理这个.

to be only 140 hopes away. this will create a complexity of 8^140 = 10e126, im not sure if any computer in the world can handle this.

当然,并非所有节点都有 8 个连接,只有那些中间",在我们的示例图中,它将有大约 500 000 个关系.你得到了大约 300 000,这可能是 2 倍,所以让我们假设平均距离为 70(140 中 - 一个非常轻松的底部估计)的总体复杂性,对于平均具有 4 个关系的节点(从 8, 80 000 下降)*4 = 320 000) 是

sure, not all nodes have 8 connections, only those "in the middle", in our example graph it will have ~500 000 relationships. you got like ~300 000, which is maybe 2 times less so lets supose the overal complexity for an average distance of 70 (out of 140 - a very relaxed bottom estimation) for nodes having 4 relationships in average (down from 8, 80 000 *4 = 320 000) to be

O(4 + 4^70) = ~10e42

一个 1GHz CPU 应该能够通过以下方式计算:

one 1GHz CPU should be able to calculate this by:

-1000 000 per second
10e42 == 10e36 * 1 000 000 -> 10e36 seconds

假设我们有一个由 100 个 10Ghz cpu 服务组成的集群,总共 1000 GHz.那仍然是 10e33 * 1 000 000 000 ->10e33 秒

lets supose we got a cluster of 100 10Ghz cpu serves, 1000 GHz in total. thats still 10e33 * 1 000 000 000 -> 10e33seconds

我建议远离AllshortestPaths,只寻找第一个可用的路径.使用 gremlin 而不是 cypher 可以通过一些启发式实现自己的算法,因此实际上您可以将时间减少到几秒或更短.

i would suggest to just keep away from AllshortestPaths, and look only for the first path available. using gremlin instead of cypher it is possible to implement own algorithms with some heuristics so actually you can cut down the time to maybe seconds or less.

示例:仅使用一个方向 = 下降到 10e16 秒.

exmaple: using one direction only = down to 10e16 seconds.

一个例子启发式:检查节点的id,node2.id-node1.id之间的差值(减值)越大,实际距离越大(考虑节点创建顺序——id相似的节点靠近一起).在这种情况下,您可以跳过查询,也可以跳过一些关系,例如 MATCH n1-[:RELATED..5]->q-[:RELATED..*]->n2(我忘了定义精确关系计数的语法) 它将(应该)实际跳转(立即跳转到)5 个距离更接近 n2 节点的节点=复杂度低于 4^704^65.因此,如果您可以准确计算到节点 id 的距离,您甚至可以匹配 ... [:RELATED..65] ... 这会将复杂性降低到 4^5 而这对于 cpu 来说只是几毫秒的问题.

an example heuristic: check the id of the node, the higher the difference (subtraction value) between node2.id - node1.id, the higher the actual distance (considering the node creation order - nodes with similar ids to be close together). in that case you can either skip the query or just jump few relations away with something like MATCH n1-[:RELATED..5]->q-[:RELATED..*]->n2 (i forgot the syntax of defining exact relation count) which will (should) actually jump (instantly skip to) 5 distances away nodes which are closer to the n2 node = complexity down from 4^70 to 4^65. so if you can exactly calculate the distance from the nodes id, you can even match ... [:RELATED..65] ... which will cut the complexity to 4^5 and thats just matter of miliseconds for cpu.

我可能在这里完全错了.在我们学校已经有一段时间了,很高兴请数学家(图论)确认这一点.

its possible im completely wrong here. it has been already some time im our of school and would be nice to ask a mathematician (graph theory) to confirm this.

这篇关于如果我在图形节点中有 2way 关系并且节点是相互关联的,则 Neo4j 查询最短路径卡住(不起作用)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆