Why is allShortestPaths so slow?
Problem description
I created a graph database with Python and the neo4j library. The graph has 50k nodes and 100k relationships.
How I create nodes:
CREATE (user:user {task_id: %s, id: %s, root: 1, private: 0})
How I create relationships:
MATCH (root_user), (friend_user) WHERE root_user.id = %s
AND root_user.task_id = %s
AND friend_user.id = %s
AND friend_user.task_id = %s
CREATE (root_user)-[r: FRIEND_OF]->(friend_user) RETURN root_user, friend_user
How I search for all paths between nodes:
MATCH (start_user:user {id: %s, task_id: %s}),
(end_user:user {id: %s, task_id: %s}),
path = allShortestPaths((start_user)-[*..3]-(end_user)) RETURN path
So it's very slow: around 30-60 minutes on the 50k-node graph, and I can't understand why. I tried creating an index like this:
CREATE INDEX ON :user(id, task_id)
But it didn't help. Can you help me? Thanks.
Recommended answer
You should never generate a long Cypher query that contains N slight variations of essentially the same Cypher code. That is very slow and takes up a lot of memory.
Instead, you should be passing parameters to a much simpler Cypher query.
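As a minimal sketch of what "passing parameters" looks like from Python, assuming the official neo4j Python driver: the path query from the question is kept as one constant string with $-placeholders, and the values travel separately instead of being spliced in with %s. The names PATH_QUERY and find_paths are hypothetical.

```python
# Hypothetical helper: the query text is a constant and never changes;
# only the parameter values differ from call to call.
PATH_QUERY = """
MATCH (start_user:user {id: $start_id, task_id: $start_task_id}),
      (end_user:user {id: $end_id, task_id: $end_task_id}),
      path = allShortestPaths((start_user)-[*..3]-(end_user))
RETURN path
"""


def find_paths(tx, start_id, start_task_id, end_id, end_task_id):
    """Return every shortest path (up to 3 hops) between two users.

    `tx` is assumed to behave like the neo4j driver's transaction object:
    tx.run(query, **params) returns an iterable of records.
    """
    result = tx.run(
        PATH_QUERY,
        start_id=start_id,
        start_task_id=start_task_id,
        end_id=end_id,
        end_task_id=end_task_id,
    )
    # Materialize the records while the transaction is still open.
    return [record["path"] for record in result]
```

With a 5.x driver this would typically be called as session.execute_read(find_paths, 1, 7, 2, 7) inside a "with driver.session() as session:" block (4.x drivers name the method read_transaction). Because the query text is identical on every call, the server can reuse its cached plan.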
For example, when creating your nodes, you could pass a data parameter to the following Cypher code:
UNWIND $data AS d
CREATE (user:user {task_id: d.taskId, id: d.id, root: 1, private: 0})
The data parameter value that you pass would be a list of maps, each containing a taskId and an id. The UNWIND clause "unwinds" the data list into individual d maps, so a single query creates all of the nodes. This is much faster.
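On the Python side, building that list of maps could look like the sketch below. It assumes a transaction object with the neo4j driver's tx.run(query, **params) shape; create_users and the (task_id, user_id) input pairs are hypothetical.

```python
# Hypothetical helper: one UNWIND query creates every node in the batch.
NODE_QUERY = """
UNWIND $data AS d
CREATE (user:user {task_id: d.taskId, id: d.id, root: 1, private: 0})
"""


def create_users(tx, users):
    """Create one :user node per (task_id, user_id) pair in a single query.

    The pairs are converted into the list of maps that UNWIND expects,
    e.g. [{"taskId": 7, "id": 1}, {"taskId": 7, "id": 2}].
    """
    data = [{"taskId": task_id, "id": user_id} for task_id, user_id in users]
    tx.run(NODE_QUERY, data=data)
    return data
```

A call such as session.execute_write(create_users, [(7, 1), (7, 2)]) sends one query and one parameter list, rather than one formatted query string per node.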
Something similar needs to be done with your relationship-creation code.
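One way to apply the same idea to relationships is sketched below, under a few assumptions: the map keys rootId, rootTaskId, friendId and friendTaskId and the helper create_friendships are hypothetical, both node patterns carry the :user label so the (id, task_id) index can be used, and the original RETURN is dropped on the assumption that the created nodes are not needed back in Python.

```python
# Hypothetical helper: batch relationship creation in the same UNWIND style.
# Both node patterns carry the :user label so the (id, task_id) index can be used.
REL_QUERY = """
UNWIND $rels AS r
MATCH (root_user:user {id: r.rootId, task_id: r.rootTaskId}),
      (friend_user:user {id: r.friendId, task_id: r.friendTaskId})
CREATE (root_user)-[:FRIEND_OF]->(friend_user)
"""


def create_friendships(tx, pairs):
    """Create FRIEND_OF relationships from
    (root_id, root_task_id, friend_id, friend_task_id) tuples,
    the same four values the original query formatted into its %s slots.
    """
    rels = [
        {
            "rootId": root_id,
            "rootTaskId": root_task_id,
            "friendId": friend_id,
            "friendTaskId": friend_task_id,
        }
        for root_id, root_task_id, friend_id, friend_task_id in pairs
    ]
    tx.run(REL_QUERY, rels=rels)
    return rels
```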
In addition, in order to use any of your :user indexes, your MATCH clause MUST specify the :user label in the relevant node patterns. Otherwise, you are asking Cypher to scan all nodes, regardless of label, and that kind of processing cannot take advantage of indexes. For example, the relevant query should start with:
MATCH (root_user:user), (friend_user:user)
...
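To check whether a query actually uses the index, one option is to prefix it with EXPLAIN and inspect the returned plan. The sketch below assumes the plan is the nested dict exposed by the neo4j driver's ResultSummary.plan, with an "operatorType" name and a "children" list per operator; the helper names are hypothetical.

```python
def operator_types(plan):
    """Collect the operatorType of every operator in a plan, depth-first.

    Assumes the dict shape used by the neo4j driver's ResultSummary.plan:
    each operator has an "operatorType" key and a list of "children".
    """
    if not plan:
        return []
    names = [plan.get("operatorType", "")]
    for child in plan.get("children", []):
        names.extend(operator_types(child))
    return names


def uses_index_seek(plan):
    # Newer servers may append a suffix such as "@neo4j" to operator names,
    # so match on a substring rather than an exact name.
    return any("IndexSeek" in name for name in operator_types(plan))


def scans_all_nodes(plan):
    # AllNodesScan is what an unlabeled node pattern typically produces.
    return any("AllNodesScan" in name for name in operator_types(plan))
```

For example, plan = session.run("EXPLAIN " + query, params).consume().plan returns the plan without executing the query. An unlabeled MATCH (root_user), (friend_user) would be expected to show AllNodesScan operators, while the labeled version should show an index-seek operator once the index exists.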