Improve Neo4j Cypher Performance On Lengthy Match
Question
Setup:
- Neo4j-1.9.3
- ~7,000 nodes
- ~1.8 million relationships
I would like to improve the performance of the following Cypher query:
START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);
This returns 471 (188171 ms).
Right now I only need the count, but later I may want the matching nodes themselves (471 in this example). The problem is that the query takes about 3-4 minutes to run.
The graph is highly connected with many relationships. Running the following shows how many edges of type "knowledge" exist for node a(2).
START a=node(2) MATCH (a)-[:knowledge]-(x) RETURN COUNT(a);
This returns 4350 (103 ms).
To me, this doesn't seem like many edges to check. Can I split this up somehow to improve performance?
Edit: As per the comments, here are the results from running the query with profile:
profile START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);
==> +---------------------+
==> | COUNT(DISTINCT end) |
==> +---------------------+
==> | 471 |
==> +---------------------+
==> 1 row
==>
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a"], returnItemNames=["COUNT(DISTINCT end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a,Distinct)"], _rows=1, _db_hits=0)
==> TraversalMatcher(trail="(a)-[ UNNAMED7:knowledge WHERE true AND true]-(x)-[ UNNAMED8:depends WHERE true AND true]-(y)-[ UNNAMED9:knowledge WHERE true AND true]-(end)", _rows=25638262, _db_hits=25679365)
==> ParameterPipe(_rows=1, _db_hits=0)
Recommended answer
I ended up doing the following to improve performance:
profile START a=node(2) MATCH (a)-[:knowledge]-(x) WITH DISTINCT x MATCH (x)-[:depends]-(y) WITH DISTINCT y MATCH (y)-[:knowledge]-(end) WITH DISTINCT end RETURN COUNT(end);
==> +------------+
==> | COUNT(end) |
==> +------------+
==> | 471 |
==> +------------+
==> 1 row
==>
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATE1967576a-d357-457a-b799-adbb16b93048"], returnItemNames=["COUNT(end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATE1967576a-d357-457a-b799-adbb16b93048,Count)"], _rows=1, _db_hits=0)
==> Distinct(_rows=471, _db_hits=0)
==> PatternMatch(g="(end)-[' UNNAMED3']-(y)", _rows=403437, _db_hits=0)
==> Distinct(_rows=735, _db_hits=0)
==> PatternMatch(g="(x)-[' UNNAMED2']-(y)", _rows=1653, _db_hits=0)
==> Distinct(_rows=177, _db_hits=0)
==> TraversalMatcher(trail="(a)-[ UNNAMED1:knowledge WHERE true AND true]-(x)", _rows=4350, _db_hits=4351)
==> ParameterPipe(_rows=1, _db_hits=0)
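Splitting the pattern into one hop per MATCH and applying WITH DISTINCT after each hop means every step expands only the distinct nodes reached so far, rather than every partial path. The profile shows the effect: the single-pattern query produced 25,638,262 rows in its TraversalMatcher, while the staged query expands 4,350, then 1,653, then 403,437 rows, carrying only 177 distinct x nodes and 735 distinct y nodes between steps.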
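To see why deduplicating between hops helps, here is a minimal Python sketch of the same idea on a small synthetic graph. It is not how Neo4j executes either query. The helper names (`build_adjacency`, `count_by_paths`, `count_by_frontier`, `demo_edges`) and the toy graph are made up for illustration, and the sketch ignores any per-path relationship uniqueness a real engine may enforce.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index undirected typed edges as adj[rel_type][node] -> list of neighbours."""
    adj = defaultdict(lambda: defaultdict(list))
    for u, rel, v in edges:
        adj[rel][u].append(v)
        adj[rel][v].append(u)
    return adj


def count_by_paths(adj, start, rel_types):
    """Expand every partial path separately, like one long MATCH pattern."""
    paths = [start]  # one entry per partial path (only its last node is kept)
    rows = 0
    for rel in rel_types:
        paths = [nbr for node in paths for nbr in adj[rel][node]]
        rows += len(paths)
    return len(set(paths)), rows


def count_by_frontier(adj, start, rel_types):
    """Deduplicate after every hop, like MATCH ... WITH DISTINCT ... MATCH ..."""
    frontier = {start}
    rows = 0
    for rel in rel_types:
        expanded = [nbr for node in frontier for nbr in adj[rel][node]]
        rows += len(expanded)
        frontier = set(expanded)
    return len(frontier), rows


def demo_edges():
    """Hypothetical toy graph: 1 start node, 50 x nodes, 5 y nodes, 20 end nodes."""
    xs, ys, ends = range(1, 51), range(100, 105), range(200, 220)
    edges = [(0, "knowledge", x) for x in xs]
    edges += [(x, "depends", y) for x in xs for y in ys]
    edges += [(y, "knowledge", e) for y in ys for e in ends]
    return edges


if __name__ == "__main__":
    adj = build_adjacency(demo_edges())
    hops = ["knowledge", "depends", "knowledge"]
    print(count_by_paths(adj, 0, hops))     # (20, 5300)
    print(count_by_frontier(adj, 0, hops))  # (20, 400)
```

Both functions find the same 20 distinct end nodes, but the path-based version touches 5,300 rows while the frontier version touches 400, because many x nodes lead to the same few y nodes. The same kind of fan-in (4,350 rows collapsing to 177 distinct x nodes) is visible in the profile output above.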