Neo4j Cypher path finding slow in undirected graph

Views: 145 | Posted: 2020/5/16 23:43:54 | Tags: performance, neo4j, cypher

Problem description

In a graph with 165k nodes and 266k relationships I'd like to run the following Cypher query:

START n=node:NodeIds('id:firstId'), t=node:NodeIds('id:secondId')   
MATCH (n)-[:RELATIONSHIP_TYPE*1..3]-(t)   
RETURN count(*)

where firstId and secondId are valid entries in the NodeIds Lucene index.
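
To make concrete what the pattern asks for, here is a minimal Python sketch (not Neo4j code) that counts the paths of 1 to 3 relationships between two nodes while ignoring direction. The toy edge list and the name count_paths are hypothetical, and the sketch assumes Cypher's rule that a relationship is not reused within a single matched path, while nodes may repeat.

```python
from collections import defaultdict

def count_paths(edges, start, target, min_len=1, max_len=3):
    """Count paths from start to target that use min_len..max_len edges.

    Edges are treated as undirected and no edge is used twice within one
    path (nodes may repeat), mirroring (n)-[:TYPE*1..3]-(t).
    """
    # Adjacency list: node -> list of (edge index, neighbour).
    adjacency = defaultdict(list)
    for edge_id, (a, b) in enumerate(edges):
        adjacency[a].append((edge_id, b))
        if a != b:
            adjacency[b].append((edge_id, a))

    def walk(node, used, depth):
        # A path counts whenever it ends on the target at a valid length.
        found = 1 if (node == target and depth >= min_len) else 0
        if depth == max_len:
            return found
        for edge_id, neighbour in adjacency[node]:
            if edge_id not in used:
                found += walk(neighbour, used | {edge_id}, depth + 1)
        return found

    return walk(start, frozenset(), 0)

# Hypothetical toy graph, not taken from the real data set.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "e")]
print(count_paths(edges, "a", "c"))  # a-b-c and a-d-c
```

An exhaustive expansion like this explores every neighbour of the start node up to depth 3 before it can report the matches; whether Neo4j's pattern matcher behaves the same way is not established here.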

The query takes about 4 seconds to execute from the Neo4j console, and I'd like to understand why it is so slow and how it could be made faster.

The index lookup from this takes about 40ms (i.e. a query just returning the two nodes takes that much) so that can't be the issue.

I run Neo4j on a Windows 8 machine with the default settings by starting from Neo4j.bat. I don't think hardware can be an issue as the query only causes a short 10% CPU spike and a barely visible spike in disk usage.

BTW the first node has a degree of 40, the second 2 and the result is 1.

Any help would be appreciated.

Edit 1, memory config:

I was running Neo4j with OOTB config by starting from Neo4j.bat with the following defaults regarding memory (if I'm not mistaken and those are the only memory-related configs):

wrapper.java.initmemory=16
wrapper.java.maxmemory=64

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Taking a shot in the dark, I raised these values to the following:

wrapper.java.initmemory=128
wrapper.java.maxmemory=1024

neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M

This did increase Neo4j's memory usage (that is, the memory usage of the java.exe instance running Neo4j) but without a noticeable improvement in performance (the query takes roughly the same time, with perhaps a 200-300 ms increase occasionally). There are GBs of RAM free, so there's no hardware constraint.

Edit 2, profiler data: Running the profiler for the query in question yields the following results:

neo4j-sh (0)$ profile START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599') MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t) RETURN count(*);
==> +----------+
==> | count(*) |
==> +----------+
==> | 1        |
==> +----------+
==> 1 row
==> 0 ms
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE-939275295"], returnItemNames=["count(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE-939275295,CountStar)"], _rows=1, _db_hits=0)
==>   ExtractPath(name="path", patterns=["  UNNAMED3=n-[:ASSOCIATIVY_CONNECTION*1..3]-t"], _rows=1, _db_hits=0)
==>     PatternMatch(g="(n)-['  UNNAMED3']-(t)", _rows=1, _db_hits=0)
==>       Nodes(name="t", _rows=1, _db_hits=1)
==>         Nodes(name="n", _rows=1, _db_hits=1)
==>           ParameterPipe(_rows=1, _db_hits=0)

It says 0 ms, but I don't know what that is supposed to mean: the result is returned after multiple seconds, and the same query executed in the Data Browser's console takes about 3.5 s (this is what it displays) and roughly the same amount of time when fetched through the RESTful endpoint.

Edit 3, the real data set: Enough with the theory :-), this is the data set that I'm really talking about: http://associativy.com/Media/Default/Associativy/Wiki.zip It's a graph generated from the interlinks between Wikipedia articles, created from Wikipedia dump files. It's just the beginning.

The real query I'm trying to run is actually the following one, returning the nodes building up the paths between two nodes:

START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599')
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)
RETURN nodes(path) AS Nodes
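
As a rough illustration of the result shape this query is after, the following Python sketch (again not Neo4j code) yields the node sequence of each path instead of a count. The edge list is a made-up toy graph that merely reuses the two ids from the query, the name paths_as_nodes is hypothetical, and the same assumption applies: direction is ignored and a relationship is not reused within one path.

```python
from collections import defaultdict

def paths_as_nodes(edges, start, target, max_len=3):
    """Yield the node sequence of every path of 1..max_len edges from
    start to target, ignoring direction and never reusing an edge."""
    # Adjacency list: node -> list of (edge index, neighbour).
    adjacency = defaultdict(list)
    for edge_id, (a, b) in enumerate(edges):
        adjacency[a].append((edge_id, b))
        if a != b:
            adjacency[b].append((edge_id, a))

    def walk(nodes, used):
        # Report the path whenever it currently ends on the target.
        if len(nodes) > 1 and nodes[-1] == target:
            yield list(nodes)
        if len(used) == max_len:
            return
        for edge_id, neighbour in adjacency[nodes[-1]]:
            if edge_id not in used:
                yield from walk(nodes + [neighbour], used | {edge_id})

    yield from walk([start], frozenset())

# Hypothetical edges; only the ids 4000 and 64599 come from the query.
edges = [(4000, 1), (1, 64599), (4000, 2), (2, 3), (3, 64599)]
for nodes in paths_as_nodes(edges, 4000, 64599):
    print(nodes)
```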

I showed the count query because I wanted the simplest query that shows the symptoms.

I opened another question specifically for the path-returning query.
