Neo4j Cypher path finding slow in undirected graph

Views: 28 | Published: 2021/12/28 17:09:23 | Tags: performance, neo4j, cypher


Problem description

In a graph with 165k nodes and 266k relationships, I'd like to run the following Cypher query:

START n=node:NodeIds('id:firstId'), t=node:NodeIds('id:secondId')   
MATCH (n)-[:RELATIONSHIP_TYPE*1..3]-(t)   
RETURN count(*)

where firstId and secondId are valid entries in the NodeIds Lucene index.

The query takes about 4 seconds to execute from the Neo4j console, and I'd like to understand why it is so slow and how it could be made faster.

The index lookup alone takes about 40 ms (a query that just returns the two nodes takes that long), so that can't be the issue.

I run Neo4j on a Windows 8 machine with the default settings, started from Neo4j.bat. I don't think hardware is the issue, as the query only causes a short 10% CPU spike and a barely visible spike in disk usage.

BTW, the first node has a degree of 40, the second a degree of 2, and the result is 1.
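
To make concrete what the pattern (n)-[:RELATIONSHIP_TYPE*1..3]-(t) asks for, here is a minimal Python sketch that counts paths of one to three hops between two nodes of a small undirected graph. It is an illustration only, not a description of how Neo4j evaluates the query: the edge list and the function name count_paths are made up, and it assumes a relationship may be used at most once per path.

```python
from collections import defaultdict


def count_paths(edges, source, target, min_hops=1, max_hops=3):
    """Count undirected paths from source to target, using each edge at most once."""
    # Adjacency list of (neighbour, edge_index) so that parallel edges stay distinct.
    adjacency = defaultdict(list)
    for index, (a, b) in enumerate(edges):
        adjacency[a].append((b, index))
        adjacency[b].append((a, index))

    def walk(node, hops, used):
        # A path counts once it ends on the target with enough hops.
        found = 1 if node == target and hops >= min_hops else 0
        if hops == max_hops:
            return found
        # Ignore direction: follow every incident edge not already on this path.
        for neighbour, index in adjacency[node]:
            if index not in used:
                found += walk(neighbour, hops + 1, used | {index})
        return found

    return walk(source, 0, frozenset())


# Hypothetical toy graph: "n" and "t" are joined by a 2-hop and a 3-hop path.
edges = [("n", "a"), ("a", "t"), ("n", "b"), ("b", "c"), ("c", "t"), ("n", "d")]
print(count_paths(edges, "n", "t"))
```

In a sketch like this, the number of partial paths explored depends on which end the walk starts from. With degrees of 40 and 2, starting from the second node would fan out far less on the first hop.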

Any help would be appreciated.

Edit 1, memory config:

I was running Neo4j with the out-of-the-box config, started from Neo4j.bat, with the following memory defaults (if I'm not mistaken, these are the only memory-related settings):

wrapper.java.initmemory=16
wrapper.java.maxmemory=64

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Taking a shot in the dark, I raised these values to the following:

wrapper.java.initmemory=128
wrapper.java.maxmemory=1024

neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M

This did increase Neo4j's memory usage (that is, the memory usage of the java.exe instance running Neo4j) but brought no real improvement in performance (the query takes roughly the same time, with an occasional increase of perhaps 200-300 ms). There are gigabytes of RAM free, so there's no hardware constraint.
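
One way to sanity-check whether the mapped-memory settings could matter at all is to estimate the size of the store files. The sketch below does that arithmetic in Python. The record sizes (9 bytes per node, 33 bytes per relationship) are assumptions based on older Neo4j documentation and may differ for the version in use; the actual store file sizes on disk are the authoritative numbers.

```python
# Assumed fixed record sizes in bytes. These are NOT taken from the question;
# verify them against the documentation for the Neo4j version in use.
NODE_RECORD_BYTES = 9
RELATIONSHIP_RECORD_BYTES = 33

# Graph size as stated in the question.
NODE_COUNT = 165_000
RELATIONSHIP_COUNT = 266_000


def store_size_mb(records, bytes_per_record):
    """Approximate store size in MiB for a store with fixed-size records."""
    return records * bytes_per_record / (1024 * 1024)


node_mb = store_size_mb(NODE_COUNT, NODE_RECORD_BYTES)
rel_mb = store_size_mb(RELATIONSHIP_COUNT, RELATIONSHIP_RECORD_BYTES)

# Compare against the default mapped_memory values quoted in the question.
print(f"node store: about {node_mb:.1f} MB (default mapping: 25 MB)")
print(f"relationship store: about {rel_mb:.1f} MB (default mapping: 50 MB)")
```

If these estimates are roughly right, both stores were already well below the default 25M and 50M mappings, which would be consistent with the larger values making no difference.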

Edit 2, profiler data: Running the profiler for the query in question yields the following results:

neo4j-sh (0)$ profile START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599') MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t) RETURN count(*);
==> +----------+
==> | count(*) |
==> +----------+
==> | 1        |
==> +----------+
==> 1 row
==> 0 ms
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE-939275295"], returnItemNames=["count(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE-939275295,CountStar)"], _rows=1, _db_hits=0)
==>   ExtractPath(name="path", patterns=["  UNNAMED3=n-[:ASSOCIATIVY_CONNECTION*1..3]-t"], _rows=1, _db_hits=0)
==>     PatternMatch(g="(n)-['  UNNAMED3']-(t)", _rows=1, _db_hits=0)
==>       Nodes(name="t", _rows=1, _db_hits=1)
==>         Nodes(name="n", _rows=1, _db_hits=1)
==>           ParameterPipe(_rows=1, _db_hits=0)

It says 0 ms, but I don't know what that is supposed to mean: the result is returned after several seconds, the same query executed in the Data Browser's console takes about 3.5 s (this is what it displays), and it takes roughly the same amount of time when fetched through the RESTful endpoint.
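
Because the reported 0 ms and the observed wall-clock time disagree, it can help to measure the round trip from the client over several runs, which also separates a cold first run from warmed-up ones. The following is a minimal sketch using only the Python standard library. The URL is an assumption (a default local install exposing the legacy Cypher REST endpoint), and build_request and time_query are hypothetical helper names.

```python
import json
import time
import urllib.request

# Assumption: default local server with the legacy Cypher REST endpoint.
CYPHER_URL = "http://localhost:7474/db/data/cypher"


def build_request(first_id, second_id, url=CYPHER_URL):
    """Build a POST request carrying the count query from the question."""
    query = (
        f"START n=node:NodeIds('id:{first_id}'), t=node:NodeIds('id:{second_id}') "
        "MATCH (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t) "
        "RETURN count(*)"
    )
    body = json.dumps({"query": query, "params": {}}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def time_query(request, runs=5):
    """Send the request several times and return each wall-clock duration in seconds."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        with urllib.request.urlopen(request) as response:
            response.read()  # include the time needed to receive the full result
        timings.append(time.perf_counter() - started)
    return timings


# Example (needs a running server):
#   for seconds in time_query(build_request(4000, 64599)):
#       print(f"{seconds:.3f} s")
```

If the first run is much slower than the later ones, that would point towards cache warm-up rather than the traversal itself.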

Edit 3, the real data set: Enough with the theory :-), this is the data set I'm actually talking about: http://associativy.com/Media/Default/Associativy/Wiki.zip It's a graph generated from the interlinks between Wikipedia articles, created from Wikipedia dump files. It's just the beginning.

The real query I'm trying to run is actually the following one, which returns the nodes that make up the paths between two nodes:

START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599')   MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)   RETURN nodes(path) AS Nodes

I showed the count query because I wanted the simplest query that exhibits the symptoms.

Edit 4:

I opened another question specifically for the path-returning query.
