Neo4j Cypher path finding slow in undirected graph


Question

In a graph with 165k nodes and 266k relationships, I'd like to run the following Cypher query:

START n=node:NodeIds('id:firstId'), t=node:NodeIds('id:secondId')   
MATCH (n)-[:RELATIONSHIP_TYPE*1..3]-(t)   
RETURN count(*)

where firstId and secondId are valid entries in the NodeIds Lucene index.

The query takes about 4 seconds to execute from the Neo4j console. I'd like to understand why it is so slow and how it could be made faster.

The index lookup takes about 40 ms (a query that just returns the two nodes takes that long), so that can't be the issue.

I run Neo4j on a Windows 8 machine with the default settings, starting it from Neo4j.bat. I don't think hardware is the issue, as the query only causes a short 10% CPU spike and a barely visible spike in disk usage.

BTW, the first node has a degree of 40, the second has a degree of 2, and the result is 1.

Any help would be appreciated.

Edit 1, memory config:

I was running Neo4j with the OOTB config, started from Neo4j.bat, with the following memory defaults (if I'm not mistaken, these are the only memory-related settings):

wrapper.java.initmemory=16
wrapper.java.maxmemory=64

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Taking a shot in the dark, I raised these values to the following:

wrapper.java.initmemory=128
wrapper.java.maxmemory=1024

neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M

This did increase Neo4j's memory usage (that is, the memory usage of the java.exe instance running Neo4j) but brought no real improvement in performance: the query takes roughly the same time, occasionally with perhaps a 200-300 ms increase. There are GBs of RAM free, so there's no hardware constraint.

Edit 2, profiler data: Running the profiler for the query in question yields the following results:

neo4j-sh (0)$ profile START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599') MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t) RETURN count(*);
==> +----------+
==> | count(*) |
==> +----------+
==> | 1        |
==> +----------+
==> 1 row
==> 0 ms
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE-939275295"], returnItemNames=["count(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE-939275295,CountStar)"], _rows=1, _db_hits=0)
==>   ExtractPath(name="path", patterns=["  UNNAMED3=n-[:ASSOCIATIVY_CONNECTION*1..3]-t"], _rows=1, _db_hits=0)
==>     PatternMatch(g="(n)-['  UNNAMED3']-(t)", _rows=1, _db_hits=0)
==>       Nodes(name="t", _rows=1, _db_hits=1)
==>         Nodes(name="n", _rows=1, _db_hits=1)
==>           ParameterPipe(_rows=1, _db_hits=0) 

It says 0 ms, but I don't know what that is supposed to mean: the result is returned after several seconds, the same query executed in the Data Browser's console takes about 3.5 s (this is what it displays), and it takes roughly the same amount of time when fetched through the RESTful endpoint.

Edit 3, the real data set: Enough with the theory :-), this is the data set I'm really talking about: http://associativy.com/Media/Default/Associativy/Wiki.zip It's a graph generated from the interlinks between Wikipedia articles, created from Wikipedia dump files. It's just the beginning.

The real query I'm trying to run is actually the following one, which returns the nodes that make up the paths between the two nodes:

START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599')
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)
RETURN nodes(path) AS Nodes

I showed the count query because I wanted the simplest query that shows the symptoms.

Edit 4:

I opened another question specifically for the path-returning query.

Recommended answer

I agree with Wes, this should return in an instant.

Upping the config makes sense. These settings are in two different config files, right?

As you are running on Windows, MMIO is inside the Java heap, so I would up this to:

wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
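
To make the reasoning behind that number concrete: if memory-mapped I/O really is allocated inside the Java heap on Windows, as stated above, then the mapped_memory settings have to fit inside wrapper.java.maxmemory. The following is only a rough arithmetic check in Python, using the raised values from Edit 1 of the question. The variable names are made up for illustration, and the check ignores everything else the JVM needs heap for.

```python
# Raised mapped_memory settings from Edit 1 of the question, in MB.
mapped_mb = {
    "neostore.nodestore.db.mapped_memory": 225,
    "neostore.relationshipstore.db.mapped_memory": 250,
    "neostore.propertystore.db.mapped_memory": 290,
    "neostore.propertystore.db.strings.mapped_memory": 330,
    "neostore.propertystore.db.arrays.mapped_memory": 330,
}

raised_heap_mb = 1024     # wrapper.java.maxmemory after the change in Edit 1
suggested_heap_mb = 4096  # wrapper.java.maxmemory suggested in this answer

total_mapped_mb = sum(mapped_mb.values())

# Assumption: on Windows the mapped buffers must fit inside the heap.
fits_raised = total_mapped_mb <= raised_heap_mb
fits_suggested = total_mapped_mb <= suggested_heap_mb

print(f"total mapped memory: {total_mapped_mb} MB")
print(f"fits in {raised_heap_mb} MB heap: {fits_raised}")
print(f"fits in {suggested_heap_mb} MB heap: {fits_suggested}")
```

Under that assumption, the mapped memory configured in Edit 1 adds up to more than the 1024 MB heap it was paired with, while a 4096 MB heap leaves room to spare.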

How long is the returned path? Would it make sense in your domain to specify a direction?

Can you please run the following (adapt it to the returned path length):

START n=node:NodeIds('id:4000'), 
      t=node:NodeIds('id:64599') 
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION]-(a)
             -[:ASSOCIATIVY_CONNECTION]-(b)-[:ASSOCIATIVY_CONNECTION]-(t) 
RETURN count(*), count(distinct a), count(a), count(distinct b), count(b);
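
To show what those counts would tell us, here is a small Python sketch that mimics the three-hop pattern on a tiny, entirely hypothetical undirected graph (the node names and edges are invented, not taken from the Wikipedia data set). It assumes Cypher's usual rule that the same relationship is not matched twice within one pattern. It is an illustration of the idea, not a model of how Neo4j actually executes the query.

```python
from collections import defaultdict

# Hypothetical toy graph; each tuple is one undirected relationship.
edges = [
    ("n", "a1"), ("n", "a2"),
    ("a1", "b1"), ("a2", "b1"),
    ("b1", "t"), ("a1", "t"),
]

# Undirected adjacency: every relationship is reachable from both ends.
adjacency = defaultdict(list)
for edge_id, (u, v) in enumerate(edges):
    adjacency[u].append((v, edge_id))
    adjacency[v].append((u, edge_id))


def three_hop_matches(start, end):
    """Return the (a, b) bindings for start-a-b-end, never reusing a relationship."""
    matches = []
    for a, e1 in adjacency[start]:
        for b, e2 in adjacency[a]:
            if e2 == e1:
                continue  # would walk straight back over the same relationship
            for last, e3 in adjacency[b]:
                if last == end and e3 not in (e1, e2):
                    matches.append((a, b))
    return matches


matches = three_hop_matches("n", "t")
row_count = len(matches)                     # corresponds to count(*)
distinct_a = len({a for a, _ in matches})    # corresponds to count(distinct a)
distinct_b = len({b for _, b in matches})    # corresponds to count(distinct b)
print(row_count, distinct_a, distinct_b)
```

If the row count is far larger than the distinct counts, the same intermediate nodes are being reached over many different routes, which is the kind of blow-up the query above is meant to reveal.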
