Neo4j Cypher path finding slow in undirected graph


Problem description

In a graph with 165k nodes and 266k relationships I'd like to run the following Cypher query:

START n=node:NodeIds('id:firstId'), t=node:NodeIds('id:secondId')   
MATCH (n)-[:RELATIONSHIP_TYPE*1..3]-(t)   
RETURN count(*)

where firstId and secondId are valid entries in the NodeIds Lucene index.

The query takes about 4 seconds to execute from the Neo4j console, and I'd like to understand why it is so slow and how it could be made faster.

The index lookup from this takes about 40ms (i.e. a query just returning the two nodes takes that much) so that can't be the issue.

I run Neo4j on a Windows 8 machine with the default settings by starting from Neo4j.bat. I don't think hardware can be an issue as the query only causes a short 10% CPU spike and a barely visible spike in disk usage.

BTW the first node has a degree of 40, the second 2 and the result is 1.

Any help would be appreciated.

Edit 1, memory config:

I was running Neo4j with OOTB config by starting from Neo4j.bat with the following defaults regarding memory (if I'm not mistaken and those are the only memory-related configs):

wrapper.java.initmemory=16
wrapper.java.maxmemory=64

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Taking a shot in the dark, I raised these values to the following:

wrapper.java.initmemory=128
wrapper.java.maxmemory=1024

neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M

This indeed increased Neo4j memory usage (I mean the memory usage of the java.exe instance running Neo4j) without a good increase in performance (the query takes roughly the same time, with probably a 2-300ms increase occasionally). There are GBs of RAM free so there's no hardware constraint.

Edit 2, profiler data: Running the profiler for the query in question yields the following results:

neo4j-sh (0)$ profile START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599') MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t) RETURN count(*);
==> +----------+
==> | count(*) |
==> +----------+
==> | 1        |
==> +----------+
==> 1 row
==> 0 ms
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE-939275295"], returnItemNames=["count(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE-939275295,CountStar)"], _rows=1, _db_hits=0)
==>   ExtractPath(name="path", patterns=["  UNNAMED3=n-[:ASSOCIATIVY_CONNECTION*1..3]-t"], _rows=1, _db_hits=0)
==>     PatternMatch(g="(n)-['  UNNAMED3']-(t)", _rows=1, _db_hits=0)
==>       Nodes(name="t", _rows=1, _db_hits=1)
==>         Nodes(name="n", _rows=1, _db_hits=1)
==>           ParameterPipe(_rows=1, _db_hits=0) 

It says 0 ms, but I don't know what that is supposed to mean: the result is returned after multiple seconds, and the same query executed in the Data Browser's console takes about 3.5 s (this is what it displays) and roughly the same amount of time when fetched through the RESTful endpoint.

Edit 3, the real data set: Enough with the theory :-), this is the data set I'm really talking about: http://associativy.com/Media/Default/Associativy/Wiki.zip It's a graph generated by using the interlinks between Wikipedia articles, created from Wikipedia dump files. It's just the beginning.

The real query I'm trying to run is actually the following one, returning the nodes building up the paths between two nodes:

START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599')
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)
RETURN nodes(path) AS Nodes

I showed the count query because I wanted the simplest query that shows the symptoms.

I opened another question specifically for the path-returning query.

Recommended answer

I agree with Wes, this should return in an instant.

Your upping of the config values makes sense; these settings are in 2 different config files, right?

As you are running on Windows, MMIO is inside the Java heap, so I would raise this to:

wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096

How long is the returned path? Would it make sense in your domain to specify a direction?

Can you please run the following (adapt it to the returned path length):

START n=node:NodeIds('id:4000'), 
      t=node:NodeIds('id:64599') 
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION]-(a)-[:ASSOCIATIVY_CONNECTION]-(b)-[:ASSOCIATIVY_CONNECTION]-(t)
RETURN count(*), count(distinct a), count(a), count(distinct b), count(b);
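
To make the counts in that diagnostic query easier to read, here is a minimal Python sketch of what it measures, run on a toy graph rather than the Wikipedia data set. The function name `three_hop_stats` and the edge list are hypothetical, made up for illustration. The sketch assumes a relationship may be used only once per path, which is how Cypher treats relationships within a single pattern.

```python
from collections import defaultdict


def three_hop_stats(edges, start, target):
    """Enumerate undirected 3-hop paths start-a-b-target and summarise them.

    Assumption: a relationship is used at most once per path, emulating
    Cypher's relationship uniqueness within one pattern.
    """
    # Adjacency list storing every edge in both directions (undirected match).
    adjacency = defaultdict(list)
    for edge_id, (u, v) in enumerate(edges):
        adjacency[u].append((v, edge_id))
        adjacency[v].append((u, edge_id))

    rows = []  # one (a, b) pair per matched path
    for a, e1 in adjacency[start]:
        for b, e2 in adjacency[a]:
            if e2 == e1:
                continue  # do not walk back over the same relationship
            for end, e3 in adjacency[b]:
                if e3 in (e1, e2) or end != target:
                    continue
                rows.append((a, b))

    return {
        "count(*)": len(rows),
        "count(distinct a)": len({a for a, _ in rows}),
        "count(distinct b)": len({b for _, b in rows}),
    }


# Hypothetical toy graph: two routes from n to t share the node b1.
edges = [("n", "a1"), ("n", "a2"), ("a1", "b1"),
         ("a2", "b1"), ("b1", "t"), ("a2", "b2")]
stats = three_hop_stats(edges, "n", "t")
print(stats)
# {'count(*)': 2, 'count(distinct a)': 2, 'count(distinct b)': 1}
```

If count(*) comes out much larger than the distinct counts, many of the matched paths run through the same intermediate nodes.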
