How are neo4j caches speeding up queries?

Question

I am currently working on a project that uses Neo4j as its database, with queries that involve some hard relationship discovery, and after running performance tests we are having some issues.

We have found that the cache influences request times enormously (from about 3000 ms down to about 100 ms). Running the same request twice gives one really slow response and a much faster second one. After some searching we came across the warm-up method, which preloads all the nodes and relationships in the database by running a query like this:

match (n)-[r]->() return count(1);
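A minimal sketch of running this warm-up once at application startup from Python. It assumes the official `neo4j` Python driver, which the original does not mention; the helper name `warm_up_caches` is hypothetical. The function accepts any session-like object with a `run()` method, so the logic can be exercised without a live server.

```python
def warm_up_caches(session):
    """Touch every relationship (and its start/end nodes) once so they are
    pulled into Neo4j's caches before real traffic arrives.

    Assumption: `session` behaves like neo4j.Session from the official
    Python driver, i.e. session.run(query) returns a result whose
    .single() yields exactly one record.
    """
    record = session.run("MATCH (n)-[r]->() RETURN count(1)").single()
    # Column 0 is the only returned value: the number of relationships visited.
    return record[0]


# Usage against a real server (hypothetical URI and credentials):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret")) as driver:
#     with driver.session() as session:
#         print("relationships touched:", warm_up_caches(session))
```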

With the cache enabled plus this warm-up query, our query times dropped considerably, but they were still not as fast as when the same query had already been run two, three or four times.

So we kept testing and searching for information until we found that Neo4j also somehow caches queries so that they don't have to be compiled every time (using the Scala compiler, if I am right). I say "somehow" because, after intense testing, I concluded that Neo4j compiles the query "on the fly".

Let me show a simplified example of what I mean:

(The numbers are the id property.)

If I make a request like this:

match (n:green {id: 1})-[r]->(:red)-[s]->(:green)<-[t]-(m:yellow {id: 7}) 
return count(m);

What I want to do is find out whether there is a connection between node 1 and node 7. As you can see, I have to discover a bunch of nodes and, more importantly, relationships, and the compilation step seems fairly expensive, since the request took 1227 ms to complete. If I make exactly the same request again, I get a response time of about 5 ms, good enough to pass the performance tests. So Neo4j (or the Scala compiler) must be caching the Cypher queries as well.
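To reproduce the cold-versus-warm comparison (1227 ms for the first run, about 5 ms for the repeat) in a repeatable way, a small timing harness can run the same query several times and record each latency. This is a sketch assuming a session object like the official `neo4j` Python driver's `Session`; `time_repeated_runs` is a hypothetical helper. Latencies are measured on the client, so they include network round trips.

```python
import time


def time_repeated_runs(session, query, params=None, repeats=3):
    """Run the same query `repeats` times and return wall-clock latencies in ms.

    The first entry includes any one-off cost (query compilation, cold
    caches, JVM class loading); later entries show the warmed-up cost.
    Assumption: `session.run(query, params)` returns an iterable of records,
    as neo4j.Session.run does in the official Python driver.
    """
    params = params or {}
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Consume every record so the server actually finishes the work.
        list(session.run(query, params))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

Comparing the first element with the rest after a server restart separates one-off costs from steady-state query time.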

After understanding that a Cypher request goes through a compilation step, I dug deeper and started modifying only parts of an already cached request. Changing the label or the id of the last matched node also produced a delay, but only ~19 ms, which is still acceptable:

match (n:green {id: 1})-[r]->(:red)-[s]->(:green)<-[t]-(m:purple {id: 7}) 
return count(m);

However, when I restart the server, run the warm-up and adjust the query so that the first node (labelled n above) doesn't match, the query responds very fast with 0 results. From this I deduce that not all of the query was parsed: since the first node didn't match, there was no need to go deeper in the tree.

I also tried OPTIONAL MATCH, since it returns null when no match is found, but that didn't work either.

I would like to ask, first of all, whether everything I have said so far based on my tests is correct and, if it is not, how it actually works. Secondly, what should I do (if there is a way) to cache everything up front, when the server starts? Unfortunately, the project requirements say that queries should perform well, even the first one (not to mention that the real scenario has thousands more relationships and nodes, which makes everything slower). Or is there no way to avoid this delay?

Recommended answer

First of all, you need to consider JVM warm-up: classes are loaded lazily when they are first needed (your first query), and the JIT compiler may only kick in after several thousand calls.

match (n)-[r]->() return count(1);

should properly warm up the node and relationship caches; however, I am not sure whether it also loads all their properties and indexes. Also make sure that your data set fits in memory.

Providing values directly in the Cypher query, like {id: 1}, instead of using parameters, like {id: {paramId}}, means that whenever you change the value of id, the query has to be compiled again.
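The effect can be illustrated with a toy model of a query-plan cache keyed on the exact query text. This is a simplification for illustration, not Neo4j's actual implementation: every distinct literal produces a different key and therefore a new compilation, while a parameterized query compiles once and is then reused. The example writes the parameter as `$paramId`, the syntax used by newer Neo4j versions; `{paramId}` above is the older form.

```python
from collections import OrderedDict


class ToyPlanCache:
    """Simplified LRU cache of 'compiled plans' keyed by the exact query text."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.plans = OrderedDict()
        self.compilations = 0

    def get_plan(self, query):
        if query in self.plans:
            self.plans.move_to_end(query)   # cache hit: mark as recently used
            return self.plans[query]
        self.compilations += 1              # cache miss: pay the compile cost
        plan = f"plan#{self.compilations}"
        self.plans[query] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)  # evict the least recently used plan
        return plan


cache = ToyPlanCache()

# Literal ids: each value yields a different query string, so each one compiles.
for node_id in (1, 2, 3):
    cache.get_plan(f"MATCH (n:green {{id: {node_id}}}) RETURN n")
print(cache.compilations)  # 3

# Parameterized: the text never changes; only the parameter map would differ.
for node_id in (1, 2, 3):
    params = {"paramId": node_id}  # sent alongside the query, not part of the key
    cache.get_plan("MATCH (n:green {id: $paramId}) RETURN n")
print(cache.compilations)  # 4
```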

You can pass parameters in the shell like this:

neo4j-sh (?)$ export paramId=5
neo4j-sh (?)$ return {paramId};
==> +-----------+
==> | {paramId} |
==> +-----------+
==> | 5         |
==> +-----------+
==> 1 row
==> 4 ms
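From application code, the same idea applies when sending the query through a driver. The sketch below assumes the official `neo4j` Python driver, where `session.run(query, **kwargs)` passes keyword arguments as query parameters; the names `CONNECTION_QUERY`, `count_connections`, `$startId` and `$endId` are hypothetical. Only property values are parameterized here: the labels stay in the query text, so changing a label (as in the question's purple example) still produces a different query text.

```python
# The question's query with both ids turned into parameters.
CONNECTION_QUERY = (
    "MATCH (n:green {id: $startId})-[r]->(:red)-[s]->(:green)"
    "<-[t]-(m:yellow {id: $endId}) "
    "RETURN count(m)"
)


def count_connections(session, start_id, end_id):
    """Run the parameterized query. The text is identical for every id pair,
    so a plan compiled for one pair can be reused for the next."""
    record = session.run(CONNECTION_QUERY, startId=start_id, endId=end_id).single()
    # Column 0 holds count(m).
    return record[0]
```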

So, if you need queries to perform well from the very beginning:


  • Change your queries to use parameters

  • Run your other queries at startup, along with the warm-up query
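Both steps can be combined in a startup routine that executes the warm-up query and then each application query once with representative parameter values, so the data is cached and each query text has been compiled before real traffic arrives. This is a sketch assuming the official `neo4j` Python driver (`Session.run` and `Result.consume`); `warm_up_on_startup` and `STARTUP_QUERIES` are hypothetical. The optional `rounds` argument repeats the list, which may help the JVM's JIT as mentioned above, though how many rounds matter is not established here.

```python
def warm_up_on_startup(session, queries, rounds=1):
    """Execute each (query, params) pair `rounds` times and return how many
    executions were performed."""
    executed = 0
    for _ in range(rounds):
        for query, params in queries:
            # consume() fetches and discards all records, so the server
            # does the full work for the query.
            session.run(query, params).consume()
            executed += 1
    return executed


# Hypothetical startup list: the cache warm-up plus each application query
# with representative parameter values.
STARTUP_QUERIES = [
    ("MATCH (n)-[r]->() RETURN count(1)", {}),
    (
        "MATCH (n:green {id: $startId})-[r]->(:red)-[s]->(:green)"
        "<-[t]-(m:yellow {id: $endId}) RETURN count(m)",
        {"startId": 1, "endId": 7},
    ),
]
```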

EDIT: added information on how to pass parameters in the shell
