Neo4j database size grows


Problem description


I'm using Neo4j 3.0.1 Community and I have a few GB of data. The data becomes outdated very quickly (2-3 times per day), so I have to create the new data first and then delete the old data (that way some data is available at any point in time).

The problem is that Neo4j doesn't reuse the space from deleted nodes/relationships. I'm deleting with MATCH (n) WHERE condition DETACH DELETE n.

I can see that nodes are being deleted (their count stays constant at ~30M), but the store size keeps growing (after 12 updates it is almost exactly 12x bigger than it should be).

I found the earlier post Neo4J database size / shrinking about store-utils, but I would like to find a better solution.

I also found an old question (from version 1.x), neostore.* file size after deleting millions node, but at least in my case it simply doesn't work the way the answer describes.

There is some advice to delete all database files and simply create a new database, but that would require stopping the service, which shouldn't happen.

I also found some information saying that in order to reuse space you need to restart the DB first; I tried that as well and it didn't work.

Is there a way to effectively free/reuse the space from deleted nodes/relationships? Maybe I'm missing some configuration, or is this available only in the Enterprise edition?

EDIT:

I finally had some time to test: I ran a scenario in which the data was refreshed a few times, restarting the server a few times as well. The test was made on Neo4j 3.0.0 in a Windows 10 environment. The results are (I'm not yet allowed to embed images):

neo4j storage sizes

Each column shows the storage size after successive updates, the blue lines mark Neo4j server restarts, and the last column (separated by the brown line) is the size after running store-utils.

As described earlier, the size grows pretty fast, and contrary to the documentation a restart doesn't help. Only store-utils helps (it cleans up all the files except neostore.nodestore.db), but integrating store-utils into a production setup would be a hard and messy solution.

Can anyone give me a hint as to why the storage keeps growing?

Solution

After heavy testing I finally found the main source of the problem: it turns out I was doing a hard shutdown of the Neo4j server, which it cannot handle, and as a result it struggled to delete nodes/relationships and to reuse the space they occupied.

Let's start from the beginning. I was running Neo4j under Docker (with docker-compose). My scenario is very simple: every few hours I start a process that adds a few GB of nodes, and once it is done I remove the nodes from the previous run (shortly afterwards). Sometimes I have to update a Neo4j plugin or do some other job that requires restarting the server, and that is where the problem starts. I restart it with docker-compose, which never waits for Neo4j to quit gracefully (by default; now that I know about the problem I have to customize that) and instead kills the process immediately. In debug.log there is no trace of the server being stopped.

Neo4j doesn't handle this, and as a result it does something very strange. When the server starts again it rolls back the node id counter, the relationship id counter and others, and it does not free the space behind deleted nodes/relationships, although at least it never rolls back the nodes and relationships themselves. Of course my delete operations were successfully committed in transactions, so this is not a case of reverting uncommitted changes. After a few restarts and imports the database size is multiplied by the number of imports, and the node counters are heavily overstated.

I realize that it is mostly my fault for killing Neo4j, but the behaviour is still not ideal in my opinion.
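For anyone who wants to avoid the same mistake, below is a minimal sketch of a graceful-shutdown hook for the embedded case. It assumes graphDb is the same GraphDatabaseService instance used in the delete code further down; the GracefulShutdown class and the register method are names made up for this example. Registering a JVM shutdown hook like this is also what the Neo4j embedded documentation recommends:

import org.neo4j.graphdb.GraphDatabaseService;

// Sketch only: register a JVM shutdown hook so the embedded database is shut
// down cleanly when the container receives SIGTERM instead of being killed.
public final class GracefulShutdown {

    private GracefulShutdown() {
    }

    public static void register(final GraphDatabaseService graphDb) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                // A clean shutdown flushes and closes the store files, so the
                // id counters and freed record space are intact after a restart.
                graphDb.shutdown();
            }
        });
    }
}

On the Docker side the container also has to be given enough time to stop: docker-compose only waits a short grace period before killing the container, so raising it (for example with docker-compose stop -t <seconds>, or the stop_grace_period option if your compose file format supports it) is probably needed for the hook above to run to completion.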

There is also another related issue. I ran an almost 24-hour test without restarts, during which I repeated my scenario over 20 times. I was very surprised by the growing duration of each import (leaving aside the growing database size issue):

import nr. | creating nodes time | deleting nodes time
1          | 20 minutes          | 0 min (nothing to delete yet)
2          | 20 minutes          | 8 minutes
3          | 20 minutes          | 12 minutes
...
~20        | 20 minutes          | over 80 minutes

As you can see, the nodes/relationships are very probably not physically removed immediately (maybe they are actually cleaned up during stop/start), and my delete script has to do a lot of extra work.

This is my code for removing the old nodes:

String REMOVE_OLD_REVISION_NODES_QUERY =
    "MATCH (node) " +
    "WHERE node.revision <> {" + REVISION_PARAM + "} " +  // nodes from any revision other than the current one
    "WITH node LIMIT 100000 " +                           // delete in batches of at most 100k nodes
    "DETACH DELETE node " +
    "RETURN count(node) as count";

LOG.info("Removing nodes with revision different than: {}", revision);
long count;
do {
    // Repeat until a batch deletes nothing, i.e. no old-revision nodes are left.
    count = (long) graphDb.execute(REMOVE_OLD_REVISION_NODES_QUERY,
            ImmutableMap.of(REVISION_PARAM, revision)).columnAs("count").next();
} while (count > 0);
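A note on the snippet: REVISION_PARAM, LOG, graphDb and Guava's ImmutableMap come from the surrounding code and are not shown here (REVISION_PARAM is just the name of the Cypher parameter). The WITH node LIMIT 100000 step makes each DETACH DELETE work on a batch of at most 100,000 nodes that is committed on its own, rather than deleting all ~30M old nodes in one huge transaction; that is why the loop simply reruns the query until it reports that nothing was deleted.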

I can probably solve the issue of killing Neo4j when I restart the Docker image (by adding a script that makes sure Neo4j is able to stop gracefully), but I'm not sure there is a way to handle the growing size and the growing delete time (unless I restart Neo4j after every update).

I'm describing the issue so that maybe it will help somebody someday, or help the Neo4j team improve their product, because despite the problems I had to deal with it is the most enjoyable DB I have ever worked with.
