Neo4j crashes on batch import


Question

I'm importing nodes that are all part of one Merge and relationship creation statement, but Neo4j is crashing with StackOverflowExceptions or "ERROR (-v for expanded information): Error unmarshaling return header; nested exception is: java.net.SocketException: Software caused connection abort: recv failed"

I admit my approach may be faulty, but I have some (A) nodes with ~8000 relationships to nodes of type (B), and (B) nodes with ~7000 relationships to other (A) nodes.

I basically have one big MERGE statement that creates the (A) & (B) nodes, followed by a CREATE UNIQUE that does all the relationship creation at the end. I store all this Cypher in a file and import it through the Neo4jShell.

Example:

MERGE (foo:A { id:'blah'})
MERGE (bar:B {id:'blah2'})
MERGE (bar2:B1 {id:'blah3'})
MERGE (bar3:B3 {id:'blah3'})
MERGE (foo2:A1 {id:'blah4'})
... // thousands more of these
CREATE UNIQUE (foo)-[:x]->(bar), (bar)-[:y]->(foo2), // hundreds more of these

Is there a better way to do this? I was trying to avoid creating all the MERGE statements first and then matching each one to create the relationships in a separate query. I get really slow import performance either way: splitting each merge into its own transaction is slow (a 2-hour import for 60K nodes/relationships), and the current approach crashes Neo4j.

The current one-big-merge/create-unique approach works for the first big insert, but fails when the next big insert uses 5000 nodes and 8000 relationships. Here is the result of the first big merge:

Nodes created: 756
Relationships created: 933
Properties set: 5633
Labels added: 756
15101 ms

I'm using a Windows 7 machine with 8GB RAM. In my neo4j.wrapper I use:

wrapper.java.initmemory=512
wrapper.java.maxmemory=2048
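
For reference, these two properties set the JVM's initial and maximum heap size in MB. A hedged sketch of a configuration sometimes used for large imports follows; the numbers and the -Xss flag are illustrative assumptions, not part of the question or the answer (the StackOverflowErrors above come from the thread stack rather than the heap):

# illustrative values only, assuming the stock conf/neo4j-wrapper.conf format
wrapper.java.initmemory=2048
wrapper.java.maxmemory=4096
# assumption: a larger thread stack helps with one very large statement;
# depending on the wrapper version the property may need a numeric suffix
# (e.g. wrapper.java.additional.1)
wrapper.java.additional=-Xss4m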

Answer

There are 3 things that might help:

  1. If you don't really need MERGE, just use a plain CREATE instead. CREATE is more efficient because it doesn't have to check for existing nodes and relationships (see the sketch after this list).

  2. Make sure your indexes are correct.

  3. You now have everything in one big transaction. You mention the alternative of putting every statement in its own transaction. Neither works for you. However, you could make transactions of, say, 100 statements each. This approach should be quicker than one statement per transaction, and still use less memory than putting everything in one big transaction.
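
A minimal sketch of points 1-3, assuming the :A/:B labels and id properties from the question, Cypher 2.x-era syntax to match the statements above, and the Neo4jShell's begin/commit commands; the batch size of 100 and the ids are illustrative:

// Point 2: index the property every lookup uses, once, before importing
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);

// Points 1 and 3: in the import file, group statements into transactions
begin
CREATE (foo:A {id:'blah'})
CREATE (bar:B {id:'blah2'})
CREATE UNIQUE (foo)-[:x]->(bar);
// ... ~100 statements per batch; identifiers don't survive the semicolon,
// so relationships to nodes from earlier statements re-bind via MATCH:
MATCH (a:A {id:'blah'}), (b:B {id:'blah2'})
CREATE UNIQUE (a)-[:x]->(b);
commit

With the index in place, each {id: ...} lookup no longer scans the whole label, and 100-statement batches keep the transaction state small enough to avoid both the crash and the one-statement-per-transaction slowdown.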

