Batch Insertion with Neo4j


Problem description


I am importing 2.3 billion relationships from a table. The import is not very fast, getting a speed of about 5 million per hour, which means the migration will take 20 days to complete. I have heard about the Neo4j batch insert and the batch-import utility. The utility does interesting things by importing from a CSV file, but the latest code is somehow broken and does not run.
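For reference, the programmatic batch inserter mentioned above skips transactions and writes to the store files directly. What follows is only a minimal sketch of how that API is typically used; the store path, the userId property, the in-memory id cache and the row reader are placeholder assumptions, and the exact factory signature varies between Neo4j versions:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class KnowsBatchImport {
    private static final RelationshipType KNOW = DynamicRelationshipType.withName("KNOW");

    public static void main(String[] args) {
        // Writes directly to an offline store directory; no transactions involved.
        BatchInserter inserter = BatchInserters.inserter("data/knows.db");
        try {
            // In-memory mapping from the user id in the source table to the created node id,
            // so every user node is created exactly once (needs enough heap for all users).
            Map<Long, Long> userToNode = new HashMap<Long, Long>();
            for (long[] pair : readPairs()) {                   // hypothetical (userA, userB) reader
                long a = getOrCreate(inserter, userToNode, pair[0]);
                long b = getOrCreate(inserter, userToNode, pair[1]);
                inserter.createRelationship(a, b, KNOW, null);  // duplicates are allowed for now
            }
        } finally {
            inserter.shutdown();                                // flushes and closes the store
        }
    }

    // Returns the node id for this user, creating the node on first sight.
    private static long getOrCreate(BatchInserter inserter, Map<Long, Long> cache, long userId) {
        Long nodeId = cache.get(userId);
        if (nodeId == null) {
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("userId", userId);
            nodeId = inserter.createNode(props);
            cache.put(userId, nodeId);
        }
        return nodeId;
    }

    private static Iterable<long[]> readPairs() {
        // Placeholder: read the (userA, userB) pairs from the source table or a CSV export.
        return java.util.Collections.<long[]>emptyList();
    }
}

Note that duplicates are still written in this sketch; any deduplication would have to happen afterwards, which is what the answer below does with Cypher.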

I have about 100M relationships in Neo4j, and I have to check all of them so that there are no duplicate relationships.

How can I speed things up in Neo4j?

My current code looks like this:

begin transaction
  for 50K relationships:
    create or get the user node for user A
    create or get the user node for user B
    check whether a KNOW relationship exists between A and B; if not, create the relationship
end transaction
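Spelled out against the embedded Java API of that era, the loop above is roughly the following. This is only a simplified sketch: the "users" legacy index, the "id" property key and the source of the (userA, userB) pairs are placeholder assumptions, not the actual import code.

import org.neo4j.graphdb.*;
import org.neo4j.graphdb.index.Index;

public class TransactionalImport {
    // KNOW is the relationship type from the question.
    private static final RelationshipType KNOW = DynamicRelationshipType.withName("KNOW");

    // Imports one batch of roughly 50K (userA, userB) pairs in a single transaction.
    public static void importBatch(GraphDatabaseService db, Iterable<long[]> pairs) {
        Transaction tx = db.beginTx();
        try {
            Index<Node> users = db.index().forNodes("users");  // legacy index, placeholder name
            for (long[] pair : pairs) {
                Node a = getOrCreate(db, users, pair[0]);
                Node b = getOrCreate(db, users, pair[1]);
                if (!alreadyKnows(a, b)) {                     // per-relationship duplicate check
                    a.createRelationshipTo(b, KNOW);
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }

    // Looks the user up in the index, creating and indexing the node if it is missing.
    private static Node getOrCreate(GraphDatabaseService db, Index<Node> users, long userId) {
        Node node = users.get("id", userId).getSingle();
        if (node == null) {
            node = db.createNode();
            node.setProperty("id", userId);
            users.add(node, "id", userId);
        }
        return node;
    }

    // Scans A's KNOW relationships for one that already points at B.
    private static boolean alreadyKnows(Node a, Node b) {
        for (Relationship r : a.getRelationships(KNOW)) {
            if (r.getOtherNode(a).equals(b)) {
                return true;
            }
        }
        return false;
    }
}

The lookup-and-check in alreadyKnows() is exactly the per-relationship uniqueness work that the answer below suggests deferring until after the import.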

I have also read the following:

Solution

In the case of relationships, and supposing you have enough storage, I would try not to make unique relationships during the import phase. Right now I'm actually also importing an SQL table with ~3 million records, but I always create the relationship and don't mind whether it is a duplicate or not.

Later, after the import, you can simply run a Cypher query which will create unique relationships, like this:

START n=node(*) MATCH n-[:KNOW]-m
CREATE UNIQUE n-[:KNOW2]-m;

and

START r=rel(*) WHERE type(r)='KNOW' DELETE r;

At least this is my approach for now, and running the later Cypher queries takes just a few minutes. The problem could come when you really have billions of nodes: then the Cypher query might run into a memory error (depending on how much cache you have set up for the Neo4j engine).
