neo4j import slowing down


Problem description

I'm trying to import a medium-sized data set of about 500,000 nodes into neo4j using Cypher. I am running neo4j-community-2.0.0-M05 locally on my 3.4 GHz i7 iMac with an SSD.

I am piping the Cypher statements to the neo4j shell, wrapping every 40k lines in a transaction.
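For anyone who wants to reproduce the batching, here is a minimal sketch of how the statement stream could be wrapped. It assumes one Cypher statement per line and that the shell accepts `begin` and `commit` as transaction commands; the function name `wrap_in_transactions` is made up for illustration and is not part of neo4j.

```python
from typing import Iterable, Iterator


def wrap_in_transactions(statements: Iterable[str],
                         batch_size: int = 40_000) -> Iterator[str]:
    """Yield the input statements with begin/commit around every batch."""
    count = 0
    for statement in statements:
        statement = statement.strip()
        if not statement:
            continue  # blank lines are skipped and do not count toward a batch
        if count % batch_size == 0:
            yield "begin"  # assumed shell command that opens a transaction
        yield statement
        count += 1
        if count % batch_size == 0:
            yield "commit"  # assumed shell command that closes the transaction
    if count % batch_size != 0:
        yield "commit"  # close the final, partially filled batch
```

Running `sys.stdin` through this function and printing each yielded line gives a stream that could then be piped to the shell. Note that batching only limits transaction size; it does not address the per-statement slowdown described below.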

I am using labels, and before I started I created an index on one property per label.

When I left last night, the MATCH ... CREATE UNIQUE statements were taking about 15 ms each. This morning they are taking about 6000 ms each.

The slow queries look like this:

MATCH n:Artifact WHERE n.pathId = 'ZZZ' CREATE UNIQUE n-[r:DEPENDS_ON]->(a:Artifact {pathId: 'YYY'}) RETURN a
1 row
5719 ms

pathId is indexed.

I understand this is a milestone build and probably not optimized for performance. But I'm less than a third of the way through my import, and it keeps getting slower.

Should I look at methods other than Cypher to import this data?

Recommended answer

I just want to answer my own question in case someone else finds this. Thanks to Peter for suggesting the batch import project. I used the 2.0 tree.

My workflow ended up being to (1) load all the data into a relational database, (2) clean up duplicates, and then (3) write a script to export the data into CSV files.
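Steps (2) and (3) could look roughly like the sketch below. It is not the script actually used: it assumes the data sits in SQLite, and the table names (`artifact`, `dependency`), column names, function name, and output layout are all hypothetical. In particular, the tab delimiter and the `start`/`end`/`type` header are placeholders; the batch import project's README defines the header format it really expects.

```python
import csv
import sqlite3


def export_for_batch_import(conn: sqlite3.Connection,
                            nodes_path: str,
                            rels_path: str) -> tuple:
    """Write de-duplicated nodes and relationships to two tab-separated files."""
    # Step 2: SELECT DISTINCT drops duplicate rows before export.
    nodes = conn.execute(
        "SELECT DISTINCT path_id FROM artifact ORDER BY path_id"
    ).fetchall()
    rels = conn.execute(
        "SELECT DISTINCT from_path_id, to_path_id FROM dependency "
        "ORDER BY from_path_id, to_path_id"
    ).fetchall()

    # Step 3: one file for nodes, one for relationships.
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["pathId"])  # hypothetical header
        writer.writerows(nodes)

    with open(rels_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["start", "end", "type"])  # hypothetical header
        writer.writerows((src, dst, "DEPENDS_ON") for src, dst in rels)

    # Return the row counts so the caller can sanity-check the export.
    return len(nodes), len(rels)
```

Doing the de-duplication in SQL means the import files contain each node and relationship exactly once, so the importer never needs the lookup that CREATE UNIQUE performs for every statement.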

Using Cypher, I had the import running for 24 hours before I killed it. Using the Java import tool, the entire import took 11 seconds with neo4j-community-2.0.0-M06.

Bottom line: don't bother writing out Cypher statements to import large chunks of data. Spend an hour cleaning up your data if necessary, then export to CSV and use the Java batch import tool.
