Neo4j Data Import Slowness


Problem Description

I have to load around 5M records into the Neo4j DB, so I broke the Excel file into chunks of 100K rows. The data is in tabular format, and I am using Cypher Shell for the import, but it has been more than 8 hours and it's still stuck on the first chunk.

I am using:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///aa.csv' AS row
MERGE (p1:L1 {Name: row.sl1})
MERGE (p2:L2 {Name: row.sl2})
MERGE (p3:L3 {Name: row.sl3, Path:row.sl3a})
MERGE (p4:L4 {Name: row.sl4})
MERGE (p5:L4 {Name: row.tl1})
MERGE (p6:L3 {Name: row.tl2})
MERGE (p7:L2 {Name: row.tl3, Path:row.tl3a})
MERGE (p8:L1 {Name: row.tl4})
MERGE (p1)-[:s]->(p2)-[:s]->(p3)-[:s]->(p4)-[:it]->(p5)-[:t]->(p6)-[:t]->(p7)-[:t]->(p8)

Can anyone suggest changes, or an alternate method, to load the data in a faster way?

Data in Excel Format

Recommended Answer

  1. For importing a large amount of data, you should consider using the import tool instead of Cypher's LOAD CSV clause. Note that this tool can only import into a previously unused database.
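As a rough sketch of what an import-tool invocation looks like (the filenames here are hypothetical, and the exact flags vary by Neo4j version — this matches the 4.x `neo4j-admin import` syntax; you would first export the node and relationship data to separate CSV files with ID/header rows as described in the import-tool documentation):

```shell
neo4j-admin import \
    --database=neo4j \
    --nodes=L1=import/nodes_l1.csv \
    --nodes=L2=import/nodes_l2.csv \
    --nodes=L3=import/nodes_l3.csv \
    --nodes=L4=import/nodes_l4.csv \
    --relationships=s=import/rels_s.csv \
    --relationships=it=import/rels_it.csv \
    --relationships=t=import/rels_t.csv
```

This performs a bulk offline write and is typically orders of magnitude faster than transactional LOAD CSV for millions of records.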

If you still want to use LOAD CSV, you need to make some changes.

  • You are using MERGE improperly, and are probably generating many duplicate nodes and relationships as a result. You may find this answer instructive.

A MERGE clause's entire pattern will be created if anything in the pattern does not already exist.

So your last MERGE pattern, with its seven relationships, is especially dangerous. It should be split into seven MERGE clauses, each merging a single relationship.
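Split out, the final line of the query would look something like this (a sketch reusing the node variables already bound by the preceding MERGE clauses):

```cypher
MERGE (p1)-[:s]->(p2)
MERGE (p2)-[:s]->(p3)
MERGE (p3)-[:s]->(p4)
MERGE (p4)-[:it]->(p5)
MERGE (p5)-[:t]->(p6)
MERGE (p6)-[:t]->(p7)
MERGE (p7)-[:t]->(p8)
```

Because each MERGE now matches or creates only one relationship, an existing partial path no longer causes the whole chain to be re-created.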

Also, a MERGE pattern that specifies multiple properties is likely bad as well. For example, if all L3 nodes have a unique Name value, then it would be safer to replace this:

MERGE (p3:L3 {Name: row.sl3, Path:row.sl3a})

with something like this:

MERGE (p3:L3 {Name: row.sl3})
ON CREATE SET p3.Path = row.sl3a

In the above snippet, if the node already exists but row.sl3a is different from the existing Path value, no additional node is created. In addition, since the node already existed, the ON CREATE option does not execute its SET clause, leaving the original Path value unchanged. You could also choose to use ON MATCH instead, or even just call SET directly if you want to set the value no matter what.
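For instance, if you always want Path to reflect the current row's value whether the node was created or matched, an unconditional SET would look like this (a sketch):

```cypher
MERGE (p3:L3 {Name: row.sl3})
SET p3.Path = row.sl3a
```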

To avoid having to scan through all the nodes with a given label every time MERGE needs to find an existing node, you should create an index or uniqueness constraint for every label/property pair that you MERGE on:

  • :L1(Name)
  • :L2(Name)
  • :L3(Name)
  • :L4(Name)
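Assuming Name should be unique per label, the corresponding uniqueness constraints (which also create backing indexes) could be created like this — Neo4j 4.4+/5.x syntax; older versions use the `CREATE CONSTRAINT ON (n:L1) ASSERT n.Name IS UNIQUE` form instead:

```cypher
CREATE CONSTRAINT l1_name IF NOT EXISTS FOR (n:L1) REQUIRE n.Name IS UNIQUE;
CREATE CONSTRAINT l2_name IF NOT EXISTS FOR (n:L2) REQUIRE n.Name IS UNIQUE;
CREATE CONSTRAINT l3_name IF NOT EXISTS FOR (n:L3) REQUIRE n.Name IS UNIQUE;
CREATE CONSTRAINT l4_name IF NOT EXISTS FOR (n:L4) REQUIRE n.Name IS UNIQUE;
```

Run these once before the import; without them, each MERGE is a full label scan and the load time grows roughly quadratically with the number of nodes.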

