What is the most efficient way to insert nodes into a neo4j database using cypher


Question

I'm trying to insert a large number of nodes (~500,000) into a (non-embedded) neo4j database by executing Cypher commands using the py2neo Python module (py2neo.cypher.execute). Eventually I need to remove the dependence on py2neo, but I'm using it at the moment until I learn more about Cypher and neo4j.
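For reference, the calls currently look roughly like this (a minimal sketch assuming the legacy py2neo 1.x cypher module mentioned above; the server URL and node values are just placeholders):

```python
from py2neo import neo4j, cypher

# Connect to the non-embedded server's REST interface (default local URL)
graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Execute a Cypher statement; py2neo 1.x returns a (rows, metadata) tuple
statement = "CREATE (a:A {uid: 1, attr: 5}) RETURN a"
rows, metadata = cypher.execute(graph_db, statement)
```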

I have two node types, A and B, and the vast majority of nodes are of type A. There are two possible relationships, r1 and r2, such that A-[r1]-A and A-[r2]-B. Each node of type A will have 0-100 r1 relationships, and each node of type B will have 1-5000 r2 relationships.

At the moment I am inserting nodes by building up large CREATE statements. For example, I might have a statement like

CREATE (:A {uid:1, attr:5})-[:r1]->(:A {uid:2, attr:5})-[:r1]->...

where ... might be another 5,000 or so nodes and relationships forming a linear chain in the graph. This works, but it's pretty slow. I'm also indexing these nodes using

CREATE INDEX ON :A(uid)
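For completeness, the chained CREATE strings are assembled in Python roughly as follows before being executed (a sketch; the node data is made up for illustration):

```python
from py2neo import neo4j, cypher

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Build one long CREATE statement chaining ~5000 type-A nodes with :r1 relationships
nodes = [{"uid": i, "attr": 5} for i in range(1, 5001)]  # placeholder data
parts = ["(:A {uid: %d, attr: %d})" % (n["uid"], n["attr"]) for n in nodes]
statement = "CREATE " + "-[:r1]->".join(parts)

rows, metadata = cypher.execute(graph_db, statement)
```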

After I've added all the type A nodes, I add the type B nodes using CREATE statements again. Finally, I try to add the r2 relationships using a statement like

MATCH (c:B), (m:A) WHERE c.uid=1 AND (m.uid=2 OR m.uid=5 OR ...)
CREATE (m)-[:r2]->(c)

where ... could represent a few thousand OR clauses. This seems really slow, adding only a few relationships per second.

So, is there a better way to do this? Am I completely off track here? I looked at this question, but it doesn't explain how to use Cypher to efficiently load the nodes. Everything else I've looked at seems to use Java, without showing the actual Cypher queries that could be used.

Answer

Don't create the index until the end (in 2.0). It will slow down node creation.

Are you using parameters in your Cypher?

I imagine you're losing a lot of Cypher parsing time unless your Cypher is exactly the same each time, with parameters supplying the values. If you can model it that way, you'll see a marked performance increase.
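For instance, something shaped like this reuses one fixed statement for every node, so the Cypher string only has to be parsed and planned once (a sketch built on the py2neo call from the question; the values are placeholders):

```python
from py2neo import neo4j, cypher

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# One fixed, parameterized statement; only the parameter values change per call
create_a = "CREATE (a:A {uid: {uid}, attr: {attr}})"

for node in [{"uid": 1, "attr": 5}, {"uid": 2, "attr": 5}]:  # placeholder data
    cypher.execute(graph_db, create_a, node)
```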

You're already sending fairly hefty chunks in your Cypher requests, but the batch request API will let you send more than one in a single REST request, which might be faster (try it!).
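For example, a batch POST could carry one parameterized Cypher job per node (a rough sketch assuming the legacy /db/data/batch endpoint, shown here with plain requests rather than py2neo):

```python
import json
import requests

# One REST round trip carrying many Cypher jobs (legacy /db/data/batch API)
jobs = [
    {
        "method": "POST",
        "to": "/cypher",
        "body": {
            "query": "CREATE (a:A {uid: {uid}, attr: {attr}})",
            "params": {"uid": uid, "attr": 5},
        },
        "id": uid,
    }
    for uid in range(1, 201)  # placeholder data
]

resp = requests.post(
    "http://localhost:7474/db/data/batch",
    data=json.dumps(jobs),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```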

Finally, if this is a one-time import, you might consider using the batch-import tool: it can burn through 500K nodes in a few minutes even on bad hardware. You can then upgrade the database files (I don't think it can create 2.0 files yet, but that may be coming shortly if not) and create your labels/indexes via Cypher.

Update: I just noticed your MATCH statement at the end. You shouldn't do it this way: create one relationship at a time instead of using OR over the ids. This will probably help a lot, and make sure you use parameters for the uids. Cypher 2.0 doesn't seem to be able to do index lookups with OR, even when you use an index hint; maybe that will come later.
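In other words, something shaped like the following, executed once per relationship with only the parameter values changing (a sketch; the parameter names are just illustrative):

```python
from py2neo import neo4j, cypher

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# One relationship per execution; the uids come in as parameters, so the same
# statement (and plan) is reused and the :A(uid) index can serve the lookup.
create_r2 = (
    "MATCH (m:A), (c:B) "
    "WHERE m.uid = {m_uid} AND c.uid = {c_uid} "
    "CREATE (m)-[:r2]->(c)"
)

for m_uid, c_uid in [(2, 1), (5, 1)]:  # placeholder uid pairs
    cypher.execute(graph_db, create_r2, {"m_uid": m_uid, "c_uid": c_uid})
```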

Update, Dec 2013: 2.0 has the Cypher transactional endpoint, and I've seen great throughput improvements with it. I've been able to send 20-30k Cypher statements per second, using "exec" sizes of 100-200 statements and transaction sizes of 1,000-10,000 statements total. It's very effective for speeding up loading over Cypher.
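As a rough illustration of that pattern (assuming the 2.0 /db/data/transaction endpoint; the group and transaction sizes below are just the ones mentioned above):

```python
import json
import requests

TX_BASE = "http://localhost:7474/db/data/transaction"
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}

def post_group(url, uids):
    """POST one group of 100-200 parameterized statements to the endpoint."""
    payload = {"statements": [
        {"statement": "CREATE (a:A {uid: {uid}, attr: {attr}})",
         "parameters": {"uid": uid, "attr": 5}}
        for uid in uids
    ]}
    return requests.post(url, data=json.dumps(payload), headers=HEADERS)

# Open a transaction with the first group; the response's commit URL gives us
# the URL of the still-open transaction.
first = post_group(TX_BASE, range(1, 201)).json()
tx_url = first["commit"].rsplit("/commit", 1)[0]

# Keep appending groups until the transaction holds a few thousand statements.
for start in range(201, 2001, 200):
    post_group(tx_url, range(start, start + 200))

# Commit everything in one go.
requests.post(tx_url + "/commit", data=json.dumps({"statements": []}), headers=HEADERS)
```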

