Import large data in Neo4j without knowing the node ID


Problem description

I have a question about the best way to insert some data into a Neo4j database. I have a file with a lot of movie information; each movie has a distinct ID such as "tt0202025". I also have a file with actor information, where each actor has an ID such as "mm2183122". A third file describes which movies each actor appears in, for example: mm2183122|tt0202025,tt0204548

I have found some approaches for CSV-style insertion, but because of the data size I am not able to build the relationships file between the nodes beforehand.

Do I have to add the movie nodes and the actor nodes first, and then the relationships between them? And how can I know which node ID was created for each node, so that I can create the relationships? The data set is big.

I also read about batch insertion, but I was not able to understand exactly how it works, so I could not write my code in Java.

I hope someone can guide me!

Thanks in advance!

Solution

The simplest solution, as you pointed out, is to insert the movie nodes and the actor nodes first, then create the relationships.

You can track the nodes by storing IDs such as "mm2183122" and "tt0202025" as properties on the nodes. For instance, give each node a "file_id" property, index it when you create the node, and query that index when you want to create a relationship. I'd use unique indexes to make sure you don't duplicate nodes if you insert them several times. For the REST API, the documentation is here: http://docs.neo4j.org/chunked/milestone/rest-api-unique-indexes.html
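To make the two-pass idea concrete, here is a minimal in-memory sketch in Java. It is not Neo4j code: the class name, the method names and the counter that hands out node IDs are all hypothetical stand-ins. In a real import the internal ID would come from the database when the node is created, and for very large files the map would likely be replaced by the "file_id" index lookup described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of a two-pass import keyed by the IDs found in the source files. */
class TwoPassImportSketch {
    // Maps an external ID such as "tt0202025" to the internal node ID.
    private final Map<String, Long> nodeIdByFileId = new HashMap<>();
    // Resolved relationships as {actorNodeId, movieNodeId} pairs.
    private final List<long[]> relationships = new ArrayList<>();
    // Stand-in for the ID the database would assign (assumption for this sketch).
    private long nextNodeId = 0;

    /** Pass 1: create a node once per external ID and remember its internal ID. */
    long getOrCreateNode(String fileId) {
        Long existing = nodeIdByFileId.get(fileId);
        if (existing != null) {
            return existing; // already inserted, so no duplicate node
        }
        long id = nextNodeId++;
        nodeIdByFileId.put(fileId, id);
        return id;
    }

    /** Pass 2: resolve one line such as "mm2183122|tt0202025,tt0204548". */
    void addRelationships(String line) {
        String[] parts = line.trim().split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        Long actorNode = nodeIdByFileId.get(parts[0].trim());
        if (actorNode == null) {
            throw new IllegalStateException("Unknown actor: " + parts[0]);
        }
        for (String movieFileId : parts[1].split(",")) {
            Long movieNode = nodeIdByFileId.get(movieFileId.trim());
            if (movieNode == null) {
                throw new IllegalStateException("Unknown movie: " + movieFileId);
            }
            relationships.add(new long[] {actorNode, movieNode});
        }
    }

    List<long[]> relationships() {
        return relationships;
    }
}
```

The point of the sketch is that the relationships file never needs internal node IDs in advance: each line is resolved through the external IDs at the moment the relationship is created.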

Cypher also allows you to create unique nodes.
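As a rough illustration of the Cypher route, the Java sketch below only builds a statement and its parameters; it does not talk to a database. It assumes a Neo4j version whose Cypher supports the MERGE clause and the $name parameter syntax (older releases used CREATE UNIQUE and {name} instead). The labels Actor and Movie, the relationship type ACTED_IN, and the class and method names are hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Builds parameter maps for a MERGE-based statement; labels and names are assumptions. */
class MergeStatementSketch {
    // MERGE matches an existing node or relationship, or creates it if it is missing,
    // so running the same line twice should not produce duplicates.
    static final String LINK_ACTOR_TO_MOVIE =
            "MERGE (a:Actor {file_id: $actorId}) "
          + "MERGE (m:Movie {file_id: $movieId}) "
          + "MERGE (a)-[:ACTED_IN]->(m)";

    /** Turns "mm2183122|tt0202025,tt0204548" into one parameter map per movie. */
    static List<Map<String, Object>> parametersFor(String line) {
        String[] parts = line.trim().split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        List<Map<String, Object>> result = new ArrayList<>();
        for (String movieId : parts[1].split(",")) {
            Map<String, Object> params = new HashMap<>();
            params.put("actorId", parts[0].trim());
            params.put("movieId", movieId.trim());
            result.add(params);
        }
        return result;
    }
}
```

Each map would be passed together with LINK_ACTOR_TO_MOVIE to whichever client executes the query. Sending many maps per transaction rather than one at a time is usually advisable for large files.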
