How to do an initial batch import of CSV / MySQL data into neo4j database
Question
I am considering replacing a MySQL database with a neo4j database. I am a complete beginner with neo4j and would like to know how to do a batch insert of my current MySQL data into the neo4j database, so I can experiment and begin to learn about neo4j.
The relational database consists of 4 tables: Person, Organism, Story, Links.
Links describes relationships between rows in the other 3 tables.
Links: ID, FromTable, FromID, ToTable, ToID, LinkType
Person: ID, property_2, property_1, etc ...
Organism: ID, property_A, property_B, etc ...
Story: ID, property_x, property_y
Each ID field is an auto-incrementing integer starting from 1 for each table.
In case it is not obvious, a link between, say, the person with ID 3 and the story with ID 42 would have a row in the Links table with ID=autoincrement, FromTable=Person, FromID=3, ToTable=Story, ToID=42.
Even though I am using the terms 'from' and 'to', the actual links are not really 'directed' in practice.
I have looked at Michael Hunger's batch-import but that seems to only work with a single table of nodes and one table of relationships, whereas I am looking to import three different types of nodes and one list of relationships between them.
I have got neo4j up and running. Any advice to get me started would be greatly appreciated.
I am not familiar with Java, though I do use Python and bash shell scripts.
After the initial import, I will be using the RESTful interface with JavaScript.
Solution
Based on advice in the git repo (https://github.com/jexp/batch-import/issues/4): using Michael Hunger's batch-import, it is possible to import multiple node types from a single .csv file.
To quote Michael:
"Just put them all into one nodes file, you can have any attribute not having a value in a certain row, it will then just be skipped."
So the general approach I used was:
Combine all the node tables into a new table called nodes:
- Create a new table nodes with an auto-incrementing newID field and a type field. The type field records which table the node data came from.
- Add all the possible column names from the 3 node tables, allowing nulls.
- INSERT INTO nodes the values from Person, then Organism, then Story, setting the type field to person, organism, or story respectively. Leave any unrelated fields blank.
In another new table, rels, map each row of the Links table onto the newly created newID values with a SQL JOIN:
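INSERT INTO rels
SELECT
    n1.newID AS fromNodeID,
    n2.newID AS toNodeID,
    L.LinkType,
    L.ID
FROM Links L
-- Match each end of a link on both its original ID and its source table
-- (FromTable / ToTable, as the Links columns are named in the question).
LEFT JOIN nodes n1
    ON L.FromID = n1.ID AND L.FromTable = n1.type
LEFT JOIN nodes n2
    ON L.ToID = n2.ID AND L.ToTable = n2.type;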
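To try the merge-and-join idea without touching the real database, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The sample rows and the property columns (name, species, title) are made up for illustration, and the type values are stored with the same capitalisation as FromTable / ToTable because SQLite compares strings case-sensitively by default.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Person   (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Organism (ID INTEGER PRIMARY KEY, species TEXT);
CREATE TABLE Story    (ID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Links (ID INTEGER PRIMARY KEY, FromTable TEXT, FromID INTEGER,
                    ToTable TEXT, ToID INTEGER, LinkType TEXT);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Organism VALUES (1, 'E. coli');
INSERT INTO Story VALUES (1, 'First story'), (2, 'Second story');
INSERT INTO Links VALUES (1, 'Person', 2, 'Story', 1, 'WROTE'),
                         (2, 'Organism', 1, 'Story', 2, 'APPEARS_IN');

-- One combined node table: newID is the new unique key,
-- type records which source table the row came from.
CREATE TABLE nodes (newID INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT,
                    ID INTEGER, name TEXT, species TEXT, title TEXT);
INSERT INTO nodes (type, ID, name)
    SELECT 'Person', ID, name FROM Person ORDER BY ID;
INSERT INTO nodes (type, ID, species)
    SELECT 'Organism', ID, species FROM Organism ORDER BY ID;
INSERT INTO nodes (type, ID, title)
    SELECT 'Story', ID, title FROM Story ORDER BY ID;

-- Translate each link's (table, ID) pairs into the new node IDs.
CREATE TABLE rels (fromNodeID INTEGER, toNodeID INTEGER,
                   LinkType TEXT, linkID INTEGER);
INSERT INTO rels
SELECT n1.newID, n2.newID, L.LinkType, L.ID
FROM Links L
LEFT JOIN nodes n1 ON L.FromID = n1.ID AND L.FromTable = n1.type
LEFT JOIN nodes n2 ON L.ToID = n2.ID AND L.ToTable = n2.type;
""")

node_count = cur.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
rels = cur.execute(
    "SELECT fromNodeID, toNodeID, LinkType, linkID FROM rels ORDER BY linkID"
).fetchall()
print(node_count, rels)
```

Because the join is a LEFT JOIN, a link that points at a missing row would show up in rels with a NULL node ID, which makes such rows easy to find before exporting.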
Then export these two new tables, nodes and rels, as tab-separated .csv files, and use them with batch-import:
$ java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/graph.db nodes.csv rels.csv
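For the export step, here is a minimal sketch in Python, assuming the rows have already been fetched from the nodes and rels tables. The helper write_tsv, the sample rows, and the header names are all hypothetical; the exact header names batch-import expects are not confirmed here, so check its README before running the import.

```python
import csv
import io

def write_tsv(out, header, rows):
    """Write a header line and data rows as tab-separated text.

    csv.writer emits None as an empty field, which matches the
    "leave unrelated fields blank" step above.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

# Hypothetical rows, shaped like SELECT results from nodes and rels.
node_rows = [(1, "Person", 3, "Ann", None), (2, "Story", 42, None, "A title")]
rel_rows = [(1, 2, "WROTE", 1)]

# StringIO keeps the sketch self-contained; for real files use
# open("nodes.csv", "w", newline="") and open("rels.csv", "w", newline="").
nodes_out = io.StringIO()
write_tsv(nodes_out, ["newID", "type", "ID", "name", "title"], node_rows)
rels_out = io.StringIO()
write_tsv(rels_out, ["fromNodeID", "toNodeID", "LinkType", "linkID"], rel_rows)

print(nodes_out.getvalue())
print(rels_out.getvalue())
```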