How to do an initial batch import of CSV / MySQL data into a neo4j database


Problem description

I am considering replacing a MySQL database with a neo4j database. I am a complete beginner with neo4j and would like to know how to do a batch insert of my current MySQL data into the neo4j database, so I can experiment and begin to learn about neo4j.

The relational database consists of 4 tables: Person, Organism, Story, and Links. Links describes relationships between rows in the other 3 tables.

Links: ID, FromTable, FromID, ToTable, ToID, LinkType

Person: ID, property_2, property_1, etc ...

Organism: ID, property_A, property_B, etc ...

Story: ID, property_x, property_y

Each ID field is an auto-incrementing integer starting from 1 for each table.

In case it is not obvious, a link between, say, the person with ID 3 and the story with ID 42 would have a row in the Links table with ID=autoincrement, FromTable=Person, FromID=3, ToTable=Story, ToID=42. Even though I am using the terms 'from' and 'to', the actual links are not really 'directed' in practice.

I have looked at Michael Hunger's batch-import, but that seems to work only with a single table of nodes and one table of relationships, whereas I am looking to import three different types of nodes and one list of relationships between them.

I have got neo4j up and running. Any advice to get me started would be greatly appreciated.

I am not familiar with Java, though I do use Python and bash shell scripts. After the initial import, I will be using the RESTful interface with JavaScript.

Solution

Based on advice in the git repo (https://github.com/jexp/batch-import/issues/4): using Michael Hunger's batch-import, it is possible to import multiple node types from a single .csv file. To quote Michael:

"Just put them all into one nodes file, you can have any attribute not having a value in a certain row, it will then just be skipped."

So the general approach I used was:

Combine all the node tables into a new table called nodes:

1. Create a new table nodes with an auto-incrementing newID field and a type field. The type field records which table the node data came from.
2. Add all the possible column names from the 3 node tables, allowing nulls.
3. INSERT INTO nodes the values from Person, then Organism, then Story, setting the type field to person, organism, or story accordingly. Leave any unrelated fields blank.
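
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for MySQL so the example runs anywhere, each source table is given a single made-up property column, and the helper names build_nodes_table and demo_nodes are hypothetical. The type field stores the table name exactly as it would appear in Links.FromTable, so the later join can match on it.

```python
import sqlite3

# Assumption: one illustrative property per table; real tables have more.
SOURCE_TABLES = {
    "Person": ["property_1"],
    "Organism": ["property_A"],
    "Story": ["property_x"],
}


def build_nodes_table(conn):
    """Create `nodes` and copy every row of the three source tables into it."""
    all_props = [col for cols in SOURCE_TABLES.values() for col in cols]
    prop_defs = ", ".join(f"{col} TEXT" for col in all_props)  # nullable by default
    conn.execute(
        "CREATE TABLE nodes ("
        "newID INTEGER PRIMARY KEY AUTOINCREMENT, "  # step 1: new global id
        "type TEXT NOT NULL, "                        # step 1: source table name
        "ID INTEGER NOT NULL, "                       # original per-table id
        f"{prop_defs})"                               # step 2: union of all columns
    )
    # Step 3: one INSERT ... SELECT per source table. Columns that do not
    # belong to that table are not listed, so they stay NULL.
    for table, cols in SOURCE_TABLES.items():
        col_list = ", ".join(["ID"] + cols)
        conn.execute(
            f"INSERT INTO nodes (type, {col_list}) "
            f"SELECT ?, {col_list} FROM {table} ORDER BY ID",
            (table,),
        )


def demo_nodes():
    """Build toy source tables, merge them, and return the merged rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Person   (ID INTEGER PRIMARY KEY, property_1 TEXT);
        CREATE TABLE Organism (ID INTEGER PRIMARY KEY, property_A TEXT);
        CREATE TABLE Story    (ID INTEGER PRIMARY KEY, property_x TEXT);
        INSERT INTO Person   VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO Organism VALUES (1, 'yeast');
        INSERT INTO Story    VALUES (1, 'origin');
        """
    )
    build_nodes_table(conn)
    return conn.execute(
        "SELECT newID, type, ID, property_1, property_A, property_x "
        "FROM nodes ORDER BY newID"
    ).fetchall()


if __name__ == "__main__":
    for row in demo_nodes():
        print(row)
```

Note that the original ID column is kept next to newID: the per-table IDs all start from 1 and therefore collide, so only the pair (type, ID) identifies a source row.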

In another new table, rels, map the rows of the Links table onto the newly created newID values with a SQL JOIN:

    INSERT INTO rels
    SELECT
        n1.newID AS fromNodeID,
        n2.newID AS toNodeID,
        L.LinkType,
        L.ID
    FROM
        Links L
    LEFT JOIN
        nodes n1
        ON
        L.FromID = n1.ID
        AND
        L.FromTable = n1.type  -- Links stores the source table name in FromTable
    LEFT JOIN
        nodes n2
        ON
        L.ToID = n2.ID
        AND
        L.ToTable = n2.type;   -- likewise ToTable for the target node
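
To see what this join does on concrete rows, here is a small sketch. It assumes the Links columns are named as in the question (FromTable, FromID, ToTable, ToID), uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and invents the LinkType value WROTE and the function name demo_rels purely for illustration. Because the join is a LEFT JOIN, a link that points at a row missing from nodes still produces a rels row, but with a NULL node id, so it is worth checking for such rows before exporting.

```python
import sqlite3

# Translate each (table name, per-table ID) pair in Links to a nodes.newID.
REMAP_SQL = """
INSERT INTO rels (fromNodeID, toNodeID, LinkType, ID)
SELECT n1.newID, n2.newID, L.LinkType, L.ID
FROM Links L
LEFT JOIN nodes n1 ON L.FromID = n1.ID AND L.FromTable = n1.type
LEFT JOIN nodes n2 ON L.ToID = n2.ID AND L.ToTable = n2.type
"""


def demo_rels():
    """Return (all rels rows, ids of links with an unresolved endpoint)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (newID INTEGER PRIMARY KEY, type TEXT, ID INTEGER);
        CREATE TABLE Links (ID INTEGER PRIMARY KEY, FromTable TEXT, FromID INTEGER,
                            ToTable TEXT, ToID INTEGER, LinkType TEXT);
        CREATE TABLE rels  (fromNodeID INTEGER, toNodeID INTEGER,
                            LinkType TEXT, ID INTEGER);
        -- Person 3 and Story 3 share the same per-table ID but get distinct newIDs.
        INSERT INTO nodes VALUES (1, 'Person', 3), (2, 'Story', 3), (3, 'Story', 42);
        -- Link 2 points at Story 99, which does not exist in nodes.
        INSERT INTO Links VALUES (1, 'Person', 3, 'Story', 42, 'WROTE'),
                                 (2, 'Person', 3, 'Story', 99, 'WROTE');
        """
    )
    conn.execute(REMAP_SQL)
    rels = conn.execute(
        "SELECT fromNodeID, toNodeID, LinkType, ID FROM rels ORDER BY ID"
    ).fetchall()
    dangling = conn.execute(
        "SELECT ID FROM rels WHERE fromNodeID IS NULL OR toNodeID IS NULL"
    ).fetchall()
    return rels, dangling


if __name__ == "__main__":
    rels, dangling = demo_rels()
    print(rels)
    print(dangling)
```

Matching on both the ID and the table name is what keeps Person 3 and Story 3 apart; joining on the ID alone would link each relationship to every node that happens to share that number.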
    

Then export these two new tables, nodes and rels, as tab-separated .csv files, and use them with batch-import:

    $ java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/graph.db nodes.csv rels.csv
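
Since the export step is only described in words, here is one possible way to write such tab-separated files from Python with the standard csv module. It is a sketch, not the method used in the answer: write_tsv and demo_export are hypothetical names, the rows are toy data, and the header names that batch-import actually expects should be checked against its README rather than taken from this example. NULL values are written as empty fields, in line with Michael's remark that attributes without a value in a row are skipped.

```python
import csv
import io


def write_tsv(out, header, rows):
    """Write a header line and data rows to `out` as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Represent SQL NULL (None in Python) as an empty field.
        writer.writerow(["" if value is None else value for value in row])


def demo_export():
    """Export two toy node rows to an in-memory buffer and return the text."""
    buf = io.StringIO()
    write_tsv(
        buf,
        ["newID", "type", "property_1"],
        [(1, "Person", "alice"), (2, "Story", None)],
    )
    return buf.getvalue()


if __name__ == "__main__":
    print(demo_export(), end="")
```

To write real files, the same function can be given a file opened with open("nodes.csv", "w", newline="") and the rows fetched from a database cursor.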
    
