使用py2neo将数据从XML加载到neo4j [英] Loading data to neo4j from XML using py2neo

查看:334
本文介绍了使用py2neo将数据从XML加载到neo4j的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用py2neo将数据从xml文件加载到neo4j db

Im trying to load data to neo4j db from xml file using py2neo

此python脚本运行正常,但速度太慢,因为我先添加了节点,然后再添加了带有两个异常处理程序的关系.此外,XML文件大小约为200MB.

this python script works fine but its too slow since Im adding the nodes first then the relationships with two exceptions handlers. besides that the XML file size is around 200MB.

我想知道是否有更快的方法来执行此任务?

Im wondering if there is faster way to perform this task?

XML文件:

<Persons>
    <person>
        <id>XA123</id>
        <first_name>Adam</first_name>
        <last_name>John</last_name>
        <phone>01-12322222</phone>
    </person>
    <person>
        <id>XA7777</id>
        <first_name>Anna</first_name>
        <last_name>Watson</last_name>
        <relationship>
            <type>Friends</type>
            <to>XA123</to>
        </relationship>
    </person>
</Persons>

python脚本:

#!/usr/bin/python3

from xml.dom import minidom
from py2neo import Graph, Node, Relationship, authenticate


graph = Graph("http://localhost:7474/db/data/")
authenticate("localhost:7474", "neo4j", "admin")

xml_file = open("data.xml")
xml_doc = minidom.parse(xml_file)
persons = xml_doc.getElementsByTagName('person')

# Adding Nodes
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data
    fName = person.getElementsByTagName('first_name')[0].firstChild.data
    lName = person.getElementsByTagName('last_name')[0].firstChild.data

    # not every person has phone number
    try:
        phone = person.getElementsByTagName('phone')[0].firstChild.data
    except IndexError:
        phone = "None"

    label = "Person"
    node = Node(label, ID=ID_, LastName=fName, FirstName=lName, Phone=phone)
    graph.create(node)


# Adding Relationships
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data

    label = "Person"
    node1 = graph.find_one(label, property_key="ID", property_value=ID_)

    # relationships
    try:
        has_relations = person.getElementsByTagName('relationship')
        for relation in has_relations:
            node2 = graph.find_one(label,
                                   property_key="ID",
                                   property_value=relation.getElementsByTagName('to')[0].firstChild.data)

            relationship = Relationship(node1,
                                        relation.getElementsByTagName('type')[0].firstChild.data, node2)
            graph.create(relationship)
    except IndexError:
        continue

推荐答案

通过为特定标签使用唯一的属性约束,可以大大减少将数据加载到neo4j中所需的时间.

the time needed to load the data into neo4j has significantly reduced by using unique property constraints for a specific label.

graph.cypher.execute("CREATE CONSTRAINT ON (n:Person) ASSERT n.ID IS UNIQUE")

这篇关于使用py2neo将数据从XML加载到neo4j的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆