Inserting large graph data into Neo4j using py2neo WriteBatch


Question

I have a graph represented by the following files:

  • VertexLabel.txt -> each line contains properties for each vertex.
  • EdgeLabel.txt -> each line contains properties for each edge.
  • EdgeID.txt -> each line contains 3 separated integers which correspond to indexes in the label files: source_index target_index edge_index.

There are roughly 44K vertices and 240K edges. I'm trying to use neo4j.WriteBatch to batch insert the graph data.

from py2neo import Graph, neo4j, node, rel

graph_db = Graph()
nodes = {}
batchNodes = {}
edges = {}
edgeList = []

# Read vertex label file into nodes, where node[i] is indexed according to the order the nodes appear in the file.
# Each entry is of type node, e.g. node("FILM", title = "Star Trek"), node("CAST", name = "William Shatner")
...  

# Read edge label file into edges, where edges[i] is indexed according to the order the edges appear in the file.
# Each entry is a tuple (edge_type, edge_task), e.g. ("STAFF", "Director")
...  

# Read edge id file into edgeList
# Each entry is the tuple (source_index, target_index, edge_index), e.g. (1, 4, 8)
...  

# Iterate nodes, store in graph
# Note, store result of batch.create into batchNodes
batch = neo4j.WriteBatch(graph_db)
count = 0
for n in nodes:
    batchNodes[n] = batch.create(nodes[n])
    count += 1

    # Submit every 500 steps
    if count % 500 == 0:
        count = 0
        batch.submit()
        batch = neo4j.WriteBatch(graph_db)

# Submit remaining batch
batch.submit()

# Iterate edgeList, store in graph
batch = neo4j.WriteBatch(graph_db)
count = 0
for i, j, k in edgeList:
    # Lookup reference in batchNodes
    source = batchNodes[i]
    target = batchNodes[j]
    edge = edges[k]
    batch.create(rel(source, edge[0], target, {"task": edge[1]}))
    count += 1

    # Submit every 500 steps
    if count % 500 == 0:
        count = 0
        batch.submit()
        batch = neo4j.WriteBatch(graph_db)

# Submit remaining batch
batch.submit()

I get the following error:

Traceback (most recent call last):
  File "test4.py", line 87, in <module>
    batch.create(rel(source, edge[0], target, {"task": edge[1]}))
  File "C:\Python34\lib\site-packages\py2neo\batch\write.py", line 181, in create
    start_node = self.resolve(entity.start_node)
  File "C:\Python34\lib\site-packages\py2neo\batch\core.py", line 374, in resolve
    return NodePointer(self.find(node))
  File "C:\Python34\lib\site-packages\py2neo\batch\core.py", line 394, in find
    raise ValueError("Job not found in batch")
ValueError: Job not found in batch

I presume that batchNodes does not actually contain proper references to the nodes I want to look up when adding relationships (perhaps reinitializing the batch object invalidates the references). If so, how should I perform this task?

I am using Neo4j 2.1.7 (Community Edition) and py2neo 2.0.4.

Recommended answer

For importing CSV-like data such as yours, I'd recommend LOAD CSV, which has been available since Neo4j 2.1:

load csv with headers from "file://...VertexLabel.txt" as row
with row where has(row.name)
create (:Actor {name: row.name})

Similarly, you can load your relationships:

create index on :Actor(name); create index on :Movie(title);

load csv with headers from "file://...EdgeID.txt" as row
match (a:Actor {name: row.name})
match (m:Movie {title: row.title})
create (a)-[:ACTED_IN]->(m)

Since Neo4j 2.2 you can also use neo4j-import, a very fast tool for importing CSV data. It also supports id-groups and lets you provide labels and relationship types in the CSV itself.
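As a rough sketch of how the question's data might be prepared for neo4j-import, the Python below turns in-memory lists into two CSV texts. The header fields `:ID`, `:LABEL`, `:START_ID`, `:END_ID` and `:TYPE` follow the neo4j-import header convention. Everything else is an assumption made for illustration: the function names, the single `name` property column, 0-based indexes, and the `(label, name)` and `(type, task)` tuple shapes, since the actual layout of VertexLabel.txt and EdgeLabel.txt is not shown in the question.

```python
import csv
import io


def nodes_csv(vertices):
    """Build node CSV text from (label, name) tuples in file order.

    The tuple shape and the `name` column are assumptions; the row position
    is used as the node id so that EdgeID.txt indexes can refer to it.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["index:ID", "name", ":LABEL"])
    for index, (label, name) in enumerate(vertices):
        writer.writerow([index, name, label])
    return out.getvalue()


def rels_csv(edge_ids, edge_labels):
    """Build relationship CSV text.

    edge_ids: (source_index, target_index, edge_index) tuples.
    edge_labels: (edge_type, edge_task) tuples, indexed by edge_index.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE", "task"])
    for source, target, edge in edge_ids:
        rel_type, task = edge_labels[edge]
        writer.writerow([source, target, rel_type, task])
    return out.getvalue()
```

The two strings would then be written to files and passed to neo4j-import as the node and relationship inputs; check the tool's documentation for the exact command-line options in your Neo4j version.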

See: http://neo4j.com/developer/guide-importing-data-and-etl/ and http://neo4j.com/developer/guide-import-csv/
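On the original WriteBatch approach: the traceback is consistent with the question's own guess, namely that a value returned by batch.create can only be resolved by the batch that produced it, so it cannot be reused after a new batch is started. One possible workaround is to keep whatever each submit returns and build the relationships from those results instead. The sketch below shows only that bookkeeping in plain Python and does not call py2neo. `submit_chunk` is a hypothetical callback, and the idea that a py2neo 2.0 batch submit returns one result per job, in order, is my understanding rather than something confirmed in this thread.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def submit_in_chunks(keyed_items, submit_chunk, size=500):
    """Submit (key, item) pairs in chunks and map each key to its result.

    submit_chunk is a hypothetical callback: it receives a list of items and
    must return one result per item, in the same order. With py2neo it would
    wrap "create a batch, add the items, submit, return the results".
    """
    results = {}
    for chunk in chunked(list(keyed_items), size):
        keys = [key for key, _ in chunk]
        returned = submit_chunk([item for _, item in chunk])
        if len(returned) != len(keys):
            raise ValueError("expected one result per submitted item")
        results.update(zip(keys, returned))
    return results
```

With this in place, `batchNodes` would hold the stored results for each vertex index rather than references tied to a batch that has already been submitted, and the edge loop could look up `source` and `target` from it as before.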
