Slow performance bulk updating relationship properties in Neo4j
Question
I'm struggling to efficiently bulk update relationship properties in Neo4j. The objective is to update ~500,000 relationships (each with roughly 3 properties), which I chunk into batches of 1,000 and process in a single Cypher statement:
UNWIND {rows} AS row
MATCH (s:Entity) WHERE s.uuid = row.source
MATCH (t:Entity) WHERE t.uuid = row.target
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties
However, each batch of 1,000 relationships takes around 60 seconds. There is an index on the uuid property for the :Entity label, i.e. I've previously run:
CREATE INDEX ON :Entity(uuid)
which means that matching the relationship is very efficient according to the query plan: there are 6 total db hits and the query executes in ~150 ms. I've also added a uniqueness constraint on the uuid property, which ensures that each match returns only one element:
CREATE CONSTRAINT ON (n:Entity) ASSERT n.uuid IS UNIQUE
Does anyone know how I can further debug this to understand why it's taking Neo4j so long to process the relationships?
Note that I'm using similar logic for updating nodes, which is orders of magnitude faster even though the nodes have significantly more metadata associated with them.
For reference, I'm using Neo4j 3.0.3, py2neo, and Bolt. The Python code block is of the form:
for chunk in chunker(relationships):  # 1,000 relationships per chunk
    with graph.begin() as tx:
        statement = """
            UNWIND {rows} AS row
            MATCH (s:Entity) WHERE s.uuid = row.source
            MATCH (t:Entity) WHERE t.uuid = row.target
            MATCH (s)-[r:CONSUMED]->(t)
            SET r += row.properties
            """
        rows = []
        for rel in chunk:
            rows.append({
                'properties': dict(rel),
                'source': rel.start_node()['uuid'],
                'target': rel.end_node()['uuid'],
            })
        tx.run(statement, rows=rows)
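One way to narrow down where the 60 seconds go is to time each batch separately and to profile the full batched statement rather than a single-row match. The following is only a minimal sketch: `profiled` and `time_batches` are hypothetical helper names, and `run` is assumed to be any callable that executes a statement with a `rows` parameter and returns only once the work is done (for example, a small wrapper that opens a transaction, calls `tx.run`, and commits).

```python
import time


def profiled(statement):
    """Prefix a Cypher statement with PROFILE so the executed plan is reported."""
    return "PROFILE " + statement.strip()


def time_batches(run, statement, batches):
    """Execute `statement` once per batch and return elapsed seconds per batch.

    `run` is an assumed callable taking (statement, rows=...); it is not part
    of any library API and must block until the batch has been processed.
    """
    timings = []
    for rows in batches:
        start = time.perf_counter()
        run(statement, rows=rows)
        timings.append(time.perf_counter() - start)
    return timings
```

If the per-batch timings are uniformly high, the plan reported for the profiled batch statement is the next thing to compare against the single-row plan.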
Recommended answer
Try the following query:
UNWIND {rows} AS row
WITH row.source as source, row.target as target, row
MATCH (s:Entity {uuid:source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid:target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;
It uses index hints to force an index lookup for both Entity nodes and then an Expand(Into) operator, which should be more performant than the Expand(All) and Filter operators shown by your query plan.
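For completeness, here is a rough sketch of how the hinted query could be driven in batches from Python. The names `HINTED_STATEMENT`, `chunker`, and `update_in_batches` are illustrative assumptions (the question's own `chunker` is not shown), and `run` stands for any callable that executes a statement with a `rows` parameter, such as a wrapper around `tx.run` from the question.

```python
from itertools import islice

# The hinted query from above, using the {rows} parameter syntax from the question.
HINTED_STATEMENT = """
UNWIND {rows} AS row
WITH row.source AS source, row.target AS target, row
MATCH (s:Entity {uuid: source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid: target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties
"""


def chunker(items, size=1000):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def update_in_batches(run, rows, size=1000):
    """Send `rows` to `run` in chunks and return the number of batches sent.

    Each row is expected to be a dict with 'source', 'target' and
    'properties' keys, matching what the statement reads.
    """
    batches = 0
    for chunk in chunker(rows, size):
        run(HINTED_STATEMENT, rows=chunk)
        batches += 1
    return batches
```

Passing `run` in as a parameter keeps the batching logic independent of the driver, so it can be exercised with a stub before pointing it at a real database.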