Slow performance bulk updating relationship properties in Neo4j


Problem Description

I'm struggling to efficiently bulk update relationship properties in Neo4j. The objective is to update ~500,000 relationships (each with roughly 3 properties), which I chunk into batches of 1,000 and process in a single Cypher statement,

UNWIND {rows} AS row
MATCH (s:Entity) WHERE s.uuid = row.source
MATCH (t:Entity) WHERE t.uuid = row.target
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties

however, each batch of 1,000 relationships takes around 60 seconds. There is an index on the uuid property for the :Entity label, i.e. I've previously run,

CREATE INDEX ON :Entity(uuid)

which means that matching the relationship is super efficient per the query plan,

There are 6 total db hits and the query executes in ~150 ms. I've also added a uniqueness constraint on the uuid property, which ensures that each match returns only one element,

CREATE CONSTRAINT ON (n:Entity) ASSERT n.uuid IS UNIQUE

Does anyone know how I can further debug this to understand why it's taking Neo4j so long to process the relationships?

Note that I'm using similar logic to update nodes, which is orders of magnitude faster even though the nodes have significantly more metadata associated with them.

For reference I'm using Neo4j 3.0.3, py2neo, and Bolt. The Python code block is of the form,

for chunk in chunker(relationships):  # 1,000 relationships per chunk
    with graph.begin() as tx:
        statement = """
            UNWIND {rows} AS row
            MATCH (s:Entity) WHERE s.uuid = row.source
            MATCH (t:Entity) WHERE t.uuid = row.target
            MATCH (s)-[r:CONSUMED]->(t)
            SET r += row.properties
            """

        rows = []

        for rel in chunk:
            rows.append({
                'properties': dict(rel),
                'source': rel.start_node()['uuid'],
                'target': rel.end_node()['uuid'],
            })

        tx.run(statement, rows=rows)
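For completeness, the `chunker` helper isn't shown in the question; its name and signature here are assumptions, but a minimal sketch for splitting a list into fixed-size batches could be:

```python
def chunker(seq, size=1000):
    """Yield successive chunks of at most `size` items from a sequence."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]
```

With ~500,000 relationships and a batch size of 1,000, this yields roughly 500 batches, each sent as a single parameterized Cypher statement.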

Answer

Try the following query:

UNWIND {rows} AS row
WITH row.source as source, row.target as target, row
MATCH (s:Entity {uuid:source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid:target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;

It uses index hints to force an index lookup for both Entity nodes and then an Expand(Into) operator which should be more performant than the Expand(All) and Filter operators shown by your query plan.
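To check whether the hints are taking effect, you can prefix the statement with PROFILE and run it against a small sample of rows, then confirm that the plan shows Expand(Into) rather than Expand(All) followed by Filter. A sketch (the literal row values here are placeholders, not from the original data):

```
PROFILE
UNWIND [{source: 'uuid-1', target: 'uuid-2', properties: {weight: 1}}] AS row
WITH row.source AS source, row.target AS target, row
MATCH (s:Entity {uuid: source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid: target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;
```

PROFILE actually executes the statement and annotates each operator with its db hits, so run it on throwaway or sample data; use EXPLAIN instead to see the plan without executing.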
