Neo4j: Best way to batch relate nodes using Cypher?

Question

When I run a script that tries to batch merge all nodes of a certain type, I am getting some weird performance results.

When merging 2 collections of nodes (~42k) and (~26k), the performance is nice and fast. But when I merge (~42k) and (5), performance degrades DRAMATICALLY. I'm batching the ParentNodes (so (~42k) is split up in batches of 500). Why does performance drop when I'm, essentially, merging fewer nodes (when the batch size is the same, but the source set is large and the target set is small)?

MATCH (s:ContactPlayer)   
WHERE  has(s.ContactPrefixTypeId)    
WITH  collect(s) AS allP   
WITH  allP[7000..7500] as rangedP   
FOREACH  (parent in rangedP  |  
    MERGE (child:ContactPrefixType 
            {ContactPrefixTypeId:parent.ContactPrefixTypeId}
          )  
    MERGE (child)-[r:CONTACTPLAYER]->(parent)  
    SET r.ContactPlayerId = parent.ContactPlayerId ,      
        r.ContactPrefixTypeId = child.ContactPrefixTypeId  )

Performance results:

Process started

Starting to insert Contact items [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++]

  • Total time for 42149 Contact items: 19176.87 ms
  • Average time per batch (500): 213.4 ms
  • Longest batch time: 663 ms

Starting to insert ContactPlayer items [++++++++++++++++++++++++++++++++++++++++++++++++++++++++]

  • Total time for 27970 ContactPlayer items: 9419.2106 ms
  • Average time per batch (500): 167.75 ms
  • Longest batch time: 689 ms

Starting to relate Contact to ContactPlayer [++++++++++++++++++++++++++++++++++++++++++++++++++++++++]

  • Total time to relate Contact to ContactPlayer: 7907.4877 ms
  • Average time per batch (500): 141.151517857143 ms
  • Longest batch time: 883.0918 ms, batch number: 0

Starting to insert ContactPrefixType items
[+]

  • Total time for 5 ContactPrefixType items: 22.0737 ms
  • Average time per batch (500): 22 ms
  • Longest batch time: 22 ms

Contact data inserted.

Starting to relate ContactPrefixType to Contact [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++]

  • Total time to relate ContactPrefixType to Contact: 376540.8309 ms
  • Average time per batch (500): 4429.78643647059 ms
  • Longest batch time: 14263.1843 ms, batch number: 63

Recommended answer

Can you pass the ids in as parameters rather than fetch them from the graph? The query could look like

MATCH (s:ContactPlayer {ContactPrefixTypeId:{cptid}})
MERGE (c:ContactPrefixType {ContactPrefixTypeId:{cptid}})
MERGE (c)-[:CONTACT_PLAYER]->(s)

If you use the REST API Cypher resource, I think the entity should look something like

{
    "query":...,
    "params": {
        "cptid":id1
    }
}

If you use the transactional endpoint, it should look something like this. You control transaction size by the number of statements in each call, and also by the number of calls before you commit.

{
    "statements":[
        {
            "statement":...,
            "parameters": {
                "cptid":id1
            }
        },
        {
            "statement":...,
            "parameters": {
                "cptid":id2
            }
        }
    ]
}
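
To make the "number of statements in each call" idea concrete, here is a rough Python sketch that only builds the request bodies; it does not send anything. The function name build_payloads, the constant RELATE_QUERY and the statements_per_call argument are made up for illustration and are not part of any Neo4j API. The query keeps the {cptid} parameter style used above; newer Neo4j versions write parameters as $cptid instead.

```python
import json

# Cypher from the answer above; `cptid` is the parameter name used there.
# Assumption: legacy {param} syntax, as in the original post.
RELATE_QUERY = (
    "MATCH (s:ContactPlayer {ContactPrefixTypeId: {cptid}}) "
    "MERGE (c:ContactPrefixType {ContactPrefixTypeId: {cptid}}) "
    "MERGE (c)-[:CONTACT_PLAYER]->(s)"
)


def build_payloads(prefix_type_ids, statements_per_call=500):
    """Yield one transactional-endpoint request body per chunk of ids.

    Hypothetical helper: each id becomes one statement with its own
    parameters, and at most `statements_per_call` statements go in a body.
    """
    ids = list(prefix_type_ids)
    for start in range(0, len(ids), statements_per_call):
        chunk = ids[start:start + statements_per_call]
        yield {
            "statements": [
                {"statement": RELATE_QUERY, "parameters": {"cptid": i}}
                for i in chunk
            ]
        }


if __name__ == "__main__":
    # Five prefix type ids, two statements per call -> three request bodies.
    for body in build_payloads([1, 2, 3, 4, 5], statements_per_call=2):
        print(json.dumps(body))
```

Each yielded dictionary would be serialized as the JSON body of one call; how many calls you make before committing is a separate choice, as noted above.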
