通过许多可选关系优化Cypher查询 [英] Optimizing Cypher queries with lots of optional relationships

查看:76
本文介绍了通过许多可选关系优化Cypher查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在通过Neo4j 2.0.1在批处理REST API上使用Cypher.

I'm using Cypher over the batch REST API with Neo4j 2.0.1.

我正在尝试优化具有很多可选关系的查询.我想一次取回所有数据,以限制我必须进行的往返数据库的次数.即使我的数据库中只有大约12000个节点,查询已经开始爬网(有些查询花费了1.5秒以上的时间才能返回1000个节点).

I'm trying to optimize my queries which have lots of optional relationships. I'd like to retrieve all of the data in one shot to limit the number of round trips I have to make to the database. Even though I only have about 12000 nodes in my database, the queries are already starting to crawl (some are taking over 1.5 seconds to return 1000 nodes).

我已经设置了一个图表要点,其详细信息位于 http://gist.neo4j.org /?9494429e3cbbbeda2b11 .

I've set up a graph gist that goes into more detail at http://gist.neo4j.org/?9494429e3cbbbeda2b11.

我的查询通常都采用以下形式:

My queries all generally take the following form:

MATCH (u:user { id: "u1" })
WITH u

MATCH u-[:CONTACT]->(c:contact)
WITH u, c

OPTIONAL MATCH  (c)-[:CREATED]->(xca:activity)<-[:USERACTIVITY]-(xcc:contact)
OPTIONAL MATCH  (c)-[:HISTORY]->(xcu:activity)<-[:USERACTIVITY]-(xuc:contact)
OPTIONAL MATCH (c)-[:PHONE]->(xp:phone)
OPTIONAL MATCH (c)-[:ADDRESS]->(xa:address)
OPTIONAL MATCH (u)-[:PHONE]->(xup:phone)
OPTIONAL MATCH (u)-[:ADDRESS]->(xua:address)
WITH DISTINCT c AS x, u,
    COLLECT(DISTINCT xp) AS xps,
    COLLECT(DISTINCT xa) AS xas,
    COLLECT(DISTINCT xup) AS xups,
    COLLECT(DISTINCT xua) AS xuas,
    xca.createdat AS createdat,
    xcu.createdat AS updatedat,
    {id: xcc.id} AS createdby,
    {id: xuc.id} AS updatedby
RETURN COLLECT({
    id: x.id,
    name:  COALESCE(u.name, x.name), 
    createdat: createdat,
    createdby: createdby,
    updatedat: updatedat,
    updatedby: updatedby,
        phones: (CASE WHEN size(xps)= 0
          THEN NULL
          ELSE [xp IN xps | { id: xp.id, number: xp.number}]
          END),
        userphones: (CASE WHEN size(xups)= 0
          THEN NULL
          ELSE [xup IN xups | { id: xup.id, number: xup.number }]
          END),
        addresses: (CASE WHEN size(xas)= 0
          THEN NULL
          ELSE [xa IN xas | { id: xa.id, city: xa.city}]
          END),
        useraddresses: (CASE WHEN size(xuas)= 0
          THEN NULL
          ELSE [xua IN xuas | { id: xua.id, city: xua.city}]
          END)
}) AS r

是否有更好的方法来查询具有很多可选关系的节点?在这种情况下,我应该期待什么样的表现?谢谢!

Is there a better way to query for nodes that have lots of optional relationships? What type of performance should I be expecting in cases like these? Thanks!

我已经根据Michael的建议重写了查询,并确认每一步的基数均为1(也就是说,查询的每一步仅返回一行).当我只返回一个联系人时,查询大约需要400毫秒(比原始查询大约差5倍).

I've rewritten the query based on Michael's suggestion and verified that each step has a cardinality of 1 (that is, only one row is returned for each step of the query). When I return just a single contact the query takes about 400ms (about 5x worse than the original query).

但是,当我尝试对返回1000个联系人的较大数据集运行查询时,该查询现在只是挂起,从未完成,因此我必须重新启动Neo4j服务器.我没有尝试将结果合并到新的地图中,但是我认为这不会解决问题.我现在是否正在创建更差的交叉产品,但在逐步测试查询时却没有显示该产品?

However, when I try and run the query against my larger dataset that returns 1000 contacts it now just hangs, never completes, and I have to restart the Neo4j server. I didn't try and combine the results into a new map, but I don't think that is going to solve the problem. Am I creating a worse cross product now that doesn't show up when I test the query step by step?

  MATCH (u:user { id: "123" })
  WITH u

  MATCH (u)-[:CONTACT]->(c:contact)
  WITH c

  OPTIONAL MATCH
    (c)-[:CREATED]->(xca:activity)-[:USERACTIVITY*1..4]<-(xcc:contact),
    (c)-[:HISTORY]->(xcu:activity)-[:USERACTIVITY*1..4]<-(xuc:contact)
  WITH c AS x,
    xca.createdat AS createdat, xcu.createdat AS updatedat,
    {id: xcc.id, object: xcc.object} AS createdby,
    {id: xuc.id, object: xuc.object} AS updatedby

  OPTIONAL MATCH
    (x)-[:PHONE]->(xp:phone)
  WITH x, createdat, updatedat, createdby, updatedby,
    COLLECT(xp) as xps

  OPTIONAL MATCH  
    (x)-[:ADDRESS]->(xa:address)
  WITH x, createdat, updatedat, createdby, updatedby, xps,
    COLLECT(xa) as xas

  OPTIONAL MATCH (xu:user)-[:CONTACT]->(x)
  OPTIONAL MATCH (xu)-[:PHONE]->(xup:phone)
  WITH x, createdat, updatedat, createdby, updatedby, xps, xas,
    xu, COLLECT(xup) as xups

  OPTIONAL MATCH (xu)-[:ADDRESS]->(xua:address)
  WITH x, createdat, updatedat, createdby, updatedby, xps, xas,
    xu, xups, COLLECT(xua) as xuas

  RETURN COLLECT({
    id: x.id,
    object: x.object,
    status: x.status,
    teamid: x.teamid,
    name:  COALESCE(xu.name, x.name),
    displayname: COALESCE(xu.displayname, x.displayname),
    email: COALESCE(xu.email, x.email),
    imageurl: COALESCE(xu.imageurl, x.imageurl),
    workhours: x.workhours,
    notes: x.notes,
    company: x.company,
    createdat: createdat,
    createdby: createdby,
    updatedat: updatedat,
    updatedby: updatedby,
    isuser: (NOT xu IS NULL),
    phones: (CASE WHEN size(xps)= 0
      THEN NULL
      ELSE [xp IN xps | { id: xp.id, object: xp.object,
            number: xp.number, description: xp.description }]
      END),
    userphones: (CASE WHEN size(xups)= 0
      THEN NULL
      ELSE [xup IN xups | { id: xup.id, object: xup.object,
            number: xup.number, description: xup.description }]
      END),
    addresses: (CASE WHEN size(xas)= 0
      THEN NULL
      ELSE [xa IN xas | { id: xa.id, object: xa.object,
            street: xa.street, locality: xa.locality, region: xa.region,
            postcode: xa.postcode, country: xa.country, description: xa.description, neighborhood: xa.neighborhood }]
      END),
    useraddresses: (CASE WHEN size(xuas)= 0
      THEN NULL
      ELSE [xua IN xuas | { id: xua.id, object: xua.object,
            street: xua.street, locality: xua.locality, region: xua.region,
            postcode: xua.postcode, country: xua.country, description: xua.description, neighborhood: xua.neighborhood }]
      END)
  }) AS r

更新2

我尝试移动电话和地址,但是没有任何效果,我什至把它们拿出来,仍然看到类似的结果(1000个联系人超过2秒).我已经消除了查询中的所有复杂性,只是为了了解基线是什么.以下查询平均需要385ms来执行:

UPDATE 2

I tried moving the phones and addresses around but it had no effect, I even took them out and still saw similar results (over 2s for 1000 contacts). I've eliminated all of the complexity in the query just to see what a baseline would be. The following query takes 385ms on average to execute:

MATCH (t:team {id:"123"})
WITH t

MATCH (c:contact)-[:CONTACT]->(t)
WITH c AS x

RETURN COLLECT({
    id: x.id,
    object: x.object,
    status: x.status,
    teamid: x.teamid,
    name:  x.name,
    displayname: x.displayname,
    email: x.email,
    imageurl: x.imageurl,
    workhours: x.workhours,
    notes: x.notes,
    company: x.company
}) AS r

我的数据库有6000个节点和12000个关系,该查询返回1000个联系人(整个数据库的大小为7 MB).这种类型的查询大约需要400毫秒吗?

My database has 6000 nodes and 12000 relationships and this query returns 1000 contacts (the entire db is 7 MB in size). Is almost 400ms for this type of query expected?

我非常感谢您提供查看我的数据库的信息,但是我想我真的很想知道如何自己诊断这些问题.使用Web UI时,看不到爆炸(每个结果仅返回1行).当我使用PROFILE命令时,没有看到像您期望的那样有数百万的数字.

I really appreciate the offer to look at my DB, but I guess I really want to know how to diagnose these problems myself. When I use the web UI, I see no explosion (only 1 row per result being returned). When I use the PROFILE command, I see no numbers in the millions like you expect.

是否有其他工具可用于诊断性能问题?是否有某种调试器可以找出问题所在?

Are there other tools available to diagnose performance issues? Is there a debugger of some kind to track down the problem?

推荐答案

问题是您在所有匹配项之间创建叉积.

The issue is that you create cross products between all of your matches.

如果您可以确定连接最多为一个的匹配项,则可以提前将其拉出.否则,您可以收集匹配的信息以返回您的基数1(或联系人数量ftm).

If you can identify the matches that have at most one connection you can pull them upfront. Otherwise you can collect the matched information to get back to your cardinality of 1 (or # of contacts ftm).

例如

MATCH (u:user { id: "u1" })
OPTIONAL MATCH (u)-[:PHONE]->(xup:phone)
OPTIONAL MATCH (u)-[:ADDRESS]->(xua:address)
// cardinality 1
WITH u, collect(distinct xup) as phones, collect(distinct xua) as addresses
MATCH (u)-[:CONTACT]->(c:contact)
WITH u, c, phones, addresses
OPTIONAL MATCH (c)-[:CREATED]->(xca:activity)<-[:USERACTIVITY]-(xcc:contact)
WITH u,c, phones,addresses, collect(distinct xcc) as contact_activities
...

您已经使用过地图文字,因此您也可以将它们与我建议的内容结合起来,方法是将钥匙逐步添加到地图(或集合)中

You already used map literals so you can also combine them with what I suggested by incrementally adding keys to the map (or a collection)

例如

MATCH (u:user { id: "u1" })
OPTIONAL MATCH (u)-[:PHONE]->(xup:phone)
OPTIONAL MATCH (u)-[:ADDRESS]->(xua:address)
// cardinality 1
WITH u, {user:u, phones:collect(distinct xup), addresses: collect(distinct xua)} as user_info
MATCH (u)-[:CONTACT]->(c:contact)
WITH c, user_info
OPTIONAL MATCH (c)-[:CREATED]->(xca:activity)<-[:USERACTIVITY]-(xcc:contact)
WITH c, user_info, {activities: collect(distinct xcc)} as contact_info
...

DISTINCT和聚合也很可能对您没有帮助.聚合已经为分组密钥创建了不同的条目.

Also DISTINCT and aggregation will very probably not help you. Aggregation is already creating distinct entries for the grouping key.

我试图改编您的图形学家(感谢提供btw)以显示其外观(但我没有完全了解):

I tried to adapt your graphgist (thanks for providing that btw) to show how it could look like (but I didn't go fully through): http://gist.neo4j.org/?bba019835045ed352925

您可能对此图形学家也有兴趣:使用密码的复杂查询结果投影

You might be also interested in this graphgist: A complex query result projection in cypher

这篇关于通过许多可选关系优化Cypher查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆