Cypher MATCH query speed

Question

I have Neo4j installed on a Windows machine with 12 processors and 64 GB of RAM. I did not change any of Neo4j's configurable memory settings.

My database has 3.8 million nodes, 210,000 of which are labeled Geotagged, and a total of 650,000 relationships. I am trying to run the following query, and I am wondering whether it is simply a very intensive query that will take a long time.

Messages.csv is my relationship file. The relationships have already been created, but because I could not figure out how to combine relationship creation with the Distance calculation below, I load and run through the relationship file twice.

USING PERIODIC COMMIT 15000
LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
MATCH (a:Geotagged { username: line.sender }) - [r:MSGED] -> (b:Geotagged { username: line.recipient })
SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));

The initial relationship generation takes about 3-5 minutes. I let the query above run for over an hour and it still had not completed. I also ran a similar algorithm (with a few more trig calls) on the same initial database for over 18 hours, and it still had not completed.

My questions: Is this a very intensive query? Am I not giving it enough time? And, more importantly, is there a way I can optimize it?

I tried adding "WHERE NOT HAS(r.Distance)" to exclude node pairs whose Distance has already been set, though I am unsure whether the MATCH is a one-time match or whether it runs once for each line in the CSV file.

Any thoughts on this would be really appreciated.

Recommended answer

This is in addition to Brian's reply:

Your statement's query plan shows an Eager operator. To verify, run:

EXPLAIN LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
WITH line LIMIT 100
MATCH (a:Geotagged { username: line.sender }) - [r:MSGED] -> (b:Geotagged { username: line.recipient })
SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));

Eagerness in LOAD CSV is pretty bad; see these blog posts for why:

  • http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
  • http://jexp.de/blog/2014/10/load-cvs-with-success/

Following Mark's suggestion and replacing the MATCH/SET with a MERGE ... ON MATCH SET, we can refactor that into:

EXPLAIN LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
WITH line LIMIT 100
MATCH (a:Geotagged { username: line.sender }), (b:Geotagged { username: line.recipient })
MERGE (a)-[r:MSGED]->(b)
ON MATCH SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));

The Eager operator is gone from the plan.
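
The plan check above only explains the statement and limits it to 100 rows. As a rough sketch, the actual run would drop EXPLAIN and the LIMIT 100 and restore the USING PERIODIC COMMIT from the question. The index statement is an assumption: the source does not say whether :Geotagged(username) is already indexed, and the syntax shown is the older form matching the Cypher dialect used in this thread.

```cypher
// Assumption (not stated in the source): index the property used for the
// per-row node lookups, unless such an index already exists.
CREATE INDEX ON :Geotagged(username);

// The refactored statement without EXPLAIN and LIMIT 100,
// committing in batches of 15000 rows as in the original question.
USING PERIODIC COMMIT 15000
LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
// Look up both endpoint nodes first, then MERGE only the relationship.
MATCH (a:Geotagged { username: line.sender }), (b:Geotagged { username: line.recipient })
MERGE (a)-[r:MSGED]->(b)
// Haversine great-circle distance in km (6371 = mean Earth radius in km).
ON MATCH SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));
```

One behavioral difference to keep in mind: MERGE creates the MSGED relationship when no match exists, and ON MATCH SET does not fire for a newly created relationship, so a CSV row without an existing relationship would yield a new relationship with no Distance. Since the relationships here were created from the same file, that case should not arise.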