Cypher MATCH query speed
Question
I have Neo4j installed on a Windows machine with 12 processors and 64 GB of RAM. I did not change any of Neo4j's memory settings.
My database has 3.8 million nodes, 210,000 of which are labeled Geotagged, and a total of 650,000 relationships. I am trying to run the following query, and I am wondering whether it is a really intensive query that will likely take quite a while.
Messages.csv is my relationship file. The relationships have already been created, but since I could not figure out how to combine the relationship creation with the Distance calculation below, I am loading and running through the relationship file twice.
USING PERIODIC COMMIT 15000
LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
MATCH (a:Geotagged { username: line.sender }) - [r:MSGED] -> (b:Geotagged { username: line.recipient })
SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));
The initial relationship generation takes about 3-5 minutes. I let the query above run for over an hour and it still had not completed. I ran a similar algorithm (though it had a few more trig calls in it) on the same initial database and let it run for over 18 hours, and it still had not completed.
My question: Is this a very intensive query? Am I not giving it enough time? And more importantly, is there a way I can optimize this?
I tried adding "WHERE NOT HAS(r.Distance)" to exclude node pairs whose Distance the algorithm has already set, though I am unsure whether the MATCH is a one-time match or whether it runs for each line in the CSV file.
Any thoughts on this would really be appreciated.
Recommended answer
This is in addition to Brian's reply:
Your statement's query plan shows Eager. To verify, run:
EXPLAIN LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
WITH line LIMIT 100
MATCH (a:Geotagged { username: line.sender }) - [r:MSGED] -> (b:Geotagged { username: line.recipient })
SET r.Distance = (2 * 6371 *asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));
Eagerness in LOAD CSV is pretty bad; see these blog posts for why:
- http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
- http://jexp.de/blog/2014/10/load-cvs-with-success/
Following Mark's suggestion and replacing the MATCH/SET with a MERGE ... ON MATCH SET, we can refactor that into:
explain LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
WITH line LIMIT 100
MATCH (a:Geotagged { username: line.sender }), (b:Geotagged { username: line.recipient })
MERGE (a)-[r:MSGED]->(b)
ON MATCH SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));
With this form, Eager is gone from the query plan.
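Once the plan no longer shows Eager, the EXPLAIN prefix and the LIMIT 100 test clause can be dropped and the periodic commit from the question restored. The following is only a sketch of what the full run might look like, assuming the same file path, labels, and property names as in the question. The index statement is an assumption that is not stated in this thread (it uses Neo4j 2.x/3.x syntax); it is included because each CSV row triggers two lookups by username.

```cypher
// Assumption: an index on the lookup property, so each per-row match on
// username does not have to scan every :Geotagged node.
// Neo4j 2.x/3.x syntax; run once, as a separate statement.
CREATE INDEX ON :Geotagged(username);

// Full run: the refactored statement without EXPLAIN and LIMIT,
// with the periodic commit from the question restored.
USING PERIODIC COMMIT 15000
LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
MATCH (a:Geotagged { username: line.sender }), (b:Geotagged { username: line.recipient })
MERGE (a)-[r:MSGED]->(b)
ON MATCH SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));
```

Because the relationships already exist, MERGE should match them rather than create new ones. Note that ON MATCH SET only applies to matched relationships, so any relationship that MERGE has to create would be left without a Distance.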