Cypher MATCH query speed

Question

I have Neo4j installed on a Windows machine with 12 processors and 64 GB of RAM. I did not change any of Neo4j's memory settings.

My database has 3.8 million nodes, 210,000 of which are labeled Geotagged, and a total of 650,000 relationships. I am trying to run the following query, and I am wondering whether it is a really intensive query that will likely take quite a while.

Messages.csv is my relationship file. The relationships have already been created, but because I could not figure out how to combine the relationship creation with the Distance generation below, I am loading and running through the relationship file twice.

USING PERIODIC COMMIT 15000
LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
MATCH (a:Geotagged { username: line.sender }) - [r:MSGED] -> (b:Geotagged { username: line.recipient })
SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));

The initial relationship generation takes about 3-5 minutes. I let the query above run for over an hour and it still had not completed. I ran a similar algorithm (though it had a few more trig calls in it) on the same initial database, let it run for over 18 hours, and it still had not completed.

My question: Is this a very intensive query? Am I not giving it enough time? And more importantly, is there a way I can optimize it?

I tried adding "WHERE NOT HAS(r.Distance)" to exclude node pairs whose Distance the algorithm has already set, though I am unsure whether the MATCH is a one-time match or whether it runs for each line in the CSV file.

Any thoughts on this would be much appreciated.

Recommended answer

This is in addition to Brian's reply:

Your statement's query plan shows an Eager operator. To verify, run:

EXPLAIN LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
WITH line LIMIT 100
MATCH (a:Geotagged { username: line.sender }) - [r:MSGED] -> (b:Geotagged { username: line.recipient })
SET r.Distance = (2 * 6371 *asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));

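Eagerness in LOAD CSV is pretty bad: an Eager operator pulls in every row before any write happens, which effectively defeats PERIODIC COMMIT. The blog posts linked from the original answer explain why in more detail.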

Following Mark's suggestion and replacing the MATCH/SET with a MERGE ... ON MATCH SET, we can refactor that into:

explain LOAD CSV WITH HEADERS FROM "file:d:/messages.csv" AS line
WITH line LIMIT 100
MATCH (a:Geotagged { username: line.sender }), (b:Geotagged { username: line.recipient })
MERGE (a)-[r:MSGED]->(b)
ON MATCH SET r.Distance = (2 * 6371 * asin(sqrt(haversin(radians(toFloat(b.statusLat) - toFloat(a.statusLat))) + cos(radians(toFloat(b.statusLat))) * cos(radians(toFloat(a.statusLat))) * haversin(radians(toFloat(b.statusLon) - toFloat(a.statusLon))))));

And the Eager is gone.
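As a side note, the expression assigned to r.Distance in the queries above is the haversine great-circle formula with an Earth radius of 6371 km. The Python sketch below mirrors the same arithmetic so that a handful of stored values can be spot-checked outside Neo4j. The function names `haversin` and `distance_km` and the clamp against rounding error are illustrative assumptions, not part of the original queries; the sketch also assumes the coordinates have already been converted from strings to floats, as toFloat() does in the Cypher.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same constant as in the Cypher expression


def haversin(x: float) -> float:
    """Half the versine of x (radians): (1 - cos x) / 2, written as sin^2(x / 2)."""
    return math.sin(x / 2.0) ** 2


def distance_km(lat_a: float, lon_a: float, lat_b: float, lon_b: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    d_lat = math.radians(lat_b - lat_a)
    d_lon = math.radians(lon_b - lon_a)
    h = haversin(d_lat) + (
        math.cos(math.radians(lat_a))
        * math.cos(math.radians(lat_b))
        * haversin(d_lon)
    )
    # Assumption: clamp to [0, 1] so rounding error cannot push asin out of its domain.
    h = min(1.0, max(0.0, h))
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Hypothetical coordinates, only to show the call shape.
    print(distance_km(52.52, 13.405, 48.8566, 2.3522))
```

If a value computed this way differs noticeably from what the query stored, the likely culprits are swapped latitude/longitude properties or unparsed string coordinates rather than the formula itself.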
