Neo4j/Cypher有效分页,超大子图有序 [英] Neo4j/Cypher effective pagination with order by over large sub-graph

查看:123
本文介绍了Neo4j/Cypher有效分页,超大子图有序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在(:User)个节点之间有以下简单关系.

(:User)-[:FOLLOWS {timestamp}]->(:User)

如果我按FOLLOWS.timestamp的顺序对关注者进行分页,那么当有人拥有数百万个关注者时,我就​​会遇到性能问题.

MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100

建议在订购时对大数据集进行分页的建议方法是什么?

更新

follower             timestamp
---------------------------------------
id(1000000)          1455967905
id(999999)           1455967875
id(999998)           1455967234
id(999997)           1455967123
id(999996)           1455965321
id(999995)           1455964123
id(999994)           1455963645
id(999993)           1455963512
id(999992)           1455961343
....
id(2)                1455909382
id(1)                1455908432

我想使用在:FOLLOWS关系上设置的时间戳将列表切成薄片.如果我想返回4个追随者的批次,我会先获取当前时间戳,然后返回4个最新时间戳,然后返回1455967123和4个最新时间戳,依此类推.为此,整个列表应按时间戳排序,这会导致数百万条记录的性能问题.

解决方案

如果您正在寻找最新的关注者,即时间戳大于给定时间,则只需遍历最新的关注者. /p>

您可以在 20ms

内使用(2)完成此操作

如果您确实在寻找最早的(第一)关注者,则可以跳过而不必查看百万关注者的时间戳(在我的系统上大约为1s,请参阅(3)).如果您提前跳过,则时间将缩短为 230ms ,请参阅(1)

通常,我们可以在笔记本电脑上看到每个内核每秒进行2M db操作.

(1)查看最早/最早的关注者

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f                  | follower   |
+---------------------------------+
| :FOLLOWS[0]{ts:1}  | Node[1]{}  |
...
+---------------------------------+
10 rows
243 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator        | Estimated Rows | Rows    | DB Hits | Identifiers                                                             | Other                                 |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults |              1 |      10 |       0 | f, follower                                                             | f, follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158]                  |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top            |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | Literal(10);                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |     499 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | anon[155]; anon[158]; anon[155].ts    |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |       0 | anon[142], anon[155], anon[158], f, follower, u                         | f; follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter         |              1 |     499 |       0 | anon[142], f, follower, u                                               | anon[142]                             |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |    1000 |    1000 | anon[142], f, follower, u                                               | f; follower; f.ts < {  AUTOINT2}      |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip           |              1 |    1000 |       0 | f, follower, u                                                          | {  AUTOINT1}                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All)    |              1 | 1000000 | 1000001 | f, follower, u                                                          | (u)<-[  f@12:FOLLOWS]-(  follower@24) |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek   |              1 |       1 |       1 | u                                                                       |                                       |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+

Total database accesses: 1001501

(2)查看最近的关注者

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f                           | follower       |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows  | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |    10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |    10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |    10 |   16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > {  AUTOINT1}) |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 16394 |   16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |     1 |       1 | u              |                                                               |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 32790

(3)寻找最老的追随者而无需跳过

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f                     | follower    |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows   | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |     10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |     10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |     10 |  999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < {  AUTOINT1}) |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 999498 |  999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |      1 |       1 | u              |                                                               |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 1998998

I have following simple relationship between (:User) nodes.

(:User)-[:FOLLOWS {timestamp}]->(:User)

If I paginate followers ordered by FOLLOWS.timestamp I'm running into performance problems when someone has millions of followers.

MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100

What is suggested approach for paginating big sets of data when ordering is required?

UPDATE

follower             timestamp
---------------------------------------
id(1000000)          1455967905
id(999999)           1455967875
id(999998)           1455967234
id(999997)           1455967123
id(999996)           1455965321
id(999995)           1455964123
id(999994)           1455963645
id(999993)           1455963512
id(999992)           1455961343
....
id(2)                1455909382
id(1)                1455908432

I want to slice this list down using timestamp which set on :FOLLOWS relationship. If I want to return batches of 4 followers I take current timestamp first and return 4 most recent, then 1455967123 and 4 most recent and so on. In order to do this the whole list should be order by timestamp which results in performance issues on millions of records.

解决方案

If you're looking for the most recent followers, i.e. where the timestamp is greater than a given time, it only has to traverse the most recent ones.

You can do it with (2) in 20ms

If you are really looking for the oldest (first) followers it makes sense to skip ahead and don't look at the timestamp of every of the million followers (which takes about 1s on my system, see (3)). If you skip ahead the time goes down to 230ms, see (1)

In general we can see that on my laptop it does 2M db-operations per core and second.

(1) Look at first / oldest followers

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f                  | follower   |
+---------------------------------+
| :FOLLOWS[0]{ts:1}  | Node[1]{}  |
...
+---------------------------------+
10 rows
243 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator        | Estimated Rows | Rows    | DB Hits | Identifiers                                                             | Other                                 |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults |              1 |      10 |       0 | f, follower                                                             | f, follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158]                  |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top            |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | Literal(10);                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |     499 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | anon[155]; anon[158]; anon[155].ts    |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |       0 | anon[142], anon[155], anon[158], f, follower, u                         | f; follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter         |              1 |     499 |       0 | anon[142], f, follower, u                                               | anon[142]                             |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |    1000 |    1000 | anon[142], f, follower, u                                               | f; follower; f.ts < {  AUTOINT2}      |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip           |              1 |    1000 |       0 | f, follower, u                                                          | {  AUTOINT1}                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All)    |              1 | 1000000 | 1000001 | f, follower, u                                                          | (u)<-[  f@12:FOLLOWS]-(  follower@24) |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek   |              1 |       1 |       1 | u                                                                       |                                       |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+

Total database accesses: 1001501

(2) Look at most recent followers

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f                           | follower       |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows  | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |    10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |    10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |    10 |   16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > {  AUTOINT1}) |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 16394 |   16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |     1 |       1 | u              |                                                               |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 32790

(3) Find oldest followers without skipping ahead

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f                     | follower    |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows   | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |     10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |     10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |     10 |  999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < {  AUTOINT1}) |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 999498 |  999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |      1 |       1 | u              |                                                               |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 1998998

这篇关于Neo4j/Cypher有效分页,超大子图有序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆