Neo4j: how to model a time-versioned graph

Problem description

Part of my graph has the following schema:

The main part of the graph is the domain, which has some persons linked to it. Person has a unique constraint on the email property; since I also have data from other sources, this fits nicely.

In my case a person can be an admin, with some devices/calendars linked to him. I get this data from an SQL database, importing a few tables to build the whole picture. I start with a table that has two columns: the admin's email and his user id. This user id is specific to the production database and is not used by the other sources, which is why I use email as the global ID for persons. I currently use the following query to import the user id that all the production tables are linked to. It always fetches the current snapshot of the user settings and info, and it runs 4 times a day:

CALL apoc.load.jdbc($url, $import_query) YIELD row
MERGE (p:Person {email: row.email})
SET p.user_id = row.id

And then I import all the data that is linked to this user id from other tables.

The problem is that a user in the production database can change his email. With the current import, I end up with two persons having the same user_id, and consequently all the devices/calendars get linked to both persons because they share that user_id. This is not an accurate representation of reality. We also need to capture when devices are connected to and disconnected from a particular user_id over time, since someone can disconnect a device and lend it to a friend who has a different admin (user_id).
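
The following small Python simulation makes the duplication concrete by running the import above across two snapshots. It is an in-memory stand-in for MERGE/SET, not the Neo4j API. The helper names import_snapshot and emails_for_user_id and the sample rows are hypothetical.

```python
def import_snapshot(persons: dict, rows: list) -> None:
    """Mimic MERGE (p:Person {email: row.email}) SET p.user_id = row.id."""
    for row in rows:
        # MERGE matches on email only, so a new email means a new Person
        person = persons.setdefault(row["email"], {"email": row["email"]})
        person["user_id"] = row["id"]


def emails_for_user_id(persons: dict, user_id: int) -> list:
    """Every Person that production user `user_id` now resolves to."""
    return sorted(p["email"] for p in persons.values() if p["user_id"] == user_id)


persons: dict = {}
import_snapshot(persons, [{"email": "alice@old.example", "id": 7}])
# A later run: production user 7 has changed their email address
import_snapshot(persons, [{"email": "alice@new.example", "id": 7}])

print(emails_for_user_id(persons, 7))  # ['alice@new.example', 'alice@old.example']
```

Both Person records keep user_id 7, so anything later joined on that id is attached to both.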

How can I change my graph model (and the import query) so that:

  1. Finding the current admin does not require complex queries
  2. Finding who currently has the device connected does not require complex queries
  3. Querying the history may be somewhat more complex.

Solution

This answer is based on Ian Robinson's post about time-based versioned graphs.

I don't know if this answer covers ALL the requirements of the question, but I believe it can provide some insights.

Also, I'm assuming you are only interested in structural versioning (that is, you are not interested in querying how, for example, a domain user's name changes over time). Finally, I'm using a partial representation of your graph model, but I believe the concepts shown here can be applied to the whole graph.

The initial graph state:

Consider the following Cypher, which creates the initial graph state:

CREATE (admin:Admin)

CREATE (person1:Person {person_id : 1})
CREATE (person2:Person {person_id : 2})
CREATE (person3:Person {person_id : 3})

CREATE (domain1:Domain {domain_id : 1})

CREATE (device1:Device {device_id : 1})

CREATE (person1)-[:ADMIN {from : 0, to : 1000}]->(admin)

CREATE (person1)-[:CONNECTED_DEVICE {from : 0, to : 1000}]->(device1)

CREATE (domain1)-[:MEMBER]->(person1)
CREATE (domain1)-[:MEMBER]->(person2)
CREATE (domain1)-[:MEMBER]->(person3)

Result:

The above graph has three Person nodes, all members of a Domain node. The Person node with person_id = 1 is connected to a Device with device_id = 1, and person_id = 1 is also the current administrator. The from and to properties on the :ADMIN and :CONNECTED_DEVICE relationships manage the history of the graph structure: from represents a start point in time and to an end point in time. For simplicity I'm using 0 as the initial time of the graph and 1000 as the end-of-time (EOT) constant. In a real-world graph, the current time in milliseconds can be used to represent time points, and Long.MAX_VALUE can be used as the EOT constant instead. A relationship with to = 1000 has no current upper bound on its period, i.e. it is still valid now.
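
The from/to convention can be shown outside Neo4j with a minimal in-memory Python model of the same idea. This is a sketch, not Neo4j code; the Rel class and the current() helper are hypothetical names. Each relationship carries a validity interval, and "current" simply means the interval is still open, i.e. its end equals the EOT constant:

```python
from dataclasses import dataclass

EOT = 1000  # end-of-time constant from the example; Long.MAX_VALUE in a real graph


@dataclass
class Rel:
    """A versioned relationship, valid from time `start` to time `end`."""
    rel_type: str
    source: str  # e.g. a Person
    target: str  # e.g. a Device or the Admin node
    start: int   # the `from` property
    end: int     # the `to` property


def current(rels: list[Rel], rel_type: str, target: str) -> list[str]:
    """Sources whose `rel_type` relationship to `target` is still open (end == EOT)."""
    return [r.source for r in rels
            if r.rel_type == rel_type and r.target == target and r.end == EOT]


# Initial state, mirroring the :ADMIN and :CONNECTED_DEVICE CREATE statements
rels = [
    Rel("ADMIN", "person1", "admin", 0, EOT),
    Rel("CONNECTED_DEVICE", "person1", "device1", 0, EOT),
]

print(current(rels, "ADMIN", "admin"))               # ['person1']
print(current(rels, "CONNECTED_DEVICE", "device1"))  # ['person1']
```

The "current" lookup is an equality test on a single property, which is why those queries stay simple. Only history queries need to look at both interval bounds.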

Queries:

With this graph, to get the current administrator I can do:

MATCH (person:Person)-[:ADMIN {to:1000}]->(:Admin)
RETURN person

The result will be:

╒═══════════════╕
│"person"       │
╞═══════════════╡
│{"person_id":1}│
└───────────────┘

Given a device, to get the currently connected user:

MATCH (:Device {device_id : 1})<-[:CONNECTED_DEVICE {to : 1000}]-(person:Person)
RETURN person

Resulting:

╒═══════════════╕
│"person"       │
╞═══════════════╡
│{"person_id":1}│
└───────────────┘

To query the current administrator and the person currently connected to a device, the End-Of-Time constant is used: the current relationship is the one whose to equals 1000.

Query the device connect / disconnect events:

MATCH (device:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
RETURN person AS person, device AS device, r.from AS from, r.to AS to
ORDER BY r.from

Resulting:

╒═══════════════╤═══════════════╤══════╤════╕
│"person"       │"device"       │"from"│"to"│
╞═══════════════╪═══════════════╪══════╪════╡
│{"person_id":1}│{"device_id":1}│0     │1000│
└───────────────┴───────────────┴──────┴────┘

The above result shows that person_id = 1 has been connected to device_id = 1 from the beginning until today.

Changing the graph structure

Consider that the current time point is 30. Now person_id = 1 disconnects from device_id = 1, and person_id = 2 connects to it. To represent this structural change, I will run the following query:

// Get the current connected person
MATCH (person1:Person)-[old:CONNECTED_DEVICE {to : 1000}]->(device:Device {device_id : 1})
// get person_id = 2
MATCH (person2:Person {person_id : 2})
// set 30 as the end time of the connection between person_id = 1 and device_id = 1
SET old.to = 30
// set person_id = 2 as the current connected user to device_id = 1
// (from time point 31 to now)
CREATE (person2)-[:CONNECTED_DEVICE {from : 31, to: 1000}]->(device) 
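
The update always takes two steps. First, close the open interval by setting its to to the change time. Then create a new relationship whose interval starts at the next time point and is open-ended. Below is a minimal in-memory sketch of that step. It uses a hypothetical reassign() helper written in plain Python rather than Neo4j code, and the same call works unchanged for :ADMIN:

```python
from dataclasses import dataclass

EOT = 1000  # end-of-time constant used in the example


@dataclass
class Rel:
    rel_type: str
    source: str
    target: str
    start: int  # the `from` property
    end: int    # the `to` property


def reassign(rels: list[Rel], rel_type: str, target: str, new_source: str, t: int) -> None:
    """Close any open `rel_type` relationship to `target` at t, then open a new one at t + 1."""
    for r in rels:
        if r.rel_type == rel_type and r.target == target and r.end == EOT:
            r.end = t  # corresponds to SET old.to = 30 (with t = 30)
    # corresponds to CREATE (person2)-[:CONNECTED_DEVICE {from: 31, to: 1000}]->(device)
    rels.append(Rel(rel_type, new_source, target, t + 1, EOT))


rels = [Rel("CONNECTED_DEVICE", "person1", "device1", 0, EOT)]
reassign(rels, "CONNECTED_DEVICE", "device1", "person2", 30)

for r in rels:
    print(r.source, r.start, r.end)
# person1 0 30
# person2 31 1000
```

This sketch differs from the Cypher query in one case. In Cypher, if the first MATCH finds no currently connected person, it produces no rows, so the CREATE never runs. The sketch, by contrast, still opens the new interval when the device had no current user. For the admin history the call would be reassign(rels, "ADMIN", "admin", "person2", t).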

The resultant graph will be:

After this structural change, the connection history of device_id = 1 will be:

MATCH (device:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
RETURN person AS person, device AS device, r.from AS from, r.to AS to
ORDER BY r.from

╒═══════════════╤═══════════════╤══════╤════╕
│"person"       │"device"       │"from"│"to"│
╞═══════════════╪═══════════════╪══════╪════╡
│{"person_id":1}│{"device_id":1}│0     │30  │
├───────────────┼───────────────┼──────┼────┤
│{"person_id":2}│{"device_id":1}│31    │1000│
└───────────────┴───────────────┴──────┴────┘

The above result shows that person_id = 1 was connected to device_id = 1 from time 0 to 30, and that person_id = 2 is currently connected to device_id = 1.
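
The history query just returns every interval for the device, ordered by start. The answer does not show a point-in-time query ("who had the device at time t?"), but one follows from the same properties by checking from <= t <= to. In Cypher that would be a WHERE r.from <= $t AND r.to >= $t filter on the relationship. Here is a sketch of both in the same hypothetical in-memory style, using the state after the change above:

```python
from typing import NamedTuple

EOT = 1000  # end-of-time constant used in the example


class Interval(NamedTuple):
    person: str
    device: str
    start: int  # the `from` property
    end: int    # the `to` property


# CONNECTED_DEVICE relationships after the change at time point 30
connections = [
    Interval("person2", "device1", 31, EOT),
    Interval("person1", "device1", 0, 30),
]


def history(rels: list[Interval], device: str) -> list[Interval]:
    """All connections of `device`, ordered by start time (like ORDER BY r.from)."""
    return sorted((r for r in rels if r.device == device), key=lambda r: r.start)


def connected_at(rels: list[Interval], device: str, t: int) -> list[str]:
    """Persons whose connection to `device` covered time point t (from <= t <= to)."""
    return [r.person for r in rels if r.device == device and r.start <= t <= r.end]


for r in history(connections, "device1"):
    print(r.person, r.start, r.end)
# person1 0 30
# person2 31 1000

print(connected_at(connections, "device1", 10))  # ['person1']
print(connected_at(connections, "device1", 31))  # ['person2']
```

The example closes the old interval at 30 and opens the new one at 31, so both bounds are treated as inclusive. With real millisecond timestamps you might instead reuse the same instant for both and make one bound exclusive, so that no gap or overlap appears.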

Now the current person connected to device_id = 1 is person_id = 2:

MATCH (:Device {device_id : 1})<-[:CONNECTED_DEVICE {to : 1000}]-(person:Person)
RETURN person

╒═══════════════╕
│"person"       │
╞═══════════════╡
│{"person_id":2}│
└───────────────┘

The same approach can be applied to manage the admin history.

Obviously this approach has some downsides:

  • Need to manage a set of extra relationships
  • More expensive queries
  • More complex queries

But if you really need a versioning schema, I believe this approach is a good option or (at least) a good starting point.
