How to Model Real-World Relationships in a Graph Database (like Neo4j)?

Question

I have a general question about modeling in a graph database that I just can't seem to wrap my head around.

How do you model this type of relationship: "Newton invented Calculus"?

In a simple graph, you could model it like this:

Newton (node) -> invented (relationship) -> Calculus (node)

...so you'd have a bunch of "invented" graph relationships as you added more people and inventions.

The problem is, you start needing to add a bunch of properties to the relationship:

  • invention_date
  • influential_concepts
  • influential_people
  • books_inventor_wrote

...and you'll want to start creating relationships between those properties and other nodes, such as:

  • influential_people: relationship to person nodes
  • books_inventor_wrote: relationship to book nodes

So now it seems like the "real-world relationships" ("invented") should actually be a node in the graph, and the graph should look like this:

Newton (node) -> (relationship) -> Invention of Calculus (node) -> (relationship) -> Calculus (node)

And to complicate things further, other people also participated in the invention of Calculus, so the graph now becomes something like:

Newton (node) -> 
  (relationship) -> 
    Newton's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
Leibniz (node) -> 
  (relationship) -> 
    Leibniz's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)

So I ask the question because it seems like you don't want to set properties on the actual graph database "relationship" objects, because you may want to at some point treat them as nodes in the graph.

Is this correct?

I have been studying the Freebase Metaweb Architecture, and they seem to be treating everything as a node. For example, Freebase has the idea of a Mediator/CVT, where you can create a "Performance" node that links an "Actor" node to a "Film" node, like here: http://www.freebase.com/edit/topic/en/the_last_samurai. Not quite sure if this is the same issue though.

What are some guiding principles you use to figure out if the "real-world relationship" should actually be a graph node rather than a graph relationship?

If there are any good books on this topic I would love to know. Thanks!

Recommended answer

Some of these things, such as invention_date, can be stored as properties on the edges, since in most graph databases edges can have properties in the same way that vertices can. For example, you could do something like this (the code follows TinkerPop's Blueprints API):

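// Package names as in TinkerPop Blueprints 2.x; other releases may differ.
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

Graph graph = new Neo4jGraph("/tmp/my_graph");

// A vertex for the person.
Vertex newton = graph.addVertex(null);
newton.setProperty("given_name", "Isaac");
newton.setProperty("surname", "Newton");
newton.setProperty("birth_year", 1643); // use Gregorian dates...
newton.setProperty("type", "PERSON");

// A vertex for the thing that was discovered.
Vertex calculus = graph.addVertex(null);
calculus.setProperty("type", "KNOWLEDGE");

// The relationship itself carries the date as an edge property.
Edge newton_calculus = graph.addEdge(null, newton, calculus, "DISCOVERED");
newton_calculus.setProperty("year", 1666);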

Now let's expand it a little and add Leibniz:

Vertex leibniz = graph.addVertex(null);
leibniz.setProperty("given_name", "Gottfried");
leibniz.setProperty("surname", "Leibniz");
leibniz.setProperty("birth_year", 1646); // integer, consistent with Newton's birth_year
leibniz.setProperty("type", "PERSON");

Edge leibniz_calculus = graph.addEdge(null, leibniz, calculus, "DISCOVERED");
leibniz_calculus.setProperty("year", 1674);

Adding a book:

Vertex principia = graph.addVertex(null);
principia.setProperty("title", "Philosophiæ Naturalis Principia Mathematica");
principia.setProperty("year_first_published", 1687);
Edge newton_principia = graph.addEdge(null, newton, principia, "AUTHOR");
Edge principia_calculus = graph.addEdge(null, principia, calculus, "SUBJECT");

To find all of the books that Newton wrote on things he discovered, we can construct a graph traversal. We start with Newton, follow the outgoing links from him to the things he discovered, then traverse links in reverse to get the books on each subject, and go in reverse over one more link to get each book's author. If the author is Newton, we go back to the book and return it as a result. This query is written in Gremlin, a Groovy-based domain-specific language for graph traversals:

newton.out("DISCOVERED").in("SUBJECT").as("book").in("AUTHOR").filter{it == newton}.back("book").title.unique()
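For readers without a Gremlin console at hand, the same walk can be sketched in plain Java over an in-memory edge list. This is only an illustration of the traversal logic: the `Edge` class, the `out`/`in` helpers, and the `TraversalSketch` class are hypothetical stand-ins, not Blueprints, Gremlin, or Neo4j APIs, and vertices are reduced to plain strings. Only the edge labels and vertex names come from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TraversalSketch {
    // Hypothetical stand-in for a labeled, directed edge (not a Blueprints type).
    static final class Edge {
        final String from, label, to;
        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // The same four edges that the Blueprints code above creates.
    static final List<Edge> EDGES = List.of(
        new Edge("Newton", "DISCOVERED", "Calculus"),
        new Edge("Leibniz", "DISCOVERED", "Calculus"),
        new Edge("Newton", "AUTHOR", "Principia"),
        new Edge("Principia", "SUBJECT", "Calculus"));

    // Vertices reached by following outgoing edges with the given label.
    public static List<String> out(String vertex, String label) {
        List<String> result = new ArrayList<>();
        for (Edge e : EDGES) {
            if (e.from.equals(vertex) && e.label.equals(label)) result.add(e.to);
        }
        return result;
    }

    // Vertices reached by following incoming edges with the given label.
    public static List<String> in(String vertex, String label) {
        List<String> result = new ArrayList<>();
        for (Edge e : EDGES) {
            if (e.to.equals(vertex) && e.label.equals(label)) result.add(e.from);
        }
        return result;
    }

    // person -DISCOVERED-> topic <-SUBJECT- book <-AUTHOR- person
    public static Set<String> booksOnOwnDiscoveries(String person) {
        Set<String> books = new LinkedHashSet<>(); // a set, so each book appears once
        for (String topic : out(person, "DISCOVERED")) {
            for (String book : in(topic, "SUBJECT")) {
                if (in(book, "AUTHOR").contains(person)) books.add(book);
            }
        }
        return books;
    }

    public static void main(String[] args) {
        System.out.println(booksOnOwnDiscoveries("Newton"));  // prints [Principia]
        System.out.println(booksOnOwnDiscoveries("Leibniz")); // prints []
    }
}
```

Note that no intermediate "invention" node is needed here: the edge labels alone are enough to connect a person, a topic, and a book.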

Thus, I hope I've shown how a clever traversal can be used to avoid creating intermediate nodes to represent edges. In a small database the extra nodes won't matter much, but in a large database adding them everywhere can cost you a lot of performance.

Yes, it is sad that you can't associate edges with other edges in a graph, but that's a limitation of the data structures of these databases. Sometimes it makes sense to make everything a node; for example, in Mediator/CVT a performance has a bit more concreteness to it. Individuals may wish to address only Tom Cruise's performance in "The Last Samurai" in a review. However, for most graph databases I've found that applying some graph traversals can get me what I want out of the database.
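As a rough sketch of that mediator case, the performance can be its own node so that a review has something to point at, which a plain actor-to-film edge could not offer. Everything in this example is an assumption made for illustration: the `Node` and `Link` classes, the edge labels (`ACTOR`, `FILM`, `ABOUT`), and the review itself are hypothetical, and none of it is Freebase's or Neo4j's actual schema or API.

```java
import java.util.ArrayList;
import java.util.List;

public class MediatorSketch {
    // Hypothetical labeled edge pointing at a target node.
    static final class Link {
        final String label;
        final Node target;
        Link(String label, Node target) {
            this.label = label;
            this.target = target;
        }
    }

    // Hypothetical minimal node type, not a real graph database API.
    static final class Node {
        final String type;
        final String name;
        final List<Link> out = new ArrayList<>();

        Node(String type, String name) {
            this.type = type;
            this.name = name;
        }

        // Add an outgoing edge with the given label.
        void link(String label, Node target) {
            out.add(new Link(label, target));
        }

        // First node reachable over an outgoing edge with this label, or null.
        Node follow(String label) {
            for (Link l : out) {
                if (l.label.equals(label)) return l.target;
            }
            return null;
        }
    }

    // Builds actor, film, mediating performance, and a review of the performance.
    static Node buildReview() {
        Node cruise = new Node("ACTOR", "Tom Cruise");
        Node film = new Node("FILM", "The Last Samurai");

        // The mediator: the relationship "acted in" promoted to a node.
        Node performance = new Node("PERFORMANCE", "Tom Cruise in The Last Samurai");
        performance.link("ACTOR", cruise);
        performance.link("FILM", film);

        // The review targets the performance only, not the whole film or the actor.
        Node review = new Node("REVIEW", "hypothetical review");
        review.link("ABOUT", performance);
        return review;
    }

    public static void main(String[] args) {
        Node performance = buildReview().follow("ABOUT");
        // prints Tom Cruise / The Last Samurai
        System.out.println(performance.follow("ACTOR").name + " / " + performance.follow("FILM").name);
    }
}
```

The trade-off is one extra hop per actor-to-film lookup, so this shape seems worth it mainly when other things need to refer to the relationship itself.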
