How to Model Real-World Relationships in a Graph Database (like Neo4j)?


Question

I have a general question about modeling in a graph database that I just can't seem to wrap my head around.

How do you model this type of relationship: "Newton invented Calculus"?

In a simple graph, you could model it like this:

Newton (node) -> invented (relationship) -> Calculus (node)

...so you'd end up with a bunch of "invented" graph relationships as you added more people and inventions.

The problem is that you soon need to add a bunch of properties to the relationship:

  • invention_date
  • influential_concepts
  • influential_people
  • books_inventor_wrote

...and you'll want to start creating relationships between those properties and other nodes, such as:

  • influential_people: relationship to person nodes
  • books_inventor_wrote: relationship to book nodes

So now it seems like the "real-world relationship" ("invented") should actually be a node in the graph, and the graph should look like this:

Newton (node) -> (relationship) -> Invention of Calculus (node) -> (relationship) -> Calculus (node)

And to complicate things further, other people also participated in the invention of Calculus, so the graph now becomes something like:

Newton (node) -> 
  (relationship) -> 
    Newton's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
Leibniz (node) -> 
  (relationship) -> 
    Leibniz's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
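
The "relationship as a node" idea above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the `Node` and `Edge` classes, the `link` helper, and the edge labels (`PARTICIPATED_IN`, `PRODUCED`, `RELATED_BOOK`) are hypothetical names made up for this sketch, not part of Neo4j or any graph library. It collapses the per-person invention nodes into one shared "Invention of Calculus" node to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory graph; every name here is illustrative, not a real graph API.
class ReifiedInventionSketch {
    static final class Node {
        final String name;
        final List<Edge> out = new ArrayList<>(); // edges leaving this node
        final List<Edge> in = new ArrayList<>();  // edges arriving at this node
        Node(String name) { this.name = name; }
    }

    static final class Edge {
        final Node from;
        final String label;
        final Node to;
        Edge(Node from, String label, Node to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Create a directed, labeled edge and register it on both endpoints.
    static void link(Node from, String label, Node to) {
        Edge e = new Edge(from, label, to);
        from.out.add(e);
        to.in.add(e);
    }

    // Names of nodes that point at `node` through edges with the given label.
    static List<String> sources(Node node, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : node.in) {
            if (e.label.equals(label)) names.add(e.from.name);
        }
        return names;
    }

    // Names of nodes that `node` points at through edges with the given label.
    static List<String> targets(Node node, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : node.out) {
            if (e.label.equals(label)) names.add(e.to.name);
        }
        return names;
    }

    static Node buildInvention() {
        Node newton = new Node("Newton");
        Node leibniz = new Node("Leibniz");
        Node calculus = new Node("Calculus");
        Node principia = new Node("Principia");

        // The real-world relationship "invented" becomes a node of its own,
        // so it can have several participants and point at other nodes.
        Node invention = new Node("Invention of Calculus");
        link(newton, "PARTICIPATED_IN", invention);
        link(leibniz, "PARTICIPATED_IN", invention);
        link(invention, "PRODUCED", calculus);
        link(invention, "RELATED_BOOK", principia);
        return invention;
    }

    public static void main(String[] args) {
        Node invention = buildInvention();
        System.out.println(sources(invention, "PARTICIPATED_IN")); // [Newton, Leibniz]
        System.out.println(targets(invention, "PRODUCED"));        // [Calculus]
        System.out.println(targets(invention, "RELATED_BOOK"));    // [Principia]
    }
}
```

The trade-off is visible even at this size: the intermediate node can be linked to anything, but every question about "who invented what" now needs one extra hop.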

So I ask the question because it seems like you don't want to set properties on the actual graph database "relationship" objects, since at some point you may want to treat them as nodes in the graph.

Is that correct?

I have been studying the Freebase Metaweb Architecture, and they seem to be treating everything as a node. For example, Freebase has the idea of a Mediator/CVT, where you can create a "Performance" node that links an "Actor" node to a "Film" node, like here: http://www.freebase.com/edit/topic/en/the_last_samurai. I'm not quite sure if this is the same issue, though.

What are some guiding principles you use to figure out whether a "real-world relationship" should actually be a graph node rather than a graph relationship?

If there are any good books on this topic, I would love to know. Thanks!

Recommended Answer

Some of these things, such as invention_date, can be stored as properties on the edges, because in most graph databases edges can have properties in the same way that vertices can. For example, you could do something like this (the code follows TinkerPop's Blueprints):

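// Package names as in Blueprints 2.x (an assumption; older releases used com.tinkerpop.blueprints.pgm).
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

Graph graph = new Neo4jGraph("/tmp/my_graph");

// A person vertex; passing null lets the graph assign the id.
Vertex newton = graph.addVertex(null);
newton.setProperty("given_name", "Isaac");
newton.setProperty("surname", "Newton");
newton.setProperty("birth_year", 1643); // use Gregorian dates...
newton.setProperty("type", "PERSON");

// The thing that was discovered.
Vertex calculus = graph.addVertex(null);
calculus.setProperty("type", "KNOWLEDGE");

// The relationship itself carries the year as an edge property.
Edge newton_calculus = graph.addEdge(null, newton, calculus, "DISCOVERED");
newton_calculus.setProperty("year", 1666);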

Now, let's expand it a little and add Leibniz:

Vertex leibniz = graph.addVertex(null);
leibniz.setProperty("given_name", "Gottfried");
leibniz.setProperty("surname", "Leibniz");
leibniz.setProperty("birth_year", 1646); // stored as a number, like Newton's
leibniz.setProperty("type", "PERSON");

// A second DISCOVERED edge into the same calculus vertex, with its own year.
Edge leibniz_calculus = graph.addEdge(null, leibniz, calculus, "DISCOVERED");
leibniz_calculus.setProperty("year", 1674);

Adding in a book:

Vertex principia = graph.addVertex(null);
principia.setProperty("title", "Philosophiæ Naturalis Principia Mathematica");
principia.setProperty("year_first_published", 1687);
// AUTHOR points from the person to the book.
Edge newton_principia = graph.addEdge(null, newton, principia, "AUTHOR");
// SUBJECT points from the book to its topic.
Edge principia_calculus = graph.addEdge(null, principia, calculus, "SUBJECT");

To find all of the books that Newton wrote about things he discovered, we can construct a graph traversal. We start at Newton and follow his outgoing DISCOVERED links to the things he discovered, traverse SUBJECT links in reverse to reach books on those subjects, and then traverse AUTHOR links in reverse to reach each book's author. If the author is Newton, we go back to the book and return its title. This query is written in Gremlin, a Groovy-based domain-specific language for graph traversals:

newton.out("DISCOVERED").in("SUBJECT").as("book").in("AUTHOR").filter{it == newton}.back("book").title.unique()
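
For readers who don't use Gremlin, the same traversal logic can be sketched in plain Java over a tiny in-memory graph. This is a rough illustration, not Blueprints or Gremlin code: the `Vertex` class, the `addEdge` helper, `sampleGraph`, and the placeholder Leibniz text are all assumptions introduced here. The edge directions match the ones used above (person -AUTHOR-> book, book -SUBJECT-> topic).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a graph API, for illustration only.
class TraversalSketch {
    static final class Vertex {
        final String name;
        final Map<String, List<Vertex>> out = new HashMap<>(); // label -> targets
        final Map<String, List<Vertex>> in = new HashMap<>();  // label -> sources
        Vertex(String name) { this.name = name; }

        List<Vertex> out(String label) {
            return out.getOrDefault(label, Collections.<Vertex>emptyList());
        }

        List<Vertex> in(String label) {
            return in.getOrDefault(label, Collections.<Vertex>emptyList());
        }
    }

    // Register a directed, labeled edge on both endpoints.
    static void addEdge(Vertex from, Vertex to, String label) {
        from.out.computeIfAbsent(label, k -> new ArrayList<>()).add(to);
        to.in.computeIfAbsent(label, k -> new ArrayList<>()).add(from);
    }

    // out(DISCOVERED) -> in(SUBJECT) -> keep books whose in(AUTHOR) includes the person.
    static Set<String> booksOnOwnDiscoveries(Vertex person) {
        Set<String> titles = new LinkedHashSet<>(); // a set, so duplicates collapse
        for (Vertex topic : person.out("DISCOVERED")) {
            for (Vertex book : topic.in("SUBJECT")) {
                if (book.in("AUTHOR").contains(person)) {
                    titles.add(book.name);
                }
            }
        }
        return titles;
    }

    static Map<String, Vertex> sampleGraph() {
        Vertex newton = new Vertex("Newton");
        Vertex leibniz = new Vertex("Leibniz");
        Vertex calculus = new Vertex("Calculus");
        Vertex principia = new Vertex("Principia");
        Vertex other = new Vertex("Leibniz text (placeholder)"); // made up for the demo

        addEdge(newton, calculus, "DISCOVERED");
        addEdge(leibniz, calculus, "DISCOVERED");
        addEdge(newton, principia, "AUTHOR");
        addEdge(principia, calculus, "SUBJECT");
        addEdge(leibniz, other, "AUTHOR");
        addEdge(other, calculus, "SUBJECT");

        Map<String, Vertex> byName = new HashMap<>();
        byName.put("Newton", newton);
        byName.put("Leibniz", leibniz);
        return byName;
    }

    public static void main(String[] args) {
        Map<String, Vertex> g = sampleGraph();
        System.out.println(booksOnOwnDiscoveries(g.get("Newton")));  // [Principia]
        System.out.println(booksOnOwnDiscoveries(g.get("Leibniz"))); // [Leibniz text (placeholder)]
    }
}
```

The author check is what keeps the other writer's book on the same subject out of Newton's result, which is the role the filter step plays in the Gremlin query.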

Thus, I hope I've shown a little of how a clever traversal can avoid the need to create intermediate nodes that represent edges. In a small database it won't matter much, but in a large database you're going to suffer large performance hits from those extra nodes.

Yes, it is sad that you can't associate edges with other edges in a graph, but that's a limitation of the data structures of these databases. Sometimes it does make sense to make everything a node; for example, in a Mediator/CVT a performance has a bit more concreteness to it. Individuals may wish to address only Tom Cruise's performance in "The Last Samurai" in a review. However, for most graph databases I've found that applying a few graph traversals gets me what I want out of the database.
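
The review case can be sketched as follows. Because an edge cannot be the target of another edge, the performance is made a node so that a review can point at it. The `Node` class and the labels `ABOUT`, `ACTOR`, and `FILM` are hypothetical names chosen for this sketch, not Freebase's actual schema.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a mediator-style "performance" node that a review can reference.
class MediatorSketch {
    static final class Node {
        final String type;
        final String name;
        final Map<String, Node> out = new HashMap<>(); // label -> target (one per label, for brevity)
        Node(String type, String name) {
            this.type = type;
            this.name = name;
        }

        // Add an outgoing labeled link and return this node for chaining.
        Node link(String label, Node target) {
            out.put(label, target);
            return this;
        }
    }

    static Node buildReview() {
        Node actor = new Node("ACTOR", "Tom Cruise");
        Node film = new Node("FILM", "The Last Samurai");
        // The mediator node ties one actor to one film.
        Node performance = new Node("PERFORMANCE", "performance")
                .link("ACTOR", actor)
                .link("FILM", film);
        // The review targets the performance itself, not the actor or the film.
        return new Node("REVIEW", "review").link("ABOUT", performance);
    }

    static String subjectOf(Node review) {
        Node performance = review.out.get("ABOUT");
        return performance.out.get("ACTOR").name + " in " + performance.out.get("FILM").name;
    }

    public static void main(String[] args) {
        System.out.println(subjectOf(buildReview())); // Tom Cruise in The Last Samurai
    }
}
```

If nothing ever needs to point at the performance, an edge from actor to film with a few properties would carry the same information with one hop fewer.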
