Graph DBs vs. Document DBs vs. Triplestores


Question

This is a somewhat abstract and general question. I'm interested in the inherent (as well as implementation-specific) properties of different approaches to persisting unstructured data that has both lots of internal references (graph-like) and lots of properties (JSON-like).

  • Since a graph is a superset of a tree, you can look at graph DBs (e.g. Neo4j) as a superset of document DBs (e.g. MongoDB). That is, a graph DB provides all the functionality of a document DB and additionally allows loops and has a native pointer type, so you don't have to dereference foreign keys/ids manually. So is there some tipping point, reached as you add more references to your objects/resources, beyond which you're better off with a graph DB where you were previously better off with a document store? Are there advantages to document DBs (storage space, performance?), or should you always go with a graph DB just in case you need more references in the future?
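To make the "dereference foreign keys/ids manually" point concrete, here is a minimal in-memory sketch in plain Python. It does not use MongoDB or Neo4j; the data, the field name `friend_ids`, and both helper functions are hypothetical and only illustrate the two access styles.

```python
# Hypothetical data: names and fields are made up for illustration.

# Document-store style: each document is a JSON-like dict, and references
# to other documents are plain ids that the application must resolve.
documents = {
    "alice": {"name": "Alice", "friend_ids": ["bob"]},
    "bob": {"name": "Bob", "friend_ids": ["carol"]},
    "carol": {"name": "Carol", "friend_ids": ["alice"]},  # closes a loop
}

def friends_of_friends_docs(doc_id):
    """Two-hop lookup where every hop is an explicit id lookup."""
    result = set()
    for friend_id in documents[doc_id]["friend_ids"]:
        friend = documents[friend_id]  # manual dereference
        for fof_id in friend["friend_ids"]:
            result.add(documents[fof_id]["name"])  # and again
    return result

# Graph style: nodes hold direct references to their neighbours,
# so a traversal just follows pointers.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []

nodes = {key: Node(doc["name"]) for key, doc in documents.items()}
for key, doc in documents.items():
    nodes[key].friends = [nodes[fid] for fid in doc["friend_ids"]]

def friends_of_friends_graph(node):
    """Two-hop traversal that follows references directly."""
    return {fof.name for friend in node.friends for fof in friend.friends}
```

Both functions return the same names; the difference is only in who does the id resolution, the application or the data model.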

Similarly, how do graph DBs and triplestores (e.g. RDF stores) compare? Graph DBs (where nodes and edges have properties) seem to be a superset of the simple triplestores. So for which problems (if any) do triplestores actually perform better than, say, Neo4j? (One advantage of RDF stores is that there is a standardized query language, SPARQL, although there seem to be a lot of people who don't like SPARQL and thus would call it a disadvantage.)

I guess my question is: the graph model (with properties) seems to be able to neatly express all kinds of data, so what is the catch when you enter reality? I suppose the catch of graph DBs is performance, so I'd love to see some numbers or rules of thumb on what kind of slowdowns to expect when loading, querying and modifying data, as well as on memory and persistent storage requirements (compared to document and triple stores). Also, what about horizontal scalability? My impression is that the playing field there is fairly level.

Do you think it is possible that graphs, with their expressiveness, will become the new default storage model for projects that do not have super-large data, or are we doomed to a decade of Polyglot Persistence, with RDBMSs, JSON stores and graph DBs living alongside each other and having to be integrated with even more glue code?

Recommended answer

I'm not sure I would agree with the sentiment that a lot of people don't like SPARQL. SPARQL 1.0 did have some shortcomings, but it quite nicely addressed what it was designed for, and the new iteration, SPARQL 1.1, builds upon it, adding many constructs from SQL that people expected to see in the original spec, including sub-queries, aggregates and update semantics. I think the fact that it's a standard, and that you can expect to see the same parsing and semantics in every triple store (as opposed to the dialects of SQL), is a nice feature.

I would also claim that all triple stores are graph databases; you can put properties on specific edges in RDF, albeit not as nicely as you can with Neo4j. But triple stores have the advantage of a real query language, a W3C standard data representation that makes it trivial to take your data to another triplestore, and, for a number of triple stores, the ability to perform reasoning based on OWL.
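One way to see why edge properties are less convenient in RDF is standard RDF reification, sketched below with plain Python tuples standing in for triples. The answer does not say which technique it has in mind (named graphs or n-ary relations are other options), and the `ex:` names, the `since` property, and the `reify` helper are hypothetical.

```python
# Property-graph style: the edge itself carries a property map.
edge = {"from": "alice", "type": "KNOWS", "to": "bob", "props": {"since": 2010}}

def reify(statement_id, subject, predicate, obj, props):
    """Describe one statement, plus its properties, as plain triples.

    Uses the RDF reification vocabulary (rdf:Statement, rdf:subject,
    rdf:predicate, rdf:object); the ex: property names are made up.
    """
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]
    triples += [(statement_id, f"ex:{key}", value) for key, value in props.items()]
    return triples

# The plain edge is one triple; annotating it takes five more.
triples = [("ex:alice", "ex:knows", "ex:bob")]
triples += reify("ex:stmt1", "ex:alice", "ex:knows", "ex:bob", edge["props"])
```

A single annotated edge in the property-graph form becomes six triples here, which is the kind of overhead the "not as nicely" remark points at.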

I don't know anything about the scalability of most graph DBs, but generally the commercial RDF databases scale quite well. All can scale into the billions of triples, which handles a great many use cases. How they handle scale, though, differs wildly from vendor to vendor with respect to scaling up or scaling out, clustering, etc. You'll also see pretty different memory and hardware requirements to match each implementation. For me, I've tended to just go and grab an EC2 instance, usually a 2XL or 4XL, mount an EBS volume large enough to hold the data, and I'm pretty well set.

Additionally, some triple stores integrate with Lucene or similar technologies to provide inverted indexes over the data, and many are now starting to include geo-spatial and temporal indexes. These are very useful features, and I'm not sure whether they are available in something like Neo4j.
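As a rough illustration of what an inverted index over triple data provides, the sketch below maps each word in a literal value to the subjects that carry it. This is plain Python, not Lucene or any triple store's actual integration; the sample triples and the `build_inverted_index` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples whose objects are text literals.
triples = [
    ("ex:doc1", "rdfs:label", "Graph databases in practice"),
    ("ex:doc2", "rdfs:label", "Document databases and JSON"),
    ("ex:doc3", "rdfs:label", "Scaling graph queries"),
]

def build_inverted_index(triples):
    """Map each lowercased token of a literal to the subjects containing it."""
    index = defaultdict(set)
    for subject, _predicate, literal in triples:
        for token in literal.lower().split():
            index[token].add(subject)
    return index

index = build_inverted_index(triples)
matches = index["graph"]  # subjects whose label contains the word "graph"
```

A keyword lookup then becomes a single dictionary access instead of a scan over every literal, which is the benefit a full-text integration brings to a store.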

With that said, they're not going to scale as well as relational databases; they're just not as mature. But you're also not going to get screwed when you have "real" amounts of data. Of course, one of the advantages of triple stores is reasoning, which is tricky to perform at scale, but that's much of the reason why the various OWL profiles were created. Still, you can paint yourself into a corner if you don't think ahead.
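To give a feel for why reasoning gets expensive, here is a naive forward-chaining sketch in plain Python. It applies only two RDFS-style subclass rules, which is far simpler than OWL reasoning, and it is not how any particular triple store works; the `materialize` function and the `ex:` names are hypothetical.

```python
def materialize(triples):
    """Apply two subclass rules repeatedly until no new triples appear.

    Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
    Rule 2: x type A,       A subClassOf B  =>  x type B
    """
    inferred = set(triples)
    while True:
        new = set()
        # Naive join: compare every triple with every other triple.
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if o != s2 or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    new.add((s, "rdf:type", o2))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new

asserted = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
closure = materialize(asserted)
```

Three asserted triples grow to six after inference, and each pass is a quadratic join over everything known so far. Restricting which rules are allowed, as the OWL profiles do, is one way to keep this tractable on large data.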

I think graph databases, triple stores specifically, can be a pretty good match for a lot of the applications being built, but I don't think that means everything should be done with them. Like anything else, they're tools with their good points and their bad points, so you have to make the right choice based on your application. But these days they probably always merit at least some consideration.
