Graph databases and RDF triplestores: storage of graph data in python

Question

I need to develop a graph database in Python. (I would be glad if anybody wanted to join me in the development. I already have a bit of code, and I would gladly discuss it.)

I did my research on the internet. In Java, neo4j is a candidate, but I was not able to find anything about its actual disk storage. In Python, there are many graph data models (see this pre-PEP proposal), but none of them satisfies my need to store to and retrieve from disk.

I do know about triplestores, however. Triplestores are basically RDF databases, so a graph data model could be mapped to RDF and stored, but I am generally uneasy (mainly due to lack of experience) about this solution. One example is Sesame. The fact is that you have to convert from the in-memory graph representation to the RDF representation and vice versa in any case, unless the client code wants to hack on the RDF document directly, which is unlikely. It would be like handling DB tuples directly instead of creating an object.

What is the state of the art for storage and retrieval (à la DBMS) of graph data in Python at the moment? Would it make sense to start developing an implementation, hopefully with the help of someone interested in it, and in collaboration with the proposers of the Graph API PEP? Please note that this is going to be part of my job for the next months, so my contribution to this eventual project is pretty damn serious ;)

Edit: I also found directededge, but it appears to be a commercial product.

Recommended answer

I have used both Jena, which is a Java framework, and Allegrograph (Lisp, Java, Python bindings). Jena has sister projects for storing graph data and has been around a long, long time. Allegrograph is quite good and has a free edition. I think I would suggest it because it is easy to install, free, and fast, and you could be up and running in no time. The power you would get from learning a little RDF and SPARQL may very well be worth your while. If you already know SQL, you are off to a great start. Being able to query your graph using SPARQL would yield some great benefits. Serializing to RDF triples would be easy, and some of the file formats are super easy (NT, for instance). I'll give an example. Let's say you have the following graph, given as node-edge-node ids:


1 <- 2 -> 3
3 <- 4 -> 5


These are already in subject-predicate-object form, so just slap some URI notation on them, load them into the triple store, and query at will via SPARQL. Here it is in NT format:


<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
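
As a rough illustration of how little code the serialization step needs, here is a minimal Python sketch that produces the two NT lines above from the node-edge-node ids. The function name `to_ntriples` and the `base` parameter are made up for this example and do not belong to any library; it also assumes the ids are plain integers or strings that need no escaping.

```python
def to_ntriples(edges, base="http://mycompany.com#"):
    """Render (subject, predicate, object) ids as N-Triples lines.

    Hypothetical helper: every id is simply appended to `base`
    and wrapped in angle brackets to form a URI.
    """
    lines = []
    for s, p, o in edges:
        lines.append(f"<{base}{s}> <{base}{p}> <{base}{o}> .")
    return lines


# The example graph: 1 <- 2 -> 3 and 3 <- 4 -> 5,
# read as (node, edge, node) = (subject, predicate, object).
edges = [(1, 2, 3), (3, 4, 5)]
ntriples = to_ntriples(edges)
print("\n".join(ntriples))
```

The resulting text could be written to a `.nt` file and loaded into whichever triple store you pick.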


Now query for all nodes two hops from node 1:


SELECT ?node
WHERE {
    <http://mycompany.com#1> ?p1 ?o1 .
    ?o1 ?p2 ?node .
}


This would of course yield <http://mycompany.com#5>.
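
If SQL is more familiar, the same two-hop pattern can be sketched as a self-join over a three-column table. This is only an analogy using Python's standard `sqlite3` module, not how Allegrograph or any other triple store works internally; the table name `triples`, its columns `s`, `p`, `o`, and the function `two_hops` are all assumptions made for this example.

```python
import sqlite3

# In-memory database for the sketch; passing a file path instead
# would keep the triples on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("http://mycompany.com#1", "http://mycompany.com#2", "http://mycompany.com#3"),
        ("http://mycompany.com#3", "http://mycompany.com#4", "http://mycompany.com#5"),
    ],
)


def two_hops(conn, start):
    """Return the objects reachable from `start` in exactly two hops.

    t1 plays the role of the first triple pattern (start ?p1 ?o1) and
    t2 the second (?o1 ?p2 ?node); the join condition ties ?o1 together.
    """
    rows = conn.execute(
        "SELECT t2.o FROM triples AS t1 "
        "JOIN triples AS t2 ON t2.s = t1.o "
        "WHERE t1.s = ? ORDER BY t2.o",
        (start,),
    )
    return [row[0] for row in rows]


result = two_hops(conn, "http://mycompany.com#1")
print(result)
```

Each extra hop costs another join in SQL, whereas in SPARQL it is just one more triple pattern in the `WHERE` block.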

Another candidate would be Mulgara, written in pure Java. Since you seem more interested in Python, though, I think you should take a look at Allegrograph first.
