How to INSERT all triples from an RDFlib graph into another repository without iterating through every triple?

Question

I'm trying to INSERT all triples from a Sesame repository into another (Dydra). There are a couple of ways to do it, such as using SERVICE clause or Dydra's GUI. However, Dydra restricts the use of SERVICE and I want an efficient way to insert the data programmatically. This is the code I have right now:

    queryStringUpload = 'INSERT {?s ?p ?o} WHERE GRAPH %s {?s ?p ?o}' % dataGraph
    sparql = SPARQLWrapper(dydraSparqlEndpoint)
    sparql.setCredentials(key,key)
    sparql.setQuery(queryStringUpload)
    sparql.method = 'POST'
    sparql.query()

The code results in the following error:

client error: failed to parse after 'GRAPH' at offset 24 on line 1.
INSERT {?s ?p ?o} WHERE GRAPH [a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory']]. {?s ?p ?o}
.

Basically, I understand that I'm incorrectly using string formatting. What is the correct way to execute the query?

One way to programmatically do this is by iterating through every triple in dataGraph and individually INSERTing them. I've tried this approach. While the code works, not all of the data is ported. That's the reason I'm looking for a way to bulk port the data.

Update 1

This is the code I tried for implementing the suggested answer:

    sesameURL = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements'
    payloadPOST = {
        'url': sesameURL,
        # 'account[login]':key,
        # 'account[password]':'',
        # 'csrfmiddlewaretoken':csrfToken_new,
        # 'next':'/',
    }

    headersPOST = {
        'User-Agent': 'python',
        'Content-Type': 'application/n-quads',
        # 'Referer': dydraLogin,
    }

    paramsPOST = {
        'auth_token': key,
        #'url': sesameURL
    }
    # print payload

    try:
        q = s.post(dydraUrl, data=payloadPOST, params=paramsPOST, headers=headersPOST)
        print("q.text: " + q.text)
        print("q_status_code: " + str(q.status_code))
    except requests.exceptions.RequestException as e:
        print(e)

This is the error:

q_status_code: 400

However, if I comment out the 'url' attribute, I get this:

q_status_code: 201

Any ideas on how to resolve this would be very helpful.

Update 2

Now, irrespective of whether 'url' is under headersPOST or paramsPOST, I get the following as output:

q_status_code: 201

However, the data that I want to post doesn't get POSTed. What do I need to do differently?

Recommended answer

I'm not gonna bother answering why you get that syntax error on that SPARQL update, since it seems immaterial to what you actually want to know. I'm also not going to bother answering how to upload an RDFLib graph to Dydra, since that also seems immaterial to what you want to know. What I'll answer here is how you can upload data from a Sesame store to a Dydra store, programmatically, without having to iterate over all triples, and without use of the SERVICE clause.

Dydra's REST API

Dydra's REST API is basically identical to the Sesame REST API, so most REST operations you can do on a Sesame store you can also execute on a Dydra store.

You can do an HTTP POST request to your Dydra store's REST API URL for statements: repository/<ACCOUNT_ID>/<REPO_ID>/statements (see the Dydra docs for more details). Add a parameter url that points to the statements URL of your source Sesame store (repository/<REPO_ID>/statements). Also make sure your POST request carries a Content-Type HTTP header naming the MIME type of an RDF syntax format supported by Dydra (a good pick is something like TriG or N-Quads, since these formats support named graphs).
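
As a rough illustration of the request described above, here is a minimal sketch that only builds the POST, using Python's standard urllib. The function name and the example.org URLs are made up for illustration. Passing url and auth_token as query-string parameters is an assumption (auth_token is taken from the question's code), so check the Dydra docs for what the service actually accepts.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_import_request(dydra_statements_url, sesame_statements_url,
                         auth_token, content_type="application/n-quads"):
    """Build (but do not send) a POST to the target statements endpoint.

    Hypothetical helper: 'url' and 'auth_token' are sent in the query
    string, which is an assumption about how the service reads them.
    """
    query = urlencode({"url": sesame_statements_url, "auth_token": auth_token})
    return Request(
        dydra_statements_url + "?" + query,
        headers={"Content-Type": content_type},  # RDF syntax of the data
        method="POST",
    )


# Placeholder URLs, not real endpoints.
request = build_import_request(
    "https://dydra.example.org/my_account/my_repo/statements",
    "http://sesame.example.org/openrdf-sesame/repositories/rep_name/statements",
    "my-auth-token",
)
print(request.get_method(), request.full_url)
```

Keeping url out of the request body may matter: in Update 1 the url value was sent through data=, so the body was form-encoded text labelled as application/n-quads, which could plausibly explain the 400. To actually send the request, pass the returned object to urllib.request.urlopen.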

You don't even need RDFLib for any of this. Presumably you know how to do a simple HTTP request from Python, if not I'm sure there's examples aplenty as it's a fairly generic thing to do.
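
If the url parameter does not behave as hoped (Update 2 in the question reports a 201 with no data arriving), a fallback in the same spirit is to make two plain HTTP requests: GET every statement from the Sesame statements endpoint in one response, then POST that payload to the Dydra statements endpoint. The sketch below is an assumption-laden illustration, not a tested Dydra recipe: copy_statements and its opener argument are hypothetical names, auth_token is borrowed from the question's code, and it assumes both stores can speak N-Quads.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RDF_FORMAT = "application/n-quads"  # assumed to be supported by both stores


def copy_statements(source_statements_url, target_statements_url,
                    auth_token, opener=urlopen):
    """Copy all statements with one GET and one POST, no per-triple loop.

    'opener' defaults to urlopen and exists only so the function can be
    exercised without a network connection.
    """
    # Step 1: ask the source store for every statement in one response.
    export = Request(source_statements_url, headers={"Accept": RDF_FORMAT})
    with opener(export) as response:
        payload = response.read()  # whole dump held in memory

    # Step 2: send the dump as the body of a POST to the target store.
    target = target_statements_url + "?" + urlencode({"auth_token": auth_token})
    upload = Request(target, data=payload,
                     headers={"Content-Type": RDF_FORMAT}, method="POST")
    with opener(upload) as response:
        return response.status
```

Calling copy_statements(sesame_statements_url, dydra_statements_url, key) would return the HTTP status of the upload. The whole export is read into memory, so a very large repository may need a streamed variant instead.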
