Stardog data loading and Jena

Question

I am using Stardog to store a bunch of triples that come from different sources. I use Jena to collect and merge the data in a single Jena graph. All these triples are part of ABoxes.

  1. I am not sure whether Stardog requires the TBox to be merged with the ABox graphs. I suppose it does, because otherwise I cannot see how Stardog would reason over the data. I have not seen any option to store and use the TBox separately, as in some other triple stores. Do I need to include the TBox in the Jena graph, or is there a way to store the TBox in another Stardog database so that it is also taken into consideration when querying the ABox database?

I am considering options to load the Jena graph (varies between 1 and 7 million triples) into Stardog:

  • One option, which I don't really like, is to write the graph to a file and run the client to load it into Stardog. Once the data is in a Jena graph, I would prefer a direct solution.
  • Another option is to load the triples one by one (example of stardog sparql insert query in java), which I dislike because it is potentially inefficient.

Is there any elegant way to load the whole graph from Jena?

EDIT

Code attempt based on the example in the distribution:

Server aServer = Stardog.buildServer()
        .bind(new InetSocketAddress("10.0.0.1", 5820))
        .start();

AdminConnection aAdminConnection = AdminConnectionConfiguration.toServer("...")
        .credentials("admin", "admin")
        .connect();

if (aAdminConnection.list().contains("test")) {
    aAdminConnection.drop("test");
}

Connection aConn = aAdminConnection.memory("test").create(file).connect();

Model aModel = SDJenaFactory.createModel(aConn);

EDIT 2: Corrected some bits of my code.

Stardog documentation

Recommended answer

1) It does not matter where you store your TBox as long as it's in Stardog. By default, Stardog will look in the default graph for your TBox and extract it automatically. But this can be configured using the reasoning.schema.graphs configuration option as noted in the documentation. Generally, you may find the chapter on how reasoning is implemented in Stardog a useful read.

2) Don't load triples one by one; it's not very efficient. The fastest way to get data into Stardog is to load it when the database is created: the bulk loader can be used in that case, and it achieves optimal write speed. Once the database is created, you can use the SNARL API, the CLI, or the Jena API to load a file, which is the next fastest way to get data into the database. If you are using the Jena API, you have to use its BulkUpdateHandler directly, or load RDF/XML, whose reader seems to use the bulk updater behind the scenes.
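To make the cost of one-by-one loading concrete, here is a toy sketch in plain Java. It makes no Stardog or Jena calls: `CountingSink` is a hypothetical stand-in for a store connection, and a "round trip" stands for one write request or transaction. It only illustrates how the number of requests shrinks when triples are grouped.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy comparison of per-triple writes and batched writes; no Stardog or Jena calls. */
class BatchingSketch {

    /** Hypothetical stand-in for a store connection; it only counts simulated round trips. */
    static final class CountingSink {
        int roundTrips = 0;
        int triplesStored = 0;

        /** Simulates one request/transaction that carries a whole batch of triples. */
        void writeBatch(List<String> batch) {
            roundTrips++;
            triplesStored += batch.size();
        }
    }

    /** Sends triples in chunks of batchSize; batchSize = 1 mimics one-by-one loading. */
    static void load(List<String> triples, int batchSize, CountingSink sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        for (int start = 0; start < triples.size(); start += batchSize) {
            int end = Math.min(start + batchSize, triples.size());
            sink.writeBatch(new ArrayList<>(triples.subList(start, end)));
        }
    }

    public static void main(String[] args) {
        // Build 10,000 placeholder triples (example.org IRIs are illustrative only).
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            triples.add("<http://example.org/s" + i + "> <http://example.org/p> <http://example.org/o> .");
        }

        CountingSink oneByOne = new CountingSink();
        load(triples, 1, oneByOne);

        CountingSink batched = new CountingSink();
        load(triples, 1_000, batched);

        System.out.println("one by one: " + oneByOne.roundTrips + " round trips");
        System.out.println("batched:    " + batched.roundTrips + " round trips");
    }
}
```

With a real store, each round trip carries network and transaction overhead, which is why the file-based and bulk paths described above are preferable for millions of triples.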

Your code is incorrect. You're binding a server on an actual socket & port, and then attempting to connect to the embedded server, which you are not running. You have to either modify your server start to use the embedded server as shown in the examples, or modify your initialization of your AdminConnectionConfiguration to specify the server URL using toServer.

Further, rather than using the convenience method createMemory, you can call AdminConnection#memory, which returns a DatabaseBuilder whose create method takes a list of files to bulk load into the new database.
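Because create takes files, the merged graph has to reach disk in some RDF serialization first. The following is a minimal sketch of that step in plain Java, assuming the triples are available as simple strings. The class `NTriplesDump`, its helper methods, and the example.org IRIs are hypothetical names for illustration. With an actual Jena Model, Jena's own `Model.write` with the `"N-TRIPLES"` language would normally do the serialization instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Writes triples to a temporary N-Triples file that a bulk loader could consume. */
class NTriplesDump {

    /** Formats a triple whose subject, predicate and object are IRIs (not validated here). */
    static String iriTriple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    /** Formats a triple with a plain string literal as object, escaping \ " LF and CR. */
    static String literalTriple(String subject, String predicate, String value) {
        String escaped = value
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
        return "<" + subject + "> <" + predicate + "> \"" + escaped + "\" .";
    }

    /** Writes one statement per line to a new temp file and returns its path. */
    static Path writeTemp(List<String> lines) throws IOException {
        Path file = Files.createTempFile("abox-", ".nt");
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(iriTriple("http://example.org/alice",
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                "http://example.org/Person"));
        lines.add(literalTriple("http://example.org/alice",
                "http://example.org/name", "Alice \"Al\" Smith"));

        Path file = writeTemp(lines);
        // The resulting file is what would be handed to the database builder's create method.
        System.out.println("Wrote " + lines.size() + " triples to " + file);
    }
}
```

The extra write to disk is a one-time cost per load, and it lets the database be populated at creation time, which the answer describes as the fastest path.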

You should also consider using a disk-based database for the storage of millions of triples.
