Querying large RDF datasets out of memory

Question

I want to download two or more datasets to my machine and start a SPARQL endpoint for each. I tried Fuseki, which is part of the Jena project. However, it appears to load the whole dataset into memory, which is undesirable for large datasets like DBpedia, given that I also intend to do other things on the same machine (start multiple SPARQL endpoints and run a federated query system over them).

For context, I intend to link multiple datasets using SILK and query them using the FEDX federated query system. If you can recommend any change to the systems I'm using, or give me a tip, that would be great. It would also be a great help if you could suggest a dataset that fits this project.

Recommended answer

Jena's Fuseki can use TDB as a storage mechanism, and TDB stores its data on disk. The TDB documentation on caching on 32 and 64 bit Java systems discusses how the file contents are mapped into memory. I do not believe that TDB/Fuseki loads the entire dataset into memory; that would not be feasible for large datasets, yet TDB can handle rather large datasets. I think you should consider using tdbloader to create a TDB store; then you can point Fuseki to it.

There's an example of setting up a TDB store in this answer. In there, the query is performed with tdbquery, but according to the Running a Fuseki server section of the documentation, all you will need to do to start Fuseki with the same TDB store is use the --loc=DIR option:

  • --loc=DIR
    Use an existing TDB database. Create an empty one if it does not exist.
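As a rough sketch of how this could look for several datasets at once, the Python snippet below builds one tdbloader command and one fuseki-server command per dataset. The dataset names, directories, file names, and ports are hypothetical placeholders. Only --loc=DIR is taken from the documentation quoted above; the --port option and the trailing /name service path are assumptions about Fuseki's command line, so check them against your Fuseki version. The sketch also assumes both tools are on your PATH.

```python
import shlex
import subprocess
from pathlib import Path


def tdbloader_cmd(store_dir, rdf_files):
    """Command that bulk-loads RDF files into an on-disk TDB store."""
    return ["tdbloader", f"--loc={store_dir}", *map(str, rdf_files)]


def fuseki_cmd(store_dir, dataset_name, port):
    """Command that serves an existing TDB store as one SPARQL endpoint.

    Assumption: fuseki-server accepts --port=N and a /name service path.
    """
    return ["fuseki-server", f"--port={port}", f"--loc={store_dir}", f"/{dataset_name}"]


# Hypothetical datasets: name -> (TDB directory, source files, port).
DATASETS = {
    "dbpedia": (Path("stores/dbpedia"), [Path("data/dbpedia.nt")], 3030),
    "geonames": (Path("stores/geonames"), [Path("data/geonames.nt")], 3031),
}


def plan(datasets):
    """Return (load commands, serve commands) for every dataset."""
    loads = [tdbloader_cmd(d, files) for d, files, _ in datasets.values()]
    serves = [fuseki_cmd(d, name, port) for name, (d, _, port) in datasets.items()]
    return loads, serves


def run(datasets):
    """Build every store once, then start one server process per endpoint."""
    loads, serves = plan(datasets)
    for cmd in loads:
        subprocess.run(cmd, check=True)
    return [subprocess.Popen(cmd) for cmd in serves]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in sum(plan(DATASETS), []):
        print(shlex.join(cmd))
```

Running the script only prints the commands; calling run(DATASETS) would execute them. Giving each dataset its own TDB directory and port keeps the endpoints independent, which is what a federated query layer over them would need.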
