Can GraphDB load 10 million statements with OWL reasoning?

Problem description

I am struggling to load most of the Drug Ontology OWL files and most of the ChEBI OWL files into a GraphDB Free v8.3 repository with the Optimized OWL Horst ruleset enabled.

Is this possible? Should I do something other than "be patient"?

Details:

I'm using the loadrdf offline bulk loader to populate a repository on an AWS r4.16xlarge instance with 488.0 GiB of RAM and 64 vCPUs.

Over the weekend, I played around with different pool buffer sizes and found that most of these files individually load fastest with a pool buffer of 2,000 or 20,000 statements instead of the suggested 200,000. I also added -Xmx470g to the loadrdf script. Most of the OWL files would load individually in less than one hour.
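For reference, the kind of invocation described above might look roughly like the sketch below. This is only a sketch under assumptions: the loadrdf options (-c for the repository config file, -m parallel for parallel loading) and the pool.buffer.size property follow the GraphDB 8.x documentation but should be verified for your version, and repo-config.ttl is a placeholder for whatever repository configuration file (which is also where the owl-horst-optimized ruleset is selected) is actually in use.

    # Heap was raised by editing the java options inside the loadrdf launcher
    # script (adding -Xmx470g). The parse buffer size (pool.buffer.size,
    # default 200,000 statements) can be lowered, e.g. to 20,000, via
    # graphdb.properties or a -Dpool.buffer.size java option in the same place.
    ./loadrdf -c repo-config.ttl -m parallel chebi*.owl dron-*.owl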

Around 10 pm EDT last night, I started to load all of the files listed below simultaneously. Now it's 11 hours later, and there are still millions of statements to go. The load rate is around 70/second now. It appears that only 30% of my RAM is being used, but the CPU load is consistently around 60.

  • Are there websites that document other people doing something at this scale?
  • Should I be using a different reasoning configuration? I chose this configuration because it was the fastest-loading OWL configuration in my experiments over the weekend. I think I will need to look for relationships that go beyond rdfs:subClassOf.

Files I'm trying to load:

+-------------+------------+---------------------+
|    bytes    | statements |        file         |
+-------------+------------+---------------------+
| 471,265,716 | 4,268,532  | chebi.owl           |
| 61,529      | 451        | chebi-disjoints.owl |
| 82,449      | 1,076      | chebi-proteins.owl  |
| 10,237,338  | 135,369    | dron-chebi.owl      |
| 2,374       | 16         | dron-full.owl       |
| 170,896     | 2,257      | dron-hand.owl       |
| 140,434,070 | 1,986,609  | dron-ingredient.owl |
| 2,391       | 16         | dron-lite.owl       |
| 234,853,064 | 2,495,144  | dron-ndc.owl        |
| 4,970       | 28         | dron-pro.owl        |
| 37,198,480  | 301,031    | dron-rxnorm.owl     |
| 137,507     | 1,228      | dron-upper.owl      |
+-------------+------------+---------------------+

Solution

@MarkMiller, you can take a look at the Preload tool, which is part of the GraphDB 8.4.0 release. It's specially designed to handle large amounts of data at a constant speed. Note that it works without inference, so you'll need to load your data first and then change the ruleset and re-infer the statements.

http://graphdb.ontotext.com/documentation/free/loading-data-using-preload.html
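To make the suggested workflow concrete, here is a minimal sketch. It assumes that the Preload tool accepts a repository config file the same way loadrdf does, and it uses the sys: control predicates that GraphDB documents for switching rulesets and re-inferring; the repository id "drugs", the config file name, and the localhost:7200 URL are placeholders, and both the preload options and the exact predicates should be checked against the documentation linked above.

    # 1) Bulk-load the explicit statements with no inference (Preload tool).
    ./preload -c repo-config.ttl chebi*.owl dron-*.owl

    # 2) With the server running, switch the repository to the desired ruleset
    #    and recompute the inferred closure via SPARQL updates posted to the
    #    standard /repositories/<id>/statements endpoint.
    curl -X POST http://localhost:7200/repositories/drugs/statements \
         -H 'Content-Type: application/sparql-update' \
         --data-binary 'PREFIX sys: <http://www.ontotext.com/owlim/system#>
                        INSERT DATA { _:b sys:defaultRuleset "owl-horst-optimized" }'

    curl -X POST http://localhost:7200/repositories/drugs/statements \
         -H 'Content-Type: application/sparql-update' \
         --data-binary 'INSERT DATA { [] <http://www.ontotext.com/owlim/system#reinfer> [] }'

The trade-off is that the forward-chaining step still has to run over the roughly 9 million explicit statements, but it runs only once, after all of the raw data is already in the repository.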
