Jena parsing issue for Freebase RDF dump (Jan 2014)


Question

I am trying to parse the Freebase dump file freebase-rdf-2014-01-12-00-00.gz (25 GB) using Jena. Jena has reported many issues with bad data. For example, 150.0 is reported as not valid, and the values true and false are reported as not valid. I resolved these by adding double quotes around the decimals and the true/false values in the dump file. However, Jena is still reporting issues. The current one is: org.apache.jena.riot.RiotException: [line: 161083, col: 110] Illegal object: [MINUS]

Is there any way to preprocess this data so that I don't have to fix each issue one by one? My Java code:

    // Open TDB dataset
    String directory = "D:/test_dump";
    Dataset dataset = TDBFactory.createDataset(directory);

    // Assume we want the default model, or we could get a named model here
    Model tdb = dataset.getDefaultModel();

    // Read the input file - only needs to be done once
    String source = "D:/test_dump/fixed-freebase-second-rdf.gz";
    FileManager.get().readModel( tdb, source, "N-TRIPLES" ); 

Recommended answer

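The data is in Turtle format, not N-Triples, so it should be read with a Turtle parser (for example by passing "TURTLE" rather than "N-TRIPLES" as the syntax name). The dump uses various Turtle abbreviations, such as true for "true"^^xsd:boolean or the number -27 for "-27"^^xsd:integer. An N-Triples parser does not accept these shorthand forms, which is why it rejects 150.0, true, false, and a leading minus sign.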
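To make the abbreviations concrete, here is a minimal sketch in plain Java (no Jena required) that maps a Turtle shorthand token to the long typed-literal form it stands for. The class name `TurtleShorthand` and the method `expand` are hypothetical names made up for this illustration, and the regular expressions are a simplified reading of the Turtle INTEGER, DECIMAL, and DOUBLE tokens. A Turtle parser does this expansion itself, so this is for understanding only, not a preprocessing step.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: shows which typed literal
// each Turtle shorthand token stands for.
final class TurtleShorthand {
    private static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Simplified versions of the Turtle INTEGER, DECIMAL and DOUBLE tokens.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?[0-9]*\\.[0-9]+");
    private static final Pattern DOUBLE = Pattern.compile(
        "[+-]?([0-9]+\\.[0-9]*[eE][+-]?[0-9]+|\\.[0-9]+[eE][+-]?[0-9]+|[0-9]+[eE][+-]?[0-9]+)");

    /** Returns the long typed-literal form of a shorthand token, or the token unchanged. */
    static String expand(String token) {
        if (token.equals("true") || token.equals("false")) return typed(token, "boolean");
        if (INTEGER.matcher(token).matches()) return typed(token, "integer");
        if (DECIMAL.matcher(token).matches()) return typed(token, "decimal");
        if (DOUBLE.matcher(token).matches()) return typed(token, "double");
        return token; // already an IRI, a quoted literal, or something else
    }

    private static String typed(String lexical, String type) {
        return "\"" + lexical + "\"^^<" + XSD + type + ">";
    }

    public static void main(String[] args) {
        // The last token is already quoted, so it is left as a plain string literal.
        for (String t : new String[] {"true", "-27", "150.0", "1.5e3", "\"150.0\""}) {
            System.out.println(t + "  =>  " + expand(t));
        }
    }
}
```

Note that the quoted token `"150.0"` passes through unchanged: once quoted, it is a plain string and no longer a decimal, which is the problem with the quoting workaround described in the question.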

There may still be errors, because their dumps have also contained illegal syntax, e.g. use of $ in prefixed names without the necessary \ escape.
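If unescaped $ characters do turn out to be the remaining problem, one possible preprocessing step is a streaming filter that adds the missing backslash without touching literals. The sketch below is only that, a sketch under assumptions: it assumes each statement sits on one line with tab-separated fields, it uses a simple heuristic to decide which fields are prefixed names, and `DollarEscaper` and its methods are hypothetical names. The sample triples in the usage note are invented, not taken from the dump.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical filter: escapes bare '$' in prefixed names, line by line.
final class DollarEscaper {

    /** Escapes bare '$' in the prefixed-name fields of one tab-separated line. */
    static String escapeLine(String line) {
        String[] fields = line.split("\t", -1);
        for (int i = 0; i < fields.length; i++) {
            if (isPrefixedName(fields[i])) fields[i] = escapeDollar(fields[i]);
        }
        return String.join("\t", fields);
    }

    // Heuristic (an assumption): a field containing ':' that is not an IRI,
    // a quoted literal, a blank node, or a directive is a prefixed name.
    private static boolean isPrefixedName(String f) {
        return f.indexOf(':') >= 0
            && !f.startsWith("<") && !f.startsWith("\"") && !f.startsWith("'")
            && !f.startsWith("_:") && !f.startsWith("@");
    }

    // Inserts a backslash before each '$' that is not already escaped.
    private static String escapeDollar(String s) {
        StringBuilder out = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alreadyEscaped = i > 0 && s.charAt(i - 1) == '\\';
            if (c == '$' && !alreadyEscaped) out.append('\\');
            out.append(c);
        }
        return out.toString();
    }

    // Reads lines from standard input and writes the fixed lines to standard output.
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(System.in, StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(System.out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(escapeLine(line));
                out.write('\n');
            }
        }
    }
}
```

Because it streams, the filter can sit between decompression and the parser without holding the 25 GB file in memory. For an invented line with the fields `ns:m.01`, `ns:user.x.$y`, `ns:m.02`, and `.`, only the second field changes, to `ns:user.x.\$y`; a quoted literal such as `"a$b"` is left alone. Unlike adding quotes, this does not change the meaning of the data.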

Adding quotes around things changes the RDF: an unquoted 150.0 is an xsd:decimal, whereas "150.0" is a plain string literal.
