Converting a CSV to RDF where one column is a set of values
Question
I want to convert a CSV to RDF.
One of the columns of that CSV is, in fact, a set of values joined with a separator character (in my case, the space character).
Here is a sample CSV (with header):
col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"
I would like the conversion process to create RDF similar to this:
:A :aProperty :B, :C, :D; :anotherProperty "John".
:M :aProperty :X, :Y, :Z; :anotherProperty "Jack".
I usually use Tarql for CSV conversion.
It works fine for iterating per row.
But it has no feature to sub-iterate "inside" a column value.
SPARQL-Generate may help (with iter:regex and sub-generate, as far as I understand). But I cannot find any example that matches my use case.
PS: maybe RML can help too. But I have no prior knowledge of this technology.
Recommended answer
You can do this with RML and FnO (the Function Ontology).
First, we need to access each row, which can be accomplished with RML. RML allows you to iterate over each row of the CSV file (ql:CSV) with a LogicalSource. Specifying the iterator (rml:iterator) is not needed, since the default iterator in RML is a row-based iterator. This results in the following RDF (Turtle):
<#LogicalSource>
    a rml:LogicalSource;
    rml:source "data.csv";
    rml:referenceFormulation ql:CSV.
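To make the row-based iteration concrete, here is a minimal Python sketch of what it amounts to. It is not how an RML processor is implemented; it only uses the standard csv module, and the inlined sample data and the name iterate_rows are assumptions made for illustration.

```python
import csv
import io

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE_CSV = '''col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"
'''

def iterate_rows(text):
    """Yield one dict per CSV row, keyed by the header names."""
    yield from csv.DictReader(io.StringIO(text))

for row in iterate_rows(SAMPLE_CSV):
    # Each row plays the role of one iteration of the LogicalSource.
    print(row["col1"], "|", row["col2"], "|", row["col3"])
```

Each dict key corresponds to a column name that a mapping can later refer to with rml:reference or inside an rr:template.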
The actual triples are generated with the help of a TriplesMap, which uses the LogicalSource to retrieve the data from each CSV row:
<#MyTriplesMap>
    a rr:TriplesMap;
    rml:logicalSource <#LogicalSource>;
    rr:subjectMap [
        rr:template "http://example.org/{col1}";
    ];
    rr:predicateObjectMap [
        rr:predicate ex:aProperty;
        rr:objectMap <#FunctionMap>;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:anotherProperty;
        rr:objectMap [
            rml:reference "col3";
        ];
    ].
The col3 CSV column is used to create the following triple:
<http://example.org/A> <http://example.org/ns#anotherProperty> "John".
However, the string in the CSV column col2 needs to be split first. This can be achieved with FnO (Function Ontology) and an RML processor that supports the execution of FnO functions. One such processor is the RML Mapper, but other processors can be used too. The following RDF is needed to invoke an FnO function that splits the input string on a space separator, with our LogicalSource as input data:
<#FunctionMap>
    fnml:functionValue [
        rml:logicalSource <#LogicalSource>; # our LogicalSource
        rr:predicateObjectMap [
            rr:predicate fno:executes;
            rr:objectMap [
                rr:constant grel:string_split # function to use
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:valueParameter;
            rr:objectMap [
                rml:reference "col2" # input string
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_string_sep;
            rr:objectMap [
                rr:constant " "; # space separator
            ];
        ];
    ].
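Conceptually, the function map turns one cell into several object values, and the processor emits one triple per value. The Python sketch below mirrors that idea with the same two inputs (the col2 value and the separator). Note that string_split and objects_for_row here are hypothetical stand-ins written for illustration, not the GREL implementation, and the real function may behave differently in edge cases such as repeated separators.

```python
def string_split(value, separator):
    """Hypothetical stand-in for grel:string_split: split value on separator."""
    return value.split(separator)

def objects_for_row(row):
    """Return the object values for ex:aProperty, one per item in col2."""
    return string_split(row["col2"], " ")

row = {"col1": "A", "col2": "B C D", "col3": "John"}
for obj in objects_for_row(row):
    # One triple is produced per value returned by the split.
    print(f'<http://example.org/{row["col1"]}> <http://example.org/ns#aProperty> "{obj}".')
```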
The FnO functions supported by the RML Mapper are listed here: https://rml.io/docs/rmlmapper/default-functions/
You can find each function's name and its parameters on that page.
Mapping rules
@base <http://example.org> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix ex: <http://example.org/ns#> .

<#LogicalSource>
    a rml:LogicalSource;
    rml:source "data.csv";
    rml:referenceFormulation ql:CSV.

<#MyTriplesMap>
    a rr:TriplesMap;
    rml:logicalSource <#LogicalSource>;
    rr:subjectMap [
        rr:template "http://example.org/{col1}";
    ];
    rr:predicateObjectMap [
        rr:predicate ex:aProperty;
        rr:objectMap <#FunctionMap>;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:anotherProperty;
        rr:objectMap [
            rml:reference "col3";
        ];
    ].

<#FunctionMap>
    fnml:functionValue [
        rml:logicalSource <#LogicalSource>;
        rr:predicateObjectMap [
            rr:predicate fno:executes;
            rr:objectMap [
                rr:constant grel:string_split
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:valueParameter;
            rr:objectMap [
                rml:reference "col2"
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_string_sep;
            rr:objectMap [
                rr:constant " ";
            ];
        ];
    ].
Output
<http://example.org/A> <http://example.org/ns#aProperty> "B".
<http://example.org/A> <http://example.org/ns#aProperty> "C".
<http://example.org/A> <http://example.org/ns#aProperty> "D".
<http://example.org/A> <http://example.org/ns#anotherProperty> "John".
<http://example.org/M> <http://example.org/ns#aProperty> "X".
<http://example.org/M> <http://example.org/ns#aProperty> "Y".
<http://example.org/M> <http://example.org/ns#aProperty> "Z".
<http://example.org/M> <http://example.org/ns#anotherProperty> "Jack".
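As a sanity check on the listing above, the following Python sketch reproduces the same lines from the sample CSV using only the standard library. It is not an RML processor and does not read the mapping; the function name to_ntriples and the constants BASE and NS are assumptions for illustration, and the literals are written without any escaping, which is only safe for simple values like the ones in the sample.

```python
import csv
import io

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE_CSV = '''col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"
'''

BASE = "http://example.org/"    # matches the rr:template prefix
NS = "http://example.org/ns#"   # matches the ex: prefix

def to_ntriples(text):
    """Return one triple string per split value, plus one for col3, per row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['col1']}>"
        # col2 holds several values joined by a space: one triple per value.
        for item in row["col2"].split(" "):
            lines.append(f'{subject} <{NS}aProperty> "{item}".')
        # col3 is used as a single literal (no escaping: simple values only).
        lines.append(f'{subject} <{NS}anotherProperty> "{row["col3"]}".')
    return lines

for line in to_ntriples(SAMPLE_CSV):
    print(line)
```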
Note: I contribute to RML and its technologies.