RDF duplicate triples

Question

I have a question about RDF and duplicate triples. From perusing the internet, it seems that duplicate triples are somehow "bad" or a violation of some rule.

But on the surface, duplicate triples seem meaningful to me.

Suppose I want to represent the fact: Susy (subject) mentions (predicate) Bob (object).

Suppose I further wanted to represent that Susy mentions Bob five times. Wouldn't having five triples of "Susy mentions Bob" allow me to represent this?

A later query that wants to know how many times Susy mentioned Bob could just ask for the COUNT of this repeated triple.

So my question is: is there anything wrong with this representation of the fact that Susy mentions Bob five times? And if so, what would be the preferred way of representing that fact?

Recommended answer

In theory, an RDF graph is a set of triples, which means that each triple can occur only once. Of course, you could have a document, say in Turtle, that contains duplicate triples or quads, but after loading it into memory or a store, those duplicates should be treated as one. Any document is just text, after all.
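The set semantics can be sketched in a few lines of Python. This is only an illustration under simple assumptions: plain tuples stand in for RDF triples, and the `load` helper is hypothetical rather than part of any RDF library.

```python
# Plain tuples stand in for RDF triples; this is not a real RDF library.
Triple = tuple[str, str, str]


def load(statements: list[Triple]) -> set[Triple]:
    """Model loading a document into a graph: a graph is a set of triples."""
    return set(statements)


# A "document" that states the same triple five times.
document = [(":Susy", ":mentions", ":Bob")] * 5

graph = load(document)

# Counting matches against the graph sees the triple only once.
mention_count = sum(1 for t in graph if t == (":Susy", ":mentions", ":Bob"))

print(len(document), len(graph), mention_count)  # prints: 5 1 1
```

The five textual statements collapse into one member of the set, so a count over the graph cannot recover the number five.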

That said, I've seen different behaviour depending on the triple store. For example, AllegroGraph by default loads and keeps duplicate triples; there is a manual option to trim the duplicates.

And no, querying will not tell you that you have a duplicate triple, because SPARQL aggregates work on the nodes bound in query solutions and not on whole triples.

Regarding your example, there are multiple ways.

TL;DR: you will need a way to add statements about statements. See this slideshare for various ways, some of which I briefly describe below.

Full answer

The easiest is to introduce some kind of artificial intermediary graph node, which could be called Mention or whatever. For example:

:Susan :mentions [
  rdf:type :Mention ;
  :mentionsWhom :Bob ;
  :times 5
] .

The problem is that this breaks existing semantics should you introduce such a structure into existing data: :mentions now points to a Mention node instead of directly to :Bob, so anything that expected the plain triple no longer finds it.
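A rough Python sketch of the intermediary-node pattern and its drawback follows. The tuple-based graph, the blank node label `_:m1`, and the helpers `objects` and `times_mentioned` are assumptions made for illustration, not an RDF API.

```python
# Hypothetical in-memory model: triples are (subject, predicate, object) tuples.
graph = {
    (":Susan", ":mentions", "_:m1"),
    ("_:m1", "rdf:type", ":Mention"),
    ("_:m1", ":mentionsWhom", ":Bob"),
    ("_:m1", ":times", 5),
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def times_mentioned(graph, who, whom):
    """Follow :mentions to each Mention node and add up its :times values."""
    total = 0
    for node in objects(graph, who, ":mentions"):
        if whom in objects(graph, node, ":mentionsWhom"):
            total += sum(objects(graph, node, ":times"))
    return total


print(times_mentioned(graph, ":Susan", ":Bob"))
# The direct triple is gone, which is the semantic break described above.
print((":Susan", ":mentions", ":Bob") in graph)
```

The count is reachable, but only by code that knows about the extra hop through the Mention node.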

A simple and widely supported way is to use named graphs, so that you have quads instead of triples. The example below extends the Turtle syntax so that it becomes TriG. Note that the named graph is just another resource. Named graphs are also easy to query with any SPARQL processor.

# :susanMentionsBob is the named graph
:susanMentionsBob {
   :Susan :mentions :Bob
}

# we can say more about that graph
:susanMentionsBob :times 5 .
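To make the quad idea concrete, here is a small Python sketch. It assumes quads are modelled as 4-tuples with `None` marking the default graph; the helper names `graphs_containing` and `times_for` are made up for this example.

```python
DEFAULT = None  # marker for the default (unnamed) graph in this sketch

# Quads are (subject, predicate, object, graph name) tuples.
quads = {
    (":Susan", ":mentions", ":Bob", ":susanMentionsBob"),
    (":susanMentionsBob", ":times", 5, DEFAULT),
}


def graphs_containing(quads, s, p, o):
    """Names of the named graphs that contain the triple (s, p, o)."""
    return {
        g for qs, qp, qo, g in quads
        if (qs, qp, qo) == (s, p, o) and g is not DEFAULT
    }


def times_for(quads, s, p, o):
    """Sum the :times values stated about the graphs holding (s, p, o)."""
    names = graphs_containing(quads, s, p, o)
    return sum(
        qo for qs, qp, qo, g in quads
        if g is DEFAULT and qs in names and qp == ":times"
    )


print(times_for(quads, ":Susan", ":mentions", ":Bob"))
```

The original triple keeps its shape; the count is attached to the graph name, which is treated like any other resource.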

Another traditional solution is to use a form of reification. With reification, you create an rdf:Statement resource to which you can attach additional data. The downside is that you need to repeat the subject, predicate, and object of the original triple:

:Susan :mentions :Bob . # actual triple intact
_:reifiedStatement
   rdf:type rdf:Statement ;
   rdf:subject :Susan ;
   rdf:predicate :mentions ;
   rdf:object :Bob ;
   :times 5 . # extra statement about the mention
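The repetition that reification requires can be seen in a short Python sketch. The `reify` helper and the `_:stmtN` labels are hypothetical; only the rdf:Statement, rdf:subject, rdf:predicate, and rdf:object terms come from the RDF reification vocabulary.

```python
import itertools

_ids = itertools.count(1)  # source of fresh labels for statement nodes


def reify(s, p, o, **extra):
    """Return the original triple plus its reified description.

    Keyword arguments become extra statements about the statement node,
    e.g. times=5 becomes (node, ":times", 5).
    """
    node = f"_:stmt{next(_ids)}"
    triples = {
        (s, p, o),                           # actual triple stays intact
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),            # s, p and o are each repeated
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }
    triples |= {(node, f":{name}", value) for name, value in extra.items()}
    return triples


graph = reify(":Susan", ":mentions", ":Bob", times=5)
for triple in sorted(graph, key=str):
    print(triple)
```

One annotated fact costs six triples here, which is the overhead the more concise approaches try to avoid.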

More recently, more concise approaches to reification have been introduced. You can use a Singleton Property instead: you introduce an extra predicate that replaces :mentions for a single usage, and you add additional statements about that property:

# In Turtle an unescaped '#' starts a comment, so it is escaped in the local name
:Susan :mentions\#1 :Bob .
:mentions\#1 rdf:singletonPropertyOf :mentions .
:mentions\#1 :times 5 .

Note that you can use any name for the :mentions#1 property to avoid collisions. Please have a look at the slideshare linked above for more examples and SPARQL usage.
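As a rough illustration of the singleton property pattern, the Python sketch below mints one property per usage and links it back to the base predicate. The helpers `add_singleton` and `usages` and the `base#index` naming scheme are assumptions for this example only.

```python
def add_singleton(graph, s, base, o, index, **extra):
    """Assert (s, base#index, o) and describe the minted property."""
    prop = f"{base}#{index}"
    graph.add((s, prop, o))
    graph.add((prop, "rdf:singletonPropertyOf", base))
    for name, value in extra.items():
        graph.add((prop, f":{name}", value))
    return prop


def usages(graph, s, base, o):
    """Singleton properties of `base` that connect s to o."""
    singles = {
        sp for sp, p, b in graph
        if p == "rdf:singletonPropertyOf" and b == base
    }
    return {p for gs, p, go in graph if gs == s and go == o and p in singles}


graph = set()
add_singleton(graph, ":Susan", ":mentions", ":Bob", 1, times=5)
print(usages(graph, ":Susan", ":mentions", ":Bob"))
```

Any query for the base predicate has to go through rdf:singletonPropertyOf, as `usages` does, because the plain :mentions triple is never asserted.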

Last but not least, a non-standard way, supported only by BigData AFAIK, is Reification Done Right, or RDR. With RDR you can write:

<<:Susan :mentions :Bob>> :times 5

By adding double angle brackets, you can make statements about statements. This also works in BigData's SPARQL processor.
