Modelling an equivalent of database NULL in RDF

Question

I would like to know if there is a standard or generally accepted way of representing an equivalent of NULL used in databases for RDF data.

More specifically, I'm interested in a way to distinguish the following cases for a value o of a property p (p is the predicate, o the object of an RDF triple):

  1. The value is not applicable, i.e. property p does not exist or does not make sense in the context.
  2. The value is unknown, i.e. it should be there but we don't know it.
  3. The value doesn't exist, i.e. the property doesn't have a value (e.g. year of death for a living person).
  4. The value is withheld, e.g. when the data consumer is not allowed to access it.

Recommended answer

I don't know of a standard way of doing this, but one of the advantages of working in RDF is that you have a lot of flexibility in how you decide to do it. RDF, per se, cannot express negation (i.e., there is no convenient way to say that a triple s p o does not hold), but OWL can. As to the four cases you described, here are some approaches that you might take:

1. The value is not applicable, i.e. property p does not exist or does not make sense in the context.

If it does not make much sense for a property p to have a value for a subject s, then it's probably acceptable to just not write any triples of the form s p o. Since RDF makes an open-world assumption, it is often the case that, in data retrieval, one only queries for the data that one is interested in, and does not make much of an effort to check whether there are unexpected things. If you do want to do some sanity checking, then you can declare RDFS domains and ranges for properties. For instance, you might have:

:hasBirthDate rdfs:domain :AnimateObject .
:hasConstructionDate rdfs:domain :InanimateObject .

Given the RDFS semantics, if you have

:object82 :hasBirthDate "2013-04-01" ;
          :hasConstructionDate "2013-04-02" .

then you can also infer

:object82 a :AnimateObject, :InanimateObject .

and you might run a sanity check that looks for things that are both AnimateObjects and InanimateObjects. If anything is both, you probably have a problem that you should look into. If you use OWL, then you can actually declare that AnimateObject and InanimateObject are disjoint classes and have a reasoner check the data for logical consistency. Alternatively, in OWL, you can add assertions such as

Individual: object82
    Types: hasConstructionDate max 0

which says that object82 should have no values for the property hasConstructionDate.
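To make these sanity checks concrete, here is a minimal Python sketch. It assumes a toy in-memory graph (a set of plain-string triples, not rdflib or any real triple store), applies only the single RDFS domain rule, and then runs closed-world checks for the disjointness clash and for the `max 0` assertion. A real OWL reasoner would instead report the ontology as inconsistent; this is only an illustration of the idea.

```python
# Toy graph: a set of (subject, predicate, object) string triples.
# Illustration only: not rdflib, and not a full RDFS/OWL reasoner.
Triple = tuple[str, str, str]

def infer_domain_types(graph: set[Triple]) -> set[Triple]:
    """One application of the RDFS domain rule:
    (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)."""
    domains: dict[str, set[str]] = {}
    for (s, p, o) in graph:
        if p == "rdfs:domain":
            domains.setdefault(s, set()).add(o)
    inferred = {(s, "rdf:type", cls)
                for (s, p, o) in graph
                for cls in domains.get(p, ())}
    return graph | inferred

def types_of(graph: set[Triple], subject: str) -> set[str]:
    return {o for (s, p, o) in graph if s == subject and p == "rdf:type"}

def disjointness_violations(graph: set[Triple], c1: str, c2: str) -> list[str]:
    """Subjects typed as both c1 and c2 (a clash if the classes are disjoint)."""
    subjects = {s for (s, p, o) in graph if p == "rdf:type"}
    return sorted(s for s in subjects if {c1, c2} <= types_of(graph, s))

def max_zero_violations(graph: set[Triple], forbidden: set[tuple[str, str]]):
    """(individual, property) pairs declared 'max 0' that still have a value."""
    return sorted({(s, p) for (s, p, o) in graph if (s, p) in forbidden})

schema = {
    ("hasBirthDate", "rdfs:domain", "AnimateObject"),
    ("hasConstructionDate", "rdfs:domain", "InanimateObject"),
}
data = {
    ("object82", "hasBirthDate", "2013-04-01"),
    ("object82", "hasConstructionDate", "2013-04-02"),
}

expanded = infer_domain_types(schema | data)
print(disjointness_violations(expanded, "AnimateObject", "InanimateObject"))
# ['object82']
print(max_zero_violations(expanded, {("object82", "hasConstructionDate")}))
# [('object82', 'hasConstructionDate')]
```

The checks are deliberately closed-world: they report what the data *does* say, which is what a sanity check is for, whereas the RDFS inference itself never flags anything as an error.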

In any case, add rdfs:comments to your properties explaining what the property should be used for and what it should not be used for. When appropriate, add rdfs:comments to individuals to explain why they should not have a value for a given property, if they should not have such a value.

2. The value is unknown, i.e. it should be there but we don't know it.

In this case, it is important to pin down what exactly "should" means. In OWL, for instance, you can say that

Class: Person
    SubClassOf: hasName min 1 xsd:string

to assert that every person is related to at least one String by the property hasName; that is, every person has at least one name. That is one way of saying that there is some value, but we might not know what it is in a particular case. If you cannot work with OWL, but only with RDF, then you should probably add an rdfs:comment to the property hasName along the lines of "each NamedEntity should have at least one value for this property."
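A sketch of how a data consumer might read the `hasName min 1` axiom under the open-world assumption: a Person with no `hasName` triple is not an error, it simply has a name we don't know. This again assumes a toy set-of-triples graph; `alice` and `bob` are hypothetical individuals.

```python
# Toy graph of (subject, predicate, object) triples; "alice" and "bob" are
# hypothetical individuals, "Person" and "hasName" follow the example axiom.
graph = {
    ("alice", "rdf:type", "Person"),
    ("alice", "hasName", "Alice"),
    ("bob", "rdf:type", "Person"),
}

def name_status(graph, person):
    """Read 'Person SubClassOf: hasName min 1' under the open-world assumption.

    A missing hasName triple is not a violation: the axiom says that a name
    exists, so its absence from the data only means the name is unknown.
    """
    names = sorted(o for (s, p, o) in graph if s == person and p == "hasName")
    return ("known", names) if names else ("unknown", [])

persons = sorted(s for (s, p, o) in graph if p == "rdf:type" and o == "Person")
for person in persons:
    print(person, name_status(graph, person))
# alice ('known', ['Alice'])
# bob ('unknown', [])
```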

3. The value doesn't exist, i.e. the property doesn't have a value (e.g. year of death for a living person).

This is an interesting case, because RDF has no built-in notion of time (in the sense that some triple holds until a given time, after which some other triple holds). If you are simply using an RDF graph as a database-like store that you can update (both by removing and by inserting triples), you could probably use some special reserved value for "I'm not dead yet!". Having an open-ended data model, as we do in RDF, makes it particularly easy to do something like this, because you really can just mint a new value for it:

mp:JohnCleese mp:hasDeathDate mp:notDeadYet .
mp:GrahamChapman mp:hasDeathDate "1989-10-04" .
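Consumers of such data have to know about the reserved value. A minimal sketch, assuming the same kind of toy set-of-triples graph and treating `mp:notDeadYet` as the sentinel; `mp:UnlistedPerson` is a hypothetical subject with no triples at all, showing that "no triple" (unknown) and "sentinel" (no value exists) remain distinguishable.

```python
# Toy graph: a set of (subject, predicate, object) string triples (not rdflib).
NOT_DEAD_YET = "mp:notDeadYet"  # reserved sentinel value from the example

graph = {
    ("mp:JohnCleese", "mp:hasDeathDate", NOT_DEAD_YET),
    ("mp:GrahamChapman", "mp:hasDeathDate", "1989-10-04"),
}

def death_status(graph, person):
    """Return 'unknown', 'alive', or the recorded death date."""
    values = sorted(o for (s, p, o) in graph
                    if s == person and p == "mp:hasDeathDate")
    if not values:
        return "unknown"      # no triple at all: we simply don't know
    if NOT_DEAD_YET in values:
        return "alive"        # sentinel: no death date exists
    return values[0]          # a real date (assumes at most one per person)

def death_dates(graph):
    """Collect real death dates; the sentinel must be filtered out explicitly."""
    return {s: o for (s, p, o) in graph
            if p == "mp:hasDeathDate" and o != NOT_DEAD_YET}

for person in ["mp:JohnCleese", "mp:GrahamChapman", "mp:UnlistedPerson"]:
    print(person, death_status(graph, person))
```

The catch with a sentinel is visible in `death_dates`: every query that treats the object as a date has to remember to exclude it.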

Of course, you can also be a bit more refined and use a boolean-valued property to indicate whether or not a value for the first property makes sense:

mp:JohnCleese mp:isDeceased false .
mp:GrahamChapman mp:isDeceased true ;
                 mp:hasDeathDate "1989-10-04" .
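With a separate flag, the "unknown" and "doesn't exist" cases separate cleanly. A sketch under the same toy-graph assumption, where Python `True`/`False` stand in for `xsd:boolean` values and `mp:HypotheticalMember` is a made-up individual who is deceased but has no recorded date.

```python
# Toy graph; Python booleans stand in for xsd:boolean true/false.
graph = {
    ("mp:JohnCleese", "mp:isDeceased", False),
    ("mp:GrahamChapman", "mp:isDeceased", True),
    ("mp:GrahamChapman", "mp:hasDeathDate", "1989-10-04"),
    ("mp:HypotheticalMember", "mp:isDeceased", True),  # made-up individual
}

def objects(graph, subject, predicate):
    return [o for (s, p, o) in graph if s == subject and p == predicate]

def death_info(graph, person):
    flags = objects(graph, person, "mp:isDeceased")
    dates = objects(graph, person, "mp:hasDeathDate")
    if not flags:
        return "unknown whether deceased"      # no information at all
    if flags[0] is False:
        return "alive: no death date exists"    # value does not exist (case 3)
    if dates:
        return f"died {dates[0]}"
    return "deceased, death date unknown"       # value should exist (case 2)

for person in ["mp:JohnCleese", "mp:GrahamChapman", "mp:HypotheticalMember"]:
    print(person, death_info(graph, person))
```

Compared with the sentinel, nothing in `mp:hasDeathDate` ever needs to be filtered: every object of that property is a real date.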

4. The value is withheld, e.g. when the data consumer is not allowed to access it.

This, in my opinion, is the most interesting case, because it potentially involves the most interesting data transformation. If you have a nice dataset that people can query, and you want to indicate something about the results that they would have obtained were it not for their lack of permission, you have lots of options for representing this. For instance, you could replace nodes in the graph with blank nodes that act like redactions, typed with something analogous to HTTP status codes. Suppose you have the data:

ex:JohnDoe ex:hasSSN "000-00-0000" .
ex:JaneDoe ex:hasSSN "000-00-0001" .

When someone asks for the data, you might respond (supposing that the first value is valid, and the second one invalid):

ex:JohnDoe ex:hasSSN [ a ex:ValidSSN ] .
ex:JaneDoe ex:hasSSN [ a ex:InvalidSSN ] .
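The redaction step can be sketched as a function that builds the consumer's view from the full graph. This assumes a toy set-of-triples graph; the validity test is a hypothetical stand-in (the answer only *supposes* which value is valid), and blank-node labels such as `_:b0` are generated locally for illustration.

```python
from itertools import count

graph = {
    ("ex:JohnDoe", "ex:hasSSN", "000-00-0000"),
    ("ex:JaneDoe", "ex:hasSSN", "000-00-0001"),
}

# Hypothetical validity oracle: following the example, we simply *suppose*
# that the first value is valid and the second is not.
VALID_SSNS = {"000-00-0000"}

def redacted_view(graph, prop, is_valid,
                  valid_cls="ex:ValidSSN", invalid_cls="ex:InvalidSSN"):
    """Copy `graph`, replacing each object of `prop` with a fresh blank node
    that records only whether the hidden value was valid."""
    fresh = count()
    view = set()
    for (s, p, o) in sorted(graph):  # sorted so blank-node labels are stable
        if p == prop:
            bnode = f"_:b{next(fresh)}"
            view.add((s, p, bnode))
            view.add((bnode, "rdf:type", valid_cls if is_valid(o) else invalid_cls))
        else:
            view.add((s, p, o))
    return view

view = redacted_view(graph, "ex:hasSSN", lambda v: v in VALID_SSNS)
for triple in sorted(view):
    print(*triple)
```

The original literals never appear in `view`; the consumer learns only that a value exists and which class it falls into.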

In general, you could present a different view of the data to consumers than what you actually possess. I do not know of any standards for doing this sort of thing. You might be interested in the somewhat related W3C Recommendation PROV-O: The PROV Ontology, a vocabulary for describing the provenance of information (e.g., what it was generated from, to what it is attributed); it could be useful in describing the sorts of resources that might not be available to requesters in their full form.
