Exclude results from DBpedia SPARQL query based on URI prefix

Problem description

How can I exclude a group of concepts when using the DBpedia SPARQL endpoint? I'm using the following basic query to get a list of concepts:

SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
}
LIMIT 100

This gives me a list of 100 concepts. I want to exclude all the concepts that fall into the YAGO class/group (i.e., whose IRIs begin with http://dbpedia.org/class/yago/). I can filter out individual concepts like this:

SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
    FILTER (?concept != <http://dbpedia.org/class/yago/1950sScienceFictionFilms>)
}
LIMIT 100

But what I can't figure out is how to exclude all YAGO subclasses from my results. I tried using a * like this, but it didn't have any effect:

FILTER (?concept != <http://dbpedia.org/class/yago/*>)
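A minimal sketch of why the wildcard has no effect, assuming the filter behaves like plain term inequality: `!=` compares two IRIs as whole terms, so `<http://dbpedia.org/class/yago/*>` is just one more IRI that no real concept equals. The sample IRIs are hypothetical stand-ins for query results, and Python string comparison stands in for SPARQL's IRI comparison.

```python
# Hypothetical sample of ?concept values, with IRIs as plain strings.
concepts = [
    "http://dbpedia.org/class/yago/1950sScienceFictionFilms",
    "http://dbpedia.org/ontology/Person",
    "http://www.w3.org/2002/07/owl#Thing",
]

# The '*' is an ordinary character inside the IRI, not a wildcard.
wildcard_iri = "http://dbpedia.org/class/yago/*"

# Analogue of FILTER (?concept != <http://dbpedia.org/class/yago/*>):
# whole-term inequality keeps every concept, including the YAGO one.
kept = [c for c in concepts if c != wildcard_iri]

print(kept == concepts)  # True: nothing was filtered out
```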

Update:

This query with regex seems to do the trick, but it's really, really slow and ugly. I'm really looking forward to a better alternative.

SELECT DISTINCT ?type WHERE {
  [] a ?type
  FILTER( regex(str(?type), "^(?!http://dbpedia.org/class/yago/).+"))
}
ORDER BY ASC(?type)
LIMIT 10

Recommended answer

It might seem a little awkward, but your comment about casting to a string and doing some string-based checks is probably on the right track. You can do it a little bit more efficiently using the SPARQL 1.1 function strstarts:

SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
    FILTER ( !strstarts(str(?concept), "http://dbpedia.org/class/yago/") )
}
LIMIT 100
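To illustrate that the prefix check and the negative-lookahead regex from the update select the same concepts, here is a minimal in-memory sketch. It assumes the concepts are already available as plain IRI strings (the sample list is hypothetical) and uses Python's `str.startswith` and `re` in place of SPARQL's `strstarts` and `regex`. Note that Python's `re` supports `(?!...)`, but lookahead is not part of the XPath regular-expression syntax the SPARQL specification refers to, so whether the update's query works depends on the endpoint.

```python
import re

YAGO_PREFIX = "http://dbpedia.org/class/yago/"

# Hypothetical sample of ?concept values returned by `?x a ?concept`.
concepts = [
    "http://dbpedia.org/class/yago/1950sScienceFictionFilms",
    "http://dbpedia.org/class/yago/PhysicalEntity100001930",
    "http://dbpedia.org/ontology/Person",
    "http://www.w3.org/2002/07/owl#Thing",
]

def keep_by_prefix(iris):
    """Analogue of FILTER ( !strstarts(str(?concept), YAGO_PREFIX) )."""
    return [iri for iri in iris if not iri.startswith(YAGO_PREFIX)]

# Analogue of the update's regex: match only strings NOT starting with the prefix.
# re.escape makes the dots literal (the update's pattern leaves them unescaped).
NOT_YAGO = re.compile("^(?!" + re.escape(YAGO_PREFIX) + ").+")

def keep_by_regex(iris):
    # The leading ^ anchors the pattern at the start, so search() only tries position 0.
    return [iri for iri in iris if NOT_YAGO.search(iri)]

print(keep_by_prefix(concepts))
# ['http://dbpedia.org/ontology/Person', 'http://www.w3.org/2002/07/owl#Thing']
print(keep_by_prefix(concepts) == keep_by_regex(concepts))  # True
```

The prefix test is a single fixed comparison per IRI, while the regex version goes through a pattern engine for the same answer, which is one plausible reason `strstarts` tends to be the cheaper choice.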

The other alternative would be to find a top level YAGO class, and to exclude those concepts that are rdfs:subClassOf that top level class. This would probably be a better solution in the long run (since it doesn't require casting to strings, and it's based on graph structure). Unfortunately, it doesn't look like there is a single top level YAGO class comparable to owl:Thing. I just downloaded the YAGO type hierarchy from DBpedia's download page and ran this query, which asks for classes with no superclasses, against it:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?root where {
  [] rdfs:subClassOf ?root 
  filter not exists { ?root rdfs:subClassOf ?superRoot }
}
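The root query can be mirrored in memory to make its logic concrete. A root is any class that appears as the object of some `rdfs:subClassOf` triple but never as the subject of one. The sketch assumes the hierarchy has been loaded as (subclass, superclass) pairs. The `ex:` class names are hypothetical placeholders, not real YAGO data.

```python
# Hypothetical (subclass, superclass) pairs standing in for rdfs:subClassOf triples.
subclass_of = [
    ("ex:SciFiFilm", "ex:Film"),
    ("ex:Film", "ex:Work"),
    ("ex:Work", "ex:Abstraction"),
    ("ex:Hamadryad", "ex:Dryad"),
]

def find_roots(pairs):
    """Classes used as a superclass ([] rdfs:subClassOf ?root) that have
    no superclass themselves (filter not exists { ?root rdfs:subClassOf ?superRoot })."""
    has_super = {sub for sub, _ in pairs}
    return {sup for _, sup in pairs if sup not in has_super}

print(sorted(find_roots(subclass_of)))  # ['ex:Abstraction', 'ex:Dryad']
```

As in the SPARQL version, a class that has neither subclasses nor superclasses never shows up, because it never appears in a `rdfs:subClassOf` triple at all.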

I got these nine results:

----------------------------------------------------------------
| root                                                         |
================================================================
| <http://dbpedia.org/class/yago/YagoLegalActorGeo>            |
| <http://dbpedia.org/class/yago/WaterNymph109550125>          |
| <http://dbpedia.org/class/yago/PhysicalEntity100001930>      |
| <http://dbpedia.org/class/yago/Abstraction100002137>         |
| <http://dbpedia.org/class/yago/YagoIdentifier>               |
| <http://dbpedia.org/class/yago/YagoLiteral>                  |
| <http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity> |
| <http://dbpedia.org/class/yago/Thing104424418>               |
| <http://dbpedia.org/class/yago/Dryad109551040>               |
----------------------------------------------------------------

Given that the YAGO concepts aren't quite as structured as some of the others, the string-based approach may be the best one in this case. However, if you wanted to, you could write a non-string-based query like this, which asks for 100 concepts, excluding those that have one of those nine results as a superclass:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix yago: <http://dbpedia.org/class/yago/>

select distinct ?concept where {
  [] a ?concept .
  filter not exists {
    ?concept rdfs:subClassOf* ?super .
    values ?super { 
      yago:YagoLegalActorGeo
      yago:WaterNymph109550125
      yago:PhysicalEntity100001930
      yago:Abstraction100002137
      yago:YagoIdentifier
      yago:YagoLiteral
      yago:YagoPermanentlyLocatedEntity
      yago:Thing104424418
      yago:Dryad109551040
    }
  }
}
limit 100
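The `filter not exists { ?concept rdfs:subClassOf* ?super ... }` pattern drops a concept if a chain of zero or more `rdfs:subClassOf` steps leads to one of the listed roots. Because `*` also allows zero steps, the roots themselves are dropped too. Here is a minimal sketch of that reachability test, assuming the hierarchy is available as a mapping from each class to its direct superclasses. All `ex:` class names are hypothetical.

```python
from collections import deque

# Hypothetical hierarchy: class -> set of direct superclasses (rdfs:subClassOf).
hierarchy = {
    "ex:SciFiFilm": {"ex:Film"},
    "ex:Film": {"ex:Work"},
    "ex:Work": {"ex:Abstraction"},
    "ex:Hamadryad": {"ex:Dryad"},
    "ex:Person": set(),  # not under any excluded root
}

# Hypothetical stand-ins for the nine YAGO roots in the VALUES block.
excluded_roots = {"ex:Abstraction", "ex:Dryad"}

def reaches_root(concept, roots, graph):
    """True if `concept rdfs:subClassOf* ?super` holds for some ?super in roots.
    Breadth-first search; the start node itself counts (zero-length path)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        if node in roots:
            return True
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False

concepts = ["ex:SciFiFilm", "ex:Hamadryad", "ex:Person", "ex:Dryad", "ex:Place"]
kept = [c for c in concepts if not reaches_root(c, excluded_roots, hierarchy)]
print(kept)  # ['ex:Person', 'ex:Place']
```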

I'm not sure which ends up being faster. The first requires a conversion to string, and strstarts, if implemented naïvely, has to read through http://dbpedia.org/class/ in each such concept before it hits a mismatch. The second requires nine comparisons that, if IRIs are interned, are just object identity checks. It's an interesting question for further investigation.
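To make the two costs concrete, here is a rough sketch that counts character comparisons in a naïve left-to-right prefix check and contrasts it with identity checks against interned root IRIs. It only illustrates the reasoning above, under the stated assumptions of a naïve `strstarts` and interned IRIs. Real triple stores may implement both very differently, so it says nothing about actual query timings.

```python
import sys

YAGO_PREFIX = "http://dbpedia.org/class/yago/"

def naive_startswith(s, prefix):
    """Left-to-right prefix check; returns (matched, characters compared)."""
    compared = 0
    for a, b in zip(s, prefix):
        compared += 1
        if a != b:
            return False, compared
    # Every compared character matched; it is a prefix only if s is long enough.
    return len(s) >= len(prefix), compared

# The nine root classes from the result table, interned so that equal IRIs
# share a single string object.
ROOT_LOCAL_NAMES = [
    "YagoLegalActorGeo", "WaterNymph109550125", "PhysicalEntity100001930",
    "Abstraction100002137", "YagoIdentifier", "YagoLiteral",
    "YagoPermanentlyLocatedEntity", "Thing104424418", "Dryad109551040",
]
ROOTS = [sys.intern(YAGO_PREFIX + name) for name in ROOT_LOCAL_NAMES]

def is_root(iri):
    """At most nine identity checks (`is`). sys.intern here stands in for a
    store that interns IRIs once, at load time."""
    iri = sys.intern(iri)
    return any(iri is root for root in ROOTS)

print(naive_startswith("http://dbpedia.org/ontology/Person", YAGO_PREFIX))  # (False, 20)
print(naive_startswith(YAGO_PREFIX + "Dryad109551040", YAGO_PREFIX))        # (True, 30)
print(is_root(YAGO_PREFIX + "Dryad109551040"))                              # True
```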
