How to remove duplicates in a SPARQL query


Question

I wrote this query, which returns a list of couples that meet a particular condition (run against http://live.dbpedia.org/sparql):

SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
    select DISTINCT ?actor ?person2 (count (?film) as ?cnt) 
    where { 
        ?film    dbo:starring ?actor .
        ?actor dbo:spouse ?person2. 
        ?film    dbo:starring ?person2.
    }
    order by ?actor
}
FILTER (?cnt >9)
}

The problem is that some rows are duplicated. For example:

http://dbpedia.org/resource/George_Burns http://dbpedia.org/resource/Gracie_Allen 12

http://dbpedia.org/resource/Gracie_Allen http://dbpedia.org/resource/George_Burns 12

How can I remove these duplicates? I tried adding a gender condition on ?actor, but that breaks the current results.

Recommended answer

Natan Cox's answer shows the typical way to exclude this kind of pseudo-duplicate. The results aren't actually duplicates, because in one row, e.g., George Burns is the ?actor, and in the other he is the ?person2. In many cases, you can add a filter that requires the two values to be ordered, and that removes the mirrored rows. For example, when you have data like:

:a :likes :b .
:a :likes :c .

and then search with

select ?x ?y where { 
  :a :likes ?x, ?y .
}

you can add filter(?x < ?y) to enforce an ordering between ?x and ?y, which removes these pseudo-duplicates. In this case, however, it's a bit trickier, because ?actor and ?person2 aren't found using the same criteria. If DBpedia contains

:PersonB dbo:spouse :PersonA

but not

:PersonA dbo:spouse :PersonB

then the simple filter won't work, because there is no triple whose subject, PersonA, is less than its object, PersonB. So in this case you also need to modify the query a bit to make the criteria symmetric:

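# count(distinct ?film) guards against counting a film twice when the
# spouse link is declared in both directions and the path matches twice.
select distinct ?actor ?spouse (count(distinct ?film) as ?count) {
  ?film dbo:starring ?actor, ?spouse .
  ?actor dbo:spouse|^dbo:spouse ?spouse .
  filter(?actor < ?spouse)
}
group by ?actor ?spouse
having (count(distinct ?film) > 9)
order by ?actor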

(This query also shows that you don't need a subquery here: you can use having to "filter" on aggregate values.) The important part, though, is the property path dbo:spouse|^dbo:spouse, which finds a value for ?spouse such that either ?actor dbo:spouse ?spouse or ?spouse dbo:spouse ?actor. This makes the relationship symmetric, so you're guaranteed to get all the pairs even if the relationship is only declared in one direction.
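For endpoints other than DBpedia's Virtuoso, a more portable variant may be useful. The sketch below is not from the original answer and makes three assumptions. First, the dbo: prefix is declared explicitly with the DBpedia ontology namespace, which the DBpedia endpoint predefines. Second, the IRIs are compared through str(), because the SPARQL 1.1 operator mapping does not define < for IRIs, so engines that follow it strictly may reject or silently drop a direct IRI comparison. Third, films are counted with distinct, so that a film is counted once even when the spouse link is declared in both directions and the alternative path matches twice.

```sparql
# Assumption: standard DBpedia ontology namespace (predefined on DBpedia itself).
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?actor ?spouse (COUNT(DISTINCT ?film) AS ?count)
WHERE {
  # Both people star in the same film.
  ?film dbo:starring ?actor, ?spouse .

  # Symmetric spouse relation: match the link in either direction.
  ?actor (dbo:spouse|^dbo:spouse) ?spouse .

  # Keep only one ordering of each pair. Comparing the IRI strings
  # avoids relying on an engine-specific ordering of IRIs.
  FILTER(STR(?actor) < STR(?spouse))
}
GROUP BY ?actor ?spouse
HAVING (COUNT(DISTINCT ?film) > 9)
ORDER BY ?actor
```

Which person ends up in ?actor is decided purely by string order of the IRIs, so each couple should appear in exactly one row regardless of how the spouse link was recorded.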
