SPARQL query to retrieve countries population from DBpedia

Problem description

I have developed the following SPARQL query to get a list of countries with their populations from DBpedia. I use the UNION clauses to identify which resources are current countries, because the information is inconsistent between countries: for example, there are different standards for country codes, and some countries don't have one at all.

Now the problem I have is that some countries have a dbpprop:populationEstimate property while others have dbpprop:populationCensus, and I don't know how to get both of them to bind ?population. As it stands I only get the estimated population. I guess that's because having two OPTIONAL clauses that match ?population doesn't make sense, but I can't get any closer to a solution.

For example, India has dbpprop:populationCensus, but it doesn't appear in the result.

PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX yago:<http://dbpedia.org/class/yago/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX category: <http://dbpedia.org/resource/Category:>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?name ?population
WHERE {
    ?country a dbo:Country .
    ?country rdfs:label ?enName .   

    OPTIONAL {?country dbpprop:populationEstimate ?population}
    OPTIONAL {?country dbpprop:populationCensus ?population}
    OPTIONAL {?country dbpprop:yearEnd ?yearEnd}

    { ?country dbpprop:iso3166code ?code . }
    UNION
    { ?country dbpprop:iso31661Alpha ?code . }
    UNION
    { ?country dbpprop:countryCode ?code . }
    UNION
    { ?country a yago:MemberStatesOfTheUnitedNations . }

    FILTER (langMatches(lang(?enName), "en")) 
    FILTER (!bound(?yearEnd))
    FILTER (xsd:integer(?population))
    BIND (str(?enName) AS ?name)
}

Thanks, everyone, for your help :)

Recommended answer

First, I'm going to use the prefixes defined in the DBpedia SPARQL endpoint so that we can copy and paste queries. I think the only difference is that dbo will now be dbpedia-owl. Second, you're using a number of raw data properties, but if you can, you ought to try to use properties from the ontology, as explained in this answer. That doesn't necessarily affect the results you're getting here, but you'll generally get cleaner data if you use the ontology properties.

Let's clean up the query a little bit first, and then turn to the question of getting the various population properties. Removing countries that have an end date can be done a bit more simply. Instead of

OPTIONAL {?country dbpprop:yearEnd ?yearEnd}
FILTER (!bound(?yearEnd))

you can use FILTER NOT EXISTS to make this a bit more direct:

FILTER NOT EXISTS { ?country dbpprop:yearEnd ?yearEnd }

In an attempt to use properties from the DBpedia ontology in preference to Raw Infobox data properties, you might want to consider using dbpedia-owl:dissolutionYear rather than dbpprop:yearEnd, giving:

FILTER NOT EXISTS { ?country dbpedia-owl:dissolutionYear ?yearEnd }



Simplify filtering for languages

It's reasonable to expect rdfs:label values to be literals, and the lang function requires its argument to be a literal, so you don't really need to bind str(?enName) to ?name; it's sufficient just to bind ?name in the triple pattern, and then check its language (which you're doing correctly using langMatches). That is, instead of

?country rdfs:label ?enName .   
FILTER (langMatches(lang(?enName), "en")) 
BIND (str(?enName) AS ?name)

you can just use

?country rdfs:label ?name .   
FILTER (langMatches(lang(?name), "en"))

This does mean that the name you get back will have a language tag. If you really just want the plain string, you can either BIND as you did before, or make an as expression in the select, e.g.,

SELECT DISTINCT (str(?name) as ?noLangName) ?population
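
To see how lang, langMatches, and str interact without depending on the live DBpedia data, here is a minimal sketch that runs against inline data only. The four literals in the VALUES block are made up for illustration and are not taken from DBpedia.

```sparql
# Illustrative only: the VALUES literals below are made-up test data.
SELECT ?name (lang(?name) AS ?tag) (str(?name) AS ?noLangName)
WHERE {
    VALUES ?name { "India"@en "India"@en-GB "Indien"@de "India" }
    # Keeps every literal whose language tag is "en" or a subtag such as "en-GB".
    FILTER (langMatches(lang(?name), "en"))
}
```

Per the SPARQL 1.1 definitions, only the @en and @en-GB literals should pass: the German label does not match, and the untagged literal has an empty language tag. The ?noLangName column then carries the same text with the tag stripped.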



Checking that population is bound and is a number

I don't think filtering on xsd:integer(?population) will do much for you either. That notation isn't a type predicate but a casting function: ?population is cast to an integer and the filter then tests the result. Per the SPARQL specification, any value that casts to a non-zero integer passes, 0 fails, and an unbound or uncastable value raises an error, which also drops the row. You'd still want to know if a country has a population of 0 though, right? However, you do only want countries with populations, so you could filter with bound:

FILTER(bound(?population))

However, since the properties here are raw infobox properties, there is some noise in the data, so we end up with values like

"Denmark"@en "- Density 57,695"@en
"Denmark"@en "- Faroe Islands"@en

which aren't useful. A better filter would just check that the value is a number (which will implicitly require that it's bound), and there is a function isNumeric for just that purpose, so we use:

FILTER (isNumeric(?population))
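
As a quick check of what isNumeric accepts, the following sketch applies the filter to inline values instead of the endpoint's data. The values are stand-ins chosen for illustration: two integers, a plain string, and a language-tagged string of the kind shown above.

```sparql
# Illustrative only: inline values standing in for raw infobox data.
SELECT ?population
WHERE {
    VALUES ?population { 1210193422 0 "57,695" "- Faroe Islands"@en }
    # isNumeric is true only for literals with a numeric datatype,
    # so the plain string and the language-tagged string are dropped.
    FILTER (isNumeric(?population))
}
```

Both integers, including 0, should be kept here, whereas a filter on the cast result would also reject the 0.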



Simplifying similar UNION patterns with VALUES

You can clean up the UNION pattern by using VALUES. Instead of UNIONing several almost identical patterns, you can define a variable ?hasCode that will only have the values dbpprop:iso3166code, etc. I.e., instead of:

{ ?country dbpprop:iso3166code ?code . }
UNION
{ ?country dbpprop:iso31661Alpha ?code . }
UNION
{ ?country dbpprop:countryCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }

you can use:

values ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
{ ?country ?hasCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
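
One detail worth noting, as an aside that is not part of the original answer: a VALUES block placed at the top level of the WHERE clause is joined with everything in it, including the branch that never uses ?hasCode. A possible variant, sketched here with the same property names as above and with the prefixes written out as an assumption about what the endpoint predefines, puts the VALUES block inside the branch that needs it:

```sparql
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT DISTINCT ?country
WHERE {
    ?country a dbpedia-owl:Country .
    {
        # ?hasCode is only bound inside this branch.
        VALUES ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
        ?country ?hasCode ?code .
    }
    UNION
    { ?country a yago:MemberStatesOfTheUnitedNations . }
}
```

With SELECT DISTINCT over ?country the set of countries should be the same either way; the scoped form mainly avoids multiplying intermediate rows for the second branch.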

The same approach works for the ?population retrieval:

OPTIONAL {?country dbpprop:populationEstimate ?population}
OPTIONAL {?country dbpprop:populationCensus ?population}

can become:

values ?hasPopulation { dbpprop:populationEstimate dbpprop:populationCensus }
OPTIONAL { ?country ?hasPopulation ?population }
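
Some background on why the two original OPTIONAL clauses fell short: in a chain of OPTIONALs over the same variable, a later clause can only contribute when the earlier ones left the variable unbound (or when the values happen to coincide). A country with any populationEstimate value, even a noisy one, therefore never gets its census figure. An alternative to the VALUES form, not used in this answer, is to bind separate variables and merge them with COALESCE. The sketch below assumes the same property names; the variable names ?estimate and ?census and the choice to prefer the estimate are arbitrary.

```sparql
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?country ?population
WHERE {
    ?country a dbpedia-owl:Country .
    # Separate variables, each accepted only when numeric.
    OPTIONAL { ?country dbpprop:populationEstimate ?estimate . FILTER (isNumeric(?estimate)) }
    OPTIONAL { ?country dbpprop:populationCensus ?census . FILTER (isNumeric(?census)) }
    # Prefer the estimate, fall back to the census (an arbitrary choice for this sketch).
    BIND (COALESCE(?estimate, ?census) AS ?population)
    # Drop countries for which neither property gave a numeric value.
    FILTER (bound(?population))
}
```

The trade-off is that the VALUES form can return both figures for a country that has them, while the COALESCE form picks one of them according to the stated preference.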



The final result

The rewritten query is now:

SELECT DISTINCT ?name ?population
WHERE {
    ?country a dbpedia-owl:Country .
    ?country rdfs:label ?name .   
    FILTER (langMatches(lang(?name), "en")) 

    values ?hasPopulation { dbpprop:populationEstimate dbpprop:populationCensus }
    OPTIONAL { ?country ?hasPopulation ?population }
    FILTER (isNumeric(?population))

    FILTER NOT EXISTS { ?country dbpedia-owl:dissolutionYear ?yearEnd }

    values ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
    { ?country ?hasCode ?code . }
    UNION
    { ?country a yago:MemberStatesOfTheUnitedNations . }
}

SPARQL results

India now appears in the results with a population:

"India"@en 1210193422
