How to programmatically get all available information from a Wikidata entity?

Question

I'm really new to Wikidata. I just figured out that Wikidata uses a lot of reification.

Suppose we want to get all the information available for Obama. To do it from DBpedia, we would just use a simple query: select * where {<http://dbpedia.org/resource/Barack_Obama> ?p ?o .} This returns all the properties and values that have Obama as the subject. Essentially the result is the same as this page: http://dbpedia.org/page/Barack_Obama, except that the query result is in the format I need.

I'm wondering how to do the same thing with Wikidata. This is the Wikidata page for Obama: https://www.wikidata.org/wiki/Q76. Let's say I want all the statements on this page. But almost all the statements on this page are reified, in that they have ranks, qualifiers, etc. For example, the "educated at" part has not only the school but also the "start time" and "end time", and all schools are ranked as normal since Obama is not at these schools anymore.

I could just get all the schools by getting the truthy statements (using https://query.wikidata.org):

SELECT ?school ?schoolLabel WHERE {
  wd:Q76 wdt:P69 ?school .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}

The above query will simply return all the schools.

If I want to get the start time and end time for each school, I need to do this:

SELECT ?school ?schoolLabel ?start ?end WHERE {
  wd:Q76 p:P69 ?school_statement .
  ?school_statement ps:P69 ?school .
  ?school_statement pq:P580 ?start .
  ?school_statement pq:P582 ?end .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}

But the thing is, without looking at the actual page, how would I know that ?school_statement has pq:P580 and pq:P582, namely the "start time" and "end time"? It all comes down to one question: how do I get all the information (including the reification) from https://www.wikidata.org/wiki/Q76?

Ultimately, I would expect a table like this: ||predicate||object||objectLabel||qualifier1||qualifier1Value||qualifier2||qualifier2Value||...

Recommended answer

You should probably go for the Wikidata data API (more specifically the wbgetentities module) instead of the SPARQL endpoint:

In your case: https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q76
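If you want to make that request from a script, a minimal Python sketch could look like the following. It only uses the standard library; `build_entity_url` and `fetch_entities` are names made up for this example, the User-Agent string is a placeholder you should replace with your own, and `props=claims` is an optional parameter that narrows the response to statements.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(entity_ids, props=None):
    """Build a wbgetentities URL for one or more entity ids (e.g. ["Q76"])."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "ids": "|".join(entity_ids),  # several ids are separated by "|"
    }
    if props:
        params["props"] = "|".join(props)  # e.g. ["claims"] to get statements only
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_entities(entity_ids, props=None,
                   user_agent="example-script/0.1 (placeholder contact)"):
    """Fetch the entities and return the "entities" object of the response.

    The user_agent default is a placeholder, not a real identity.
    """
    request = Request(build_entity_url(entity_ids, props),
                      headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return json.load(response)["entities"]

# Usage (performs a network request):
#   claims = fetch_entities(["Q76"], props=["claims"])["Q76"]["claims"]
```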

You should find all the qualifier data you were looking for there. For example, entities.Q76.claims.P69[1] (the second entry of the P69 array) looks like this:

{ mainsnak: 
   { snaktype: 'value',
     property: 'P69',
     datavalue: 
      { value: { 'entity-type': 'item', 'numeric-id': 3273124, id: 'Q3273124' },
        type: 'wikibase-entityid' },
     datatype: 'wikibase-item' },
  type: 'statement',
  qualifiers: 
   { P580: 
      [ { snaktype: 'value',
          property: 'P580',
          hash: 'a1db249baf916bb22da7fa5666d426954435256c',
          datavalue: 
           { value: 
              { time: '+1971-01-01T00:00:00Z',
                timezone: 0,
                before: 0,
                after: 0,
                precision: 9,
                calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
             type: 'time' },
          datatype: 'time' } ],
     P582: 
      [ { snaktype: 'value',
          property: 'P582',
          hash: 'a065bff95f5cb3026ebad306b3df7587c8daa2e9',
          datavalue: 
           { value: 
              { time: '+1979-01-01T00:00:00Z',
                timezone: 0,
                before: 0,
                after: 0,
                precision: 9,
                calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
             type: 'time' },
          datatype: 'time' } ] },
  'qualifiers-order': [ 'P580', 'P582' ],
  id: 'q76$464382F6-E090-409E-B7B9-CB913F1C2166',
  rank: 'normal' }
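To get from this nested structure to something closer to the flat table asked for in the question, a sketch along these lines may work. It assumes `claims` is the `entities.Q76.claims` object from the response; `snak_value` and `flatten_claims` are illustrative names, and only entity and time values are unwrapped, while every other value type is passed through untouched.

```python
def snak_value(snak):
    """Return a compact value for a snak, or None for 'novalue'/'somevalue' snaks."""
    if snak.get("snaktype") != "value":
        return None
    datavalue = snak["datavalue"]
    value = datavalue["value"]
    if datavalue["type"] == "wikibase-entityid":
        return value.get("id")   # e.g. "Q3273124"
    if datavalue["type"] == "time":
        return value["time"]     # e.g. "+1971-01-01T00:00:00Z"
    return value                 # strings, quantities, etc. are left as-is


def flatten_claims(claims):
    """Turn the claims object into one row per statement, qualifiers included."""
    rows = []
    for property_id, statements in claims.items():
        for statement in statements:
            qualifiers = statement.get("qualifiers", {})
            # Prefer the order the API reports; fall back to dict order.
            order = statement.get("qualifiers-order", list(qualifiers))
            row = {
                "predicate": property_id,
                "object": snak_value(statement["mainsnak"]),
                "rank": statement.get("rank"),
                "qualifiers": [
                    (qualifier_id, snak_value(snak))
                    for qualifier_id in order
                    for snak in qualifiers.get(qualifier_id, [])
                ],
            }
            rows.append(row)
    return rows
```

Because the qualifiers are discovered from the data itself, you do not need to know in advance that a statement carries P580 or P582. Labels for the property and item ids would still have to be looked up separately.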

Then you might be interested in ways to extract readable results from that output.
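As one small example of making the output readable, the time values can be trimmed to the precision they declare (9 is a year, 10 a month, 11 a day). This is only a sketch: `readable_time` is a made-up helper, and collapsing precisions coarser than a year (decade, century, and so on) down to the bare year is a simplification chosen for this example.

```python
def readable_time(time_value):
    """Format a Wikidata time value dict according to its 'precision' field."""
    raw = time_value["time"]                    # e.g. "+1971-01-01T00:00:00Z"
    precision = time_value.get("precision", 11)
    sign = "-" if raw.startswith("-") else ""   # negative years are BCE
    year, month, day = raw.lstrip("+-").split("T")[0].split("-")
    year = sign + str(int(year))                # drop zero padding
    if precision >= 11:                         # day precision or finer
        return f"{year}-{month}-{day}"
    if precision == 10:                         # month precision
        return f"{year}-{month}"
    return year                                 # year precision or coarser (simplified)
```

For the "start time" qualifier shown above (precision 9), this yields just the year instead of the misleading full date 1971-01-01.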
