Get all Wikipedia Infobox Templates and all Pages using them

Question

Wikipedia pages such as Wikipedia: Stack Overflow often have infoboxes (usually at the top right of the page).

Example screenshot: [image not available]

  1. DBpedia lists all these attributes as RDF triples. You can see an example at DBpedia: Stack Overflow. There you see the property dbpprop:wikiPageUsesTemplate with the value dbpedia:Template:Infobox_website, which is interesting. I want to know which Wikipedia pages use this template. How can I list all pages that use the Infobox_website template? Preferably with a SPARQL query, but I am open to other easy solutions.

  2. Next, I need a list of all infobox templates. Wikipedia: Category Infobox Templates shows the hierarchy of the relevant Wikipedia categories, which looks like what I am looking for. But I want all of them in a machine-readable format, on one page. Maybe DBpedia is the right tool here too? At DBpedia: Category Infobox Templates and DBpedia: INFOBOX I find very little information, but they look very promising. How can I use SPARQL to find all infobox types so that I can repeat step 1 for each of them?

You can use this for testing the SPARQL queries: http://dbpedia.org/snorql/

I seem to have solved problem number 1: SPARQL: list all pages with Infobox_website

Also, this seems to be the query for problem number 2: SPARQL: list all Infoboxes

Recommended answer

The previous answers seem to have stopped working. Only a small change is required to get them working at the new DBpedia query endpoint, http://live.dbpedia.org/sparql.

To get a list of all pages and the templates they use, this query works:

PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT * WHERE { ?page dbpprop:wikiPageUsesTemplate ?template . }
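To run this query from a script instead of Snorql, a minimal Python sketch could send it over HTTP and flatten the standard SPARQL 1.1 JSON results format. The endpoint URL comes from this answer. The `LIMIT 100`, the function names, and the assumption that the endpoint returns JSON when asked through the `Accept` header or a `format` parameter are illustrative choices, not part of the original.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in this answer; it may have moved since.
ENDPOINT = "http://live.dbpedia.org/sparql"

# The answer's first query, plus a LIMIT (an addition) so the response stays small.
QUERY = """PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT * WHERE { ?page dbpprop:wikiPageUsesTemplate ?template . }
LIMIT 100"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as URL parameters for an HTTP GET request."""
    params = {
        "query": query,
        # Virtuoso-style 'format' parameter (the curl example in this answer uses
        # the same parameter for TSV); here it asks for SPARQL JSON results.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result document into one {variable: value} dict per row."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query and return the parsed rows (performs a network request)."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` would return rows shaped like `{"page": ..., "template": ...}`. Without a `LIMIT`, the unrestricted query may run into the public endpoint's result-size or time limits.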

If you're looking for a specific template:

PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT * WHERE {
   ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
}
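Because this query differs from one template to the next only in the template IRI, it can be generated from a template name. This Python sketch assumes DBpedia template IRIs follow the pattern visible above (`http://dbpedia.org/resource/Template:` plus the name with spaces replaced by underscores). The helper names are hypothetical, and names containing characters that are not allowed inside a SPARQL IRI would need extra escaping.

```python
DBPEDIA_TEMPLATE_BASE = "http://dbpedia.org/resource/Template:"


def template_iri(name: str) -> str:
    """Map a template name such as 'Infobox website' to its assumed DBpedia IRI."""
    # Assumption: spaces become underscores, as in Template:Infobox_website.
    return DBPEDIA_TEMPLATE_BASE + name.strip().replace(" ", "_")


def pages_using_template_query(name: str) -> str:
    """Build the 'pages using one specific template' query for any template name."""
    return (
        "PREFIX dbpprop: <http://dbpedia.org/property/>\n"
        "SELECT ?page WHERE {\n"
        f"   ?page dbpprop:wikiPageUsesTemplate <{template_iri(name)}> .\n"
        "}"
    )
```

For example, `pages_using_template_query("Infobox website")` reproduces the query above, selecting only `?page`.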

For my use case I'm interested in the Wikipedia URL rather than the DBpedia page, so I'm using this query:

PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?wikipedia_url WHERE {
   ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
   ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}

I'm also using curl to pull the results into a script:

$ curl -s -G "http://live.dbpedia.org/sparql" \
    --data-urlencode "default-graph-uri=http://dbpedia.org" \
    --data-urlencode "query=SELECT ?wikipedia_url WHERE { ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> . ?page foaf:isPrimaryTopicOf ?wikipedia_url . }" \
    --data-urlencode "format=text/tab-separated-values" \
  | tr -d \" | grep -v "^wikipedia_url$" | head
http://en.wikipedia.org/wiki/U.S._News_&_World_Report
http://en.wikipedia.org/wiki/FriendFinder
http://en.wikipedia.org/wiki/Debkafile
http://en.wikipedia.org/wiki/GTPlanet
http://en.wikipedia.org/wiki/Lithuanian_Wikipedia
http://en.wikipedia.org/wiki/Connexions
http://en.wikipedia.org/wiki/Hypno5ive
http://en.wikipedia.org/wiki/Scoop_(website)
http://en.wikipedia.org/wiki/Bhoomi_(software)
http://en.wikipedia.org/wiki/Brainwashed_(website)
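The same pipeline can be sketched in Python, which makes the quote-stripping and header-dropping explicit. This version sends the same three parameters as the curl command and treats the TSV body the way `tr -d` and `grep -v` do. As an addition, it also strips `<...>` and a leading `?`, in case an endpoint emits standard SPARQL 1.1 TSV instead of quoted values; the explicit `PREFIX` lines and the function names are assumptions for illustration.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://live.dbpedia.org/sparql"

# The wikipedia_url query from this answer, with explicit PREFIX declarations added.
QUERY = """PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?wikipedia_url WHERE {
   ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
   ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}"""


def tsv_url(endpoint: str, query: str) -> str:
    """Build a GET URL with the same parameters the curl command sends."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "text/tab-separated-values",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def parse_single_column_tsv(body: str, column: str) -> list[str]:
    """Drop double quotes and the header row, like the tr/grep part of the pipeline."""
    values = []
    for line in body.splitlines():
        value = line.strip().replace('"', "")
        # Standard SPARQL 1.1 TSV writes IRIs as <...> and the header as ?name.
        if value.startswith("<") and value.endswith(">"):
            value = value[1:-1]
        if value and value.lstrip("?") != column:
            values.append(value)
    return values


def fetch_wikipedia_urls() -> list[str]:
    """Run the query against the endpoint (network access required)."""
    with urllib.request.urlopen(tsv_url(ENDPOINT, QUERY), timeout=60) as response:
        body = response.read().decode("utf-8")
    return parse_single_column_tsv(body, "wikipedia_url")
```

Keeping the parsing in its own function means it can be checked against a saved response without hitting the endpoint.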

I'm not sure if this gives the full result set though, because it returns 1698 results whereas wmflabs.org seems to suggest there should be 4439.

For the second part of your question, only a small change to the previous query is needed to get a list of all infobox templates:

PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT DISTINCT ?template WHERE {
    ?page dbpprop:wikiPageUsesTemplate ?template .
    FILTER (regex(str(?template), "Infobox"))
} ORDER BY ?template
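The question's goal was to repeat step 1 for every infobox template. A Python sketch of that loop could run the template-list query once and then one per-template query for each result. It applies `regex` to `str(?template)`, since SPARQL's REGEX is defined on strings rather than IRIs. The `run` callable is hypothetical: any function you supply that sends a query to an endpoint and returns the values of its single selected variable.

```python
from typing import Callable

PREFIXES = "PREFIX dbpprop: <http://dbpedia.org/property/>\n"

# Step 2: the answer's template-list query, with regex applied to str(?template).
ALL_INFOBOXES = PREFIXES + """SELECT DISTINCT ?template WHERE {
    ?page dbpprop:wikiPageUsesTemplate ?template .
    FILTER (regex(str(?template), "Infobox"))
} ORDER BY ?template"""

# Hypothetical runner: sends a query and returns the values of its one variable.
Runner = Callable[[str], list[str]]


def pages_for(template_iri: str) -> str:
    """Step 1 for a single template IRI returned by ALL_INFOBOXES."""
    return PREFIXES + (
        f"SELECT ?page WHERE {{ ?page dbpprop:wikiPageUsesTemplate <{template_iri}> . }}"
    )


def pages_by_infobox(run: Runner) -> dict[str, list[str]]:
    """Run step 2 once, then step 1 once per template it returns."""
    return {template: run(pages_for(template)) for template in run(ALL_INFOBOXES)}
```

This issues one request per template, which can add up to many requests against a public endpoint, so throttling or caching the results would be worth considering.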
