How to get internal links from the latest revision of a Wikipedia page?

Problem description

I'm trying to extract internal links from Wikipedia pages. This is the query I'm using:

/w/api.php?action=query&prop=links&format=xml&plnamespace=0&pllimit=max&titles=pageTitle

However, the result does not reflect what's on the wiki page. Take, for example, the article Von Mises–Fisher distribution. There are only about a dozen links on this page. However, when I make the query

/w/api.php?action=query&prop=links&format=xml&plnamespace=0&pllimit=max&titles=Von_Mises%E2%80%93Fisher_distribution

I get back 187 links. I guess the API might have a database of all the links that have ever been added to the page, across all revisions. Is that the case? How can I get the links from only the latest revision?

Solution

The database has the correct list of links for the current version of the article. All the links you get from the API are in fact in the article. However, most of them are hidden in the (twice-collapsed) navigation box at the bottom (scroll to the bottom, click "show" on the blue bar, then click "show" on the additional blue bars you now see).
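To check this against the API yourself, the query from the question can be scripted. The following is a minimal sketch, not part of the original answer: it assumes the English Wikipedia endpoint, uses `format=json` instead of `format=xml` for easier parsing, and follows the API's `continue` markers in case the list does not fit in one response. The function names and the User-Agent string are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: English Wikipedia. Other wikis use a different host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_query(title, cont=None):
    """Return the parameters for one prop=links request (same as the question's query)."""
    params = {
        "action": "query",
        "prop": "links",
        "format": "json",   # the question used xml; json is assumed here
        "plnamespace": 0,   # article namespace only
        "pllimit": "max",
        "titles": title,
    }
    if cont:
        # Continuation values are copied verbatim from the previous response.
        params.update(cont)
    return params


def extract_link_titles(response):
    """Collect the link titles from one decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    return [link["title"]
            for page in pages.values()
            for link in page.get("links", [])]


def fetch_all_links(title):
    """Request links repeatedly until the API stops returning a 'continue' block."""
    titles, cont = [], None
    while True:
        url = API_URL + "?" + urlencode(build_links_query(title, cont))
        # Hypothetical User-Agent; replace with one that identifies your script.
        req = Request(url, headers={"User-Agent": "links-example/0.1"})
        with urlopen(req) as resp:
            data = json.load(resp)
        titles.extend(extract_link_titles(data))
        cont = data.get("continue")
        if not cont:
            return titles

# Example call (needs network access):
# links = fetch_all_links("Von Mises–Fisher distribution")
```

The list returned this way includes the links contributed by templates, which is why it is much longer than what is visible in the article body.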

Note that these links are on the page, but not defined in the wikitext: they come from the {{ProbDistributions}} navigation template (and the templates that this template in turn includes).

Sadly, there is no good way to list only the links that are directly/explicitly defined on a page, since templates are expanded (transcluded) before the wiki syntax is actually parsed.
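A rough workaround, which is not part of the original answer, is to skip the link table entirely and scan the raw wikitext of the latest revision for `[[...]]` links (the wikitext can be fetched with, for example, `prop=revisions` and `rvprop=content`). The sketch below is only an approximation under stated assumptions: the regular expression and the `explicit_links` helper are invented for illustration, the namespace filter covers just a few assumed prefixes, and constructs such as comments or `<nowiki>` sections are not handled.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Group 1 is the target title without section or label.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Assumed, incomplete set of prefixes that are not ordinary article links.
SKIPPED_PREFIXES = {"file", "image", "category"}


def explicit_links(wikitext):
    """Return article titles linked directly in the wikitext, in order, without duplicates."""
    found = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).replace("_", " ").strip()
        if not target:
            continue
        prefix = target.split(":", 1)[0].strip().lower()
        if ":" in target and prefix in SKIPPED_PREFIXES:
            continue
        # MediaWiki treats the first letter of a title as case-insensitive.
        target = target[0].upper() + target[1:]
        if target not in found:
            found.append(target)
    return found
```

Because template calls such as `{{ProbDistributions}}` are never expanded here, links that come from navigation boxes are left out, but so are any links that a template legitimately places in the article body.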
