How to get all URLs in a Wikipedia page

Question

It seems that the Wikipedia API's definition of a link is different from a URL. I'm trying to use the API to return all the URLs in a specific wiki page.

I have been playing around with this query, which I found on this page under generators and redirects.

Recommended answer

I'm not sure why exactly you are confused (it would help if you explained that), but I'm quite sure that query is not what you want. It lists the links (prop=links) on the pages that are linked (generator=links) from the page "Title" (titles=Title). It also lists only the first page of links for the first page of linked pages, since the page size defaults to the tiny value of 10.

If you want to get all the links on the page "Title":

  1. Use just prop=links; you don't want the generator.
  2. Increase the limit to the maximum possible by adding pllimit=max (pl is the "prefix" for links).
  3. Use the value given in the query-continue element (returned as a continue element by newer versions of the API) to get to the second (and following) page of results.

So, the query for the first page would be:

http://en.wikipedia.org/w/api.php?action=query&titles=Title&prop=links&pllimit=max

And the second (and in this case, final) page:

http://en.wikipedia.org/w/api.php?action=query&titles=Title&prop=links&pllimit=max&plcontinue=226160|0|Lieutenant_General
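The two requests above can be chained in a loop. The following is only a rough Python sketch of that idea, not part of the original answer: the function names (fetch_json, all_links), the User-Agent string, the https endpoint and format=json are all assumptions made for illustration. It accepts either the newer top-level continue element or the older query-continue element, and it assumes the default JSON layout in which query.pages is keyed by page ID.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the same endpoint as in the answer, but over https.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent value; replace it with one describing your client.
    request = urllib.request.Request(url, headers={"User-Agent": "links-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_links(title, fetch=fetch_json):
    """Return the titles of all internal links on the page `title`."""
    base = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "pllimit": "max",
        "format": "json",
    }
    params = dict(base)
    links = []
    while True:
        data = fetch(params)
        # Assumed layout: query.pages is a dict keyed by page ID.
        for page in data.get("query", {}).get("pages", {}).values():
            links.extend(link["title"] for link in page.get("links", []))
        # Newer replies carry "continue"; older ones "query-continue" -> "links".
        cont = data.get("continue") or data.get("query-continue", {}).get("links")
        if not cont:
            return links
        # Repeat the original query with the continuation values added.
        params = {**base, **cont}
```

Calling all_links("Title") would then issue as many requests as needed and return one flat list of linked page titles. The fetch parameter exists only so the loop can be exercised without network access.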

Another thing that might be confusing you is that prop=links returns only internal links (to other Wikipedia pages). To get external links, use prop=extlinks. You can also combine the two into one query:

http://en.wikipedia.org/w/api.php?action=query&titles=Title&prop=links|extlinks
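As a hedged illustration of the combined query, the Python sketch below builds such a URL and separates the two kinds of links in a decoded JSON reply. The helper names (build_query_url, split_links) are made up for this example, and format=json, pllimit=max and ellimit=max are additions not present in the URL above. The reply layout is an assumption: each extlinks entry is expected to hold its URL under the key "*" (or "url" with formatversion=2).

```python
import urllib.parse

# Assumption: the same endpoint as in the answer, but over https.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build a URL requesting internal and external links in one query."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links|extlinks",
        "pllimit": "max",  # limit for prop=links
        "ellimit": "max",  # limit for prop=extlinks
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def split_links(data):
    """Split a decoded reply into (internal page titles, external URLs)."""
    internal, external = [], []
    for page in data.get("query", {}).get("pages", {}).values():
        internal.extend(link["title"] for link in page.get("links", []))
        for ext in page.get("extlinks", []):
            # Assumed keys: "*" by default, "url" with formatversion=2.
            external.append(ext.get("url") or ext.get("*"))
    return internal, external
```

Note that internal links come back as page titles rather than URLs, so a full URL for each one still has to be constructed from the title. When both properties are requested, a long page may need continuation for each of them separately.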
