Query Wikipedia pages with properties


Question

I need to use the Wikipedia API (action=query), or any other API such as OpenSearch, to fetch a simple list of pages with certain properties.

Input: a list of page (article) titles or IDs.
Output: a list of pages, each with the following properties:
page ID
title
snippet/description (as in the OpenSearch API)
page URL
image URL (as in the OpenSearch API)

The result should be similar to this:
http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml
but with page IDs included, and returned for an exact list of pages given by title or page ID rather than for a search.

This should be fairly simple, but I have been stuck on it for quite some time, trying all kinds of URL combinations from the MediaWiki API manual without success.

Recommended answer

I don't think there is any way other than the OpenSearch API to fetch OpenSearch data, but depending on which Wikipedia you are interested in, other extensions may be installed that can help. Taking English Wikipedia as an example, we can make use of the MobileFrontend and PageImages extensions, which happen to be installed there.

  • Title and URL are available from the native MediaWiki API. To get the URL, use prop=info and specify inprop=url to request the URL fields.
  • The prominent image of a page is returned by prop=pageimages, thanks to PageImages.
  • MobileFrontend adds a property called extracts, which you can combine with the exintro directive to get only the text before the first section heading (on current wikis this property is provided by the TextExtracts extension). Note, however, that MediaWiki markup is complex, so the result might not always be perfect.

If we put it all together in a single query, it looks something like this:

http://en.wikipedia.org/w/api.php?action=query&pageids=21482&prop=pageimages|info|extracts&inprop=url&exintro

This gives:

<api>
  <query>
    <pages>
      <page pageid="21482" ns="0" title="Nairobi" pageimage="Nairobi_Montage.jpg" contentmodel="wikitext" pagelanguage="en" touched="2014-02-06T06:10:01Z" lastrevid="594161616" counter="" length="89157" fullurl="http://en.wikipedia.org/wiki/Nairobi" editurl="http://en.wikipedia.org/w/index.php?title=Nairobi&amp;action=edit">
        <thumbnail source="http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Nairobi_Montage.jpg/45px-Nairobi_Montage.jpg" width="45" height="50" />
        <extract xml:space="preserve">
             &lt;p&gt;&lt;b&gt;Nairobi&lt;/b&gt; /naɪˈroʊbi/ is the [...]
        </extract>
      </page>
    </pages>
  </query>
</api>
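
To cover the original requirement (an exact list of pages rather than a single one), the query above can be generalized. The following is a minimal Python sketch, not a definitive client: the helper names build_query_url and parse_pages are hypothetical, format=xml is added explicitly so that the response has the shape shown above, and it relies on the standard MediaWiki convention of separating multiple pageids or titles with "|". No network request is made here; fetching the URL is left to whatever HTTP library you prefer.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(pageids=None, titles=None, api_url=API_URL):
    """Build an action=query URL for an exact list of pages (hypothetical helper)."""
    if bool(pageids) == bool(titles):
        raise ValueError("pass exactly one of pageids or titles")
    params = {
        "action": "query",
        "format": "xml",
        "prop": "pageimages|info|extracts",
        "inprop": "url",   # adds fullurl/editurl to each page
        "exintro": "1",    # only the text before the first section
    }
    if pageids:
        # Multiple values are joined with "|" (urlencode escapes it as %7C).
        params["pageids"] = "|".join(str(p) for p in pageids)
    else:
        params["titles"] = "|".join(titles)
    return api_url + "?" + urlencode(params)


def parse_pages(xml_text):
    """Turn an XML response like the one above into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        thumb = page.find("thumbnail")
        extract = page.find("extract")
        has_text = extract is not None and extract.text
        pages.append({
            "pageid": page.get("pageid"),
            "title": page.get("title"),
            "url": page.get("fullurl"),
            # Pages without a prominent image have no <thumbnail> element.
            "image": thumb.get("source") if thumb is not None else None,
            "extract": extract.text.strip() if has_text else None,
        })
    return pages
```

For example, build_query_url(pageids=[21482]) produces a URL equivalent to the one above, and passing the XML response to parse_pages yields one dict per page with the page ID, title, URL, thumbnail URL and extract. Note that the API caps how many pages and extracts a single request may return, so a long list may have to be split into several requests.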
