Query Wikipedia pages with properties

Question

I need to use the Wikipedia API query module, or any other API such as OpenSearch, to query for a simple list of pages with some properties.

Input: a list of page (article) titles or ids.
Output: a list of pages, each with the following properties:
  • page id
  • title
  • snippet/description (like in the OpenSearch API)
  • page url
  • image url (like in the OpenSearch API)

A result similar to this:
http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml
Only with page ids and not for a search, but rather an exact list of pages by either titles or pageids.

This should be a fairly simple thing but I have been stuck with that for quite some time trying all kinds of URL combinations from the MW api manual, without success.

Recommended answer

I don't think there is a way other than the Open Search API to fetch Open Search data, but depending on which Wikipedia you are interested in, there might be other extensions installed that can help. Taking English Wikipedia as an example, we can make use of the MobileFrontend and PageImages extensions, which happen to be installed there.

  • Title and URL are available from the native MediaWiki API. To get the URL, use prop=info and add inprop=url to request it.
  • The prominent image of a page is returned by prop=pageimages, thanks to PageImages.
  • MobileFrontend adds a property called extracts, which you can combine with the exintro directive to get the introductory text before the first section heading. Note, however, that MediaWiki markup is complex, so the result might not always be perfect.

Putting it all together in a single query for the page Nairobi (action=query with prop=info|pageimages|extracts, inprop=url, exintro and format=xml) gives a result like this:

<api>
  <query>
    <pages>
      <page pageid="21482" ns="0" title="Nairobi" pageimage="Nairobi_Montage.jpg" contentmodel="wikitext" pagelanguage="en" touched="2014-02-06T06:10:01Z" lastrevid="594161616" counter="" length="89157" fullurl="http://en.wikipedia.org/wiki/Nairobi" editurl="http://en.wikipedia.org/w/index.php?title=Nairobi&amp;action=edit">
        <thumbnail source="http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Nairobi_Montage.jpg/45px-Nairobi_Montage.jpg" width="45" height="50" />
        <extract xml:space="preserve">
             &lt;p&gt;&lt;b&gt;Nairobi&lt;/b&gt; /naɪˈroʊbi/ is the [...]
        </extract>
      </page>
    </pages>
  </query>
</api>
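
The question asks for an exact list of pages rather than a single title, so here is a minimal Python sketch of how the query above might be assembled for several titles or page ids and how the XML reply could be reduced to the requested fields. The helper names build_query_url and parse_pages are hypothetical, and the sketch assumes the response has the same shape as the sample above. On current Wikimedia wikis the extracts property is served by the TextExtracts extension, which was split out of MobileFrontend; the parameters are the same.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles=None, pageids=None):
    """Build one action=query URL for an exact list of titles or page ids.

    Hypothetical helper: the parameter names come from the answer above,
    the function itself is only an illustration.
    """
    if bool(titles) == bool(pageids):
        raise ValueError("pass exactly one of titles or pageids")
    params = {
        "action": "query",
        "format": "xml",
        "prop": "info|pageimages|extracts",  # page info, thumbnail, intro text
        "inprop": "url",                     # adds fullurl/editurl to the info
        "exintro": "1",                      # only the text before the first section
    }
    # The MediaWiki API takes multiple values separated by "|".
    if titles:
        params["titles"] = "|".join(titles)
    else:
        params["pageids"] = "|".join(str(p) for p in pageids)
    return API_ENDPOINT + "?" + urlencode(params)


def parse_pages(xml_text):
    """Reduce an XML reply shaped like the sample above to a list of dicts."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        thumb = page.find("thumbnail")
        extract = page.find("extract")
        pages.append({
            "pageid": page.get("pageid"),
            "title": page.get("title"),
            "url": page.get("fullurl"),
            # Pages without a lead image or extract simply get None here.
            "image": thumb.get("source") if thumb is not None else None,
            "extract": (extract.text or "").strip() if extract is not None else None,
        })
    return pages
```

To use it, fetch the URL with any HTTP client (for example urllib.request.urlopen(build_query_url(titles=["Nairobi", "Miles Davis"])).read()) and pass the body to parse_pages. Wikimedia asks API clients to send a descriptive User-Agent header, and the API may cap how many extracts or thumbnails it returns per request, so long lists are probably best sent in batches.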
