How to extract data from a Wikipedia article?


Question

I have a question about parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON, by replacing format=xml with format=json).

But what I can't figure out is how to access only certain sections from the table of contents. What I want is that, when the page is loaded, the user can press a button that opens a pop-up displaying the headers from the table of contents, and can then read that piece, and only that piece, for convenience. I'm a little shaky with JSON, but is it possible to do this? Or is there a Wikipedia API that allows the developer to view only certain parts of a page?

Thanks!

Recommended answer

Unfortunately, it seems the mediawiki.org documentation for parse doesn't tell you how to do this. But the documentation in the API itself does: you can use the section parameter. And you can use prop=sections to get the list of sections.
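As a rough illustration of what prop=sections gives you, the sketch below pulls each section's index and heading out of a JSON response. The sample payload is hand-written and abbreviated, and the field names (index, line, toclevel) are an assumption about what the API returns with format=json; check them against a live response. list_sections is a hypothetical helper name, not part of any library.

```python
import json

# Hand-written, abbreviated sample of a format=json response.
# The shape and field names are assumptions; verify against the live API.
SAMPLE = """
{"parse": {"title": "Android (operating system)",
           "sections": [
             {"toclevel": 1, "level": "2", "line": "History",
              "number": "1", "index": "1"},
             {"toclevel": 2, "level": "3", "line": "Version history",
              "number": "1.1", "index": "2"}
           ]}}
"""


def list_sections(response_text):
    """Return (index, heading) pairs from a prop=sections JSON response."""
    data = json.loads(response_text)
    sections = data.get("parse", {}).get("sections", [])
    # "index" is the value to pass back as the section parameter;
    # "line" is the heading text to show the user.
    return [(s["index"], s["line"]) for s in sections]


if __name__ == "__main__":
    for index, heading in list_sections(SAMPLE):
        print(index, heading)
```

The headings could populate the pop-up, and the index of the chosen entry is what would be sent back as the section parameter.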

So, you can first use:

http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=sections

to get the list of sections, and then

http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=text&section=26

to get the HTML for a certain section.
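The two requests can be sketched as a pair of small URL builders. This is a minimal example rather than a definitive client: the helper names sections_url and section_text_url are hypothetical, the endpoint and parameters are taken from the URLs in this answer, and no network call is made here.

```python
from urllib.parse import urlencode

# Endpoint as used in the answer's example URLs.
API_URL = "http://en.wikipedia.org/w/api.php"


def sections_url(page, fmt="json"):
    """Build the URL that lists the sections of a page (prop=sections)."""
    params = {"format": fmt, "action": "parse", "page": page,
              "prop": "sections"}
    return API_URL + "?" + urlencode(params)


def section_text_url(page, section, fmt="json"):
    """Build the URL that returns the HTML of one section (prop=text)."""
    params = {"format": fmt, "action": "parse", "page": page,
              "prop": "text", "section": section}
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    page = "Android_(operating_system)"
    print(sections_url(page, "xml"))
    print(section_text_url(page, 26, "xml"))
```

urlencode percent-encodes the parentheses in the page title, so the same helpers work for other titles without manual escaping.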
