How to get plain text out of Wikipedia

Question

I'd like to write a script that gets only the description section of a Wikipedia page. That is, when I say

/wiki bla bla bla

it will go to the Wikipedia page for bla bla bla, get the following, and return it to the chatroom:

"Bla Bla Bla" is the name of a song made by Gigi D'Agostino. He described this song as "a piece I wrote thinking of all the people who talk and talk without saying anything". The prominent but nonsensical vocal samples are taken from UK band Stretch's song "Why Did You Do It".

How can I do this?

Recommended answer

Use the MediaWiki API, which Wikipedia exposes. The API returns the page source as wiki markup, so you will have to do some parsing of the data yourself to end up with plain text.

For example:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Bla%20Bla%20Bla

This means: fetch (action=query) the content (rvprop=content) of the most recent revision (prop=revisions) of the page Bla Bla Bla (titles=Bla%20Bla%20Bla) in JSON format (format=json).
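A minimal Python sketch of that request and of the "parsing yourself" step might look like the following. It is only an illustration under some assumptions: the function names are made up for this example, the `https` endpoint and the User-Agent string are my own choices, the response is assumed to use the legacy JSON layout in which the revision text sits under the `"*"` key, and the regex cleanup is deliberately crude (it will not cope with tables, file links, or every template).

```python
import json
import re
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia endpoint, reached over https.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build the same query as the example URL, for any page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return API_URL + "?" + urlencode(params, quote_via=quote)


def extract_wikitext(response):
    """Pull the raw wiki markup out of a decoded JSON response.

    Assumes the legacy layout: query -> pages -> <pageid> -> revisions[0]["*"].
    Returns None when the page has no revisions (for example, it is missing).
    """
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))
    revisions = page.get("revisions")
    if not revisions:
        return None
    return revisions[0]["*"]


def rough_plain_text(wikitext):
    """Very rough wiki-markup cleanup; a sketch, not a real parser."""
    text = re.sub(r"<ref[^>]*/>", "", wikitext)  # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    previous = None
    while previous != text:  # strip templates, innermost first
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"'{2,}", "", text)  # bold and italic quote marks
    return text.strip()


def fetch_wikitext(title):
    """Perform the HTTP request (needs network access)."""
    request = Request(
        build_query_url(title),
        headers={"User-Agent": "wiki-chat-bot-example/0.1"},  # hypothetical
    )
    with urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

With these helpers, something like `rough_plain_text(fetch_wikitext("Bla Bla Bla"))` would give an approximation of the page text; taking only the part before the first section heading would then approximate the description the question asks for.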

You will probably want to run the user's input through a search first and use the top result, to handle spelling errors and the like.
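One way to sketch that search step in Python is with the API's `list=search` module, as below. The function names, the `https` endpoint, and the User-Agent string are assumptions made for this example, and the code assumes the JSON response lists hits under `query.search`, each with a `title` field.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia endpoint, reached over https.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(text):
    """Build a full-text search request that asks for one hit only."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": text,
        "srlimit": 1,
        "format": "json",
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return API_URL + "?" + urlencode(params, quote_via=quote)


def first_result_title(response):
    """Return the title of the top search hit, or None if there is none."""
    results = response.get("query", {}).get("search", [])
    return results[0]["title"] if results else None


def resolve_title(text):
    """Map free-form user input to a page title (needs network access)."""
    request = Request(
        build_search_url(text),
        headers={"User-Agent": "wiki-chat-bot-example/0.1"},  # hypothetical
    )
    with urlopen(request) as reply:
        return first_result_title(json.load(reply))
```

The title returned by `resolve_title` can then be used as the `titles` value in the content query, so that a slightly misspelled `/wiki` command still lands on a real page.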
