Fetch excerpt from Wikipedia article?

Question

I've been up and down the Wikipedia API, but I can't figure out if there's a nice way to fetch the excerpt of an article (usually the first paragraph). It would be nice to get the HTML formatting of that paragraph, too.

The only way I currently see of getting something that resembles a snippet is by performing a fulltext search (example: http://en.wikipedia.org/w/api.php?format=xmlfm&action=query&list=search&srsearch=Fight+Club&srlimit=1), but that's not really what I want (too short).
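As a minimal sketch, the fulltext-search request from the question's example can be assembled with PHP's `http_build_query`. The parameter values are copied from that example URL; `$params` and `$url` are only illustrative names.

```php
<?php
// Rebuild the fulltext-search request from the question's example URL.
// All parameter values are taken from that URL.
$params = [
    'format'   => 'xmlfm',      // XML pretty-printed as HTML, for viewing in a browser
    'action'   => 'query',
    'list'     => 'search',     // fulltext search module
    'srsearch' => 'Fight Club', // search terms; the space is encoded as "+"
    'srlimit'  => 1,            // return only the best match
];
$url = 'http://en.wikipedia.org/w/api.php?' . http_build_query($params);
echo $url, "\n";
```

The `srsearch` result only carries a short highlighted snippet of the match, which is why it falls short of a full first paragraph.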

Is there any other way to fetch the first paragraph of a Wikipedia article than barbarically parsing HTML/WikiText?

Recommended answer

I found no way of doing this through the API, so I resorted to parsing HTML, using PHP's DOM functions. This was pretty easy, something along the lines of:

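<?php
// $wikiPage is assumed to hold the full HTML of the article page, fetched beforehand.
libxml_use_internal_errors(true); // don't emit warnings for markup libxml doesn't recognize
$doc = new DOMDocument();
$doc->loadHTML($wikiPage);
$xpath = new DOMXPath($doc);
// "//p" rather than "/p": current article pages nest paragraphs inside wrapper divs;
// [normalize-space()] skips empty placeholder paragraphs.
$nlPNodes = $xpath->query('//div[@id="bodyContent"]//p[normalize-space()]');
$nFirstP = $nlPNodes->item(0);
if ($nFirstP !== null) {
    echo $doc->saveXML($nFirstP); // echo the first paragraph of the wiki article, including <p></p>
}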
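A self-contained variant of the same XPath approach, wrapped in a hypothetical `firstParagraph()` helper and run against a small inline HTML sample instead of a live page. It assumes the article body sits somewhere under `<div id="bodyContent">`, possibly nested in further wrapper divs; the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: return the first non-empty <p> inside <div id="bodyContent">
// as an HTML string, or null if no such paragraph exists.
function firstParagraph(string $html): ?string
{
    $prev = libxml_use_internal_errors(true); // tolerate HTML5 / malformed markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);        // restore the caller's setting

    $xpath = new DOMXPath($doc);
    // Descendant axis finds nested paragraphs; normalize-space() drops empty ones.
    $nodes = $xpath->query('//div[@id="bodyContent"]//p[normalize-space()]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $doc->saveHTML($nodes->item(0));   // serialize just that <p> element
}

// Made-up sample markup: an empty placeholder <p> precedes the real lead paragraph.
$sample = '<html><body><div id="bodyContent"><div class="mw-parser-output">'
    . '<p class="mw-empty-elt"></p>'
    . '<p><b>Example</b> is the lead paragraph.</p>'
    . '<p>Second paragraph.</p>'
    . '</div></div></body></html>';

echo firstParagraph($sample), "\n";
```

Using `saveHTML()` instead of `saveXML()` serializes the node as HTML (for example `<br>` rather than `<br/>`), which is usually preferable when the paragraph is re-embedded in a web page.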
