Fetch excerpt from Wikipedia article?
Question
I've been up and down the Wikipedia API, but I can't figure out if there's a nice way to fetch the excerpt of an article (usually the first paragraph). It would be nice to get the HTML formatting of that paragraph, too.
The only way I currently see of getting something that resembles a snippet is by performing a fulltext search (example: http://en.wikipedia.org/w/api.php?format=xmlfm&action=query&list=search&srsearch=Fight+Club&srlimit=1), but that's not really what I want (the snippet it returns is too short).
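For reference, the fulltext-search request from the example can be assembled in PHP with `http_build_query`. This is a minimal sketch: the function name `buildSearchUrl` is hypothetical, and the parameters are simply the ones in the example URL. Note that `format=xmlfm` is the API's pretty-printed view meant for browsing; a script would normally request `format=xml` or `format=json` instead.

```php
<?php
// Hypothetical helper: builds the list=search query used in the example URL.
function buildSearchUrl(string $term, int $limit = 1,
                        string $endpoint = 'http://en.wikipedia.org/w/api.php'): string
{
    $params = [
        'format'   => 'xmlfm',   // pretty-printed XML for viewing in a browser
        'action'   => 'query',
        'list'     => 'search',
        'srsearch' => $term,     // the search term
        'srlimit'  => $limit,    // number of results to return
    ];
    // http_build_query defaults to RFC 1738 encoding, so "Fight Club" becomes "Fight+Club".
    return $endpoint . '?' . http_build_query($params);
}

echo buildSearchUrl('Fight Club'), PHP_EOL;
```

With the defaults, this reproduces the example URL exactly; raising `$limit` only changes the `srlimit` value.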
Is there any way to fetch the first paragraph of a Wikipedia article other than barbarically parsing the HTML/WikiText?
Recommended answer
I found no way of doing this through the API, so I resorted to parsing the HTML using PHP's DOM functions. This was pretty easy, something along the lines of:
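```php
// $wikiPage is assumed to already hold the rendered article HTML.
libxml_use_internal_errors(true);  // silence warnings about non-XHTML markup
$doc = new DOMDocument();
$doc->loadHTML($wikiPage);
$xpath = new DOMXPath($doc);
$nlPNodes = $xpath->query('//div[@id="bodyContent"]/p');
$nFirstP = $nlPNodes->item(0);     // null if no <p> matched
if ($nFirstP !== null) {
    $sFirstP = $doc->saveXML($nFirstP);
    echo $sFirstP; // echo the first paragraph of the wiki article, including <p></p>
}
```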
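The same idea can be wrapped in a small function that is easy to test against a saved page. This is a sketch under assumptions: the name `firstParagraph` is hypothetical, it keeps the answer's `bodyContent` XPath (Wikipedia's markup has changed over time, so the path may need adjusting for current pages), and it skips paragraphs whose text is only whitespace. It returns `null` when nothing matches.

```php
<?php
// Hypothetical helper: returns the first non-blank <p> directly under
// div#bodyContent as an HTML string, or null if there is none.
function firstParagraph(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // HTML5 tags would otherwise raise warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//div[@id="bodyContent"]/p') as $p) {
        if (trim($p->textContent) !== '') {       // skip empty/whitespace-only paragraphs
            return $doc->saveXML($p);
        }
    }
    return null;
}

$page = '<html><body><div id="bodyContent">'
      . '<p> </p><p><b>Fight Club</b> is a novel.</p><p>Second.</p>'
      . '</div></body></html>';
echo firstParagraph($page), PHP_EOL; // <p><b>Fight Club</b> is a novel.</p>
```

Restoring the previous `libxml_use_internal_errors` value keeps the helper from changing error handling for the rest of the script.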