Fetch excerpt from Wikipedia article?

Question

I've been up and down the Wikipedia API, but I can't figure out if there's a nice way to fetch the excerpt of an article (usually the first paragraph). It would be nice to get the HTML formatting of that paragraph, too.

The only way I currently see of getting something that resembles a snippet is by performing a fulltext search (example), but that's not really what I want (too short).

Is there any other way to fetch the first paragraph of a Wikipedia article than barbarically parsing HTML/WikiText?

Recommended answer

I found no way of doing this through the API, so I resorted to parsing HTML, using PHP's DOM functions. This was pretty easy, something along the lines of:

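libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($wikiPage); // $wikiPage holds the article's HTML, fetched beforehand
$xpath = new DOMXPath($doc);
// Select the <p> elements that are direct children of the bodyContent div
$nlPNodes = $xpath->query('//div[@id="bodyContent"]/p');
$nFirstP = $nlPNodes->item(0); // null if the page has no such paragraph
if ($nFirstP !== null) {
    $sFirstP = $doc->saveXML($nFirstP);
    echo $sFirstP; // echo the first paragraph of the wiki article, including <p></p>
}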
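
To try the approach without downloading a page, the same steps can be wrapped in a function and run against a small inline document. This is only a sketch: the function name firstParagraphHtml and the sample markup are made up for illustration, and it assumes, as the snippet above does, that the paragraphs are direct children of the div with id "bodyContent". If the real page nests them more deeply, the query would need adjusting.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the markup of
// the first <p> directly under the bodyContent div, or null if none is found.
function firstParagraphHtml(string $html): ?string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Same query as in the answer: direct <p> children of #bodyContent.
    $nodes = $xpath->query('//div[@id="bodyContent"]/p');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize only the first matching node, including its <p></p> tags.
    return $doc->saveXML($nodes->item(0));
}

// Made-up sample markup standing in for a downloaded article page.
$sample = '<html><body><div id="bodyContent">'
    . '<p>First <b>paragraph</b>.</p><p>Second paragraph.</p>'
    . '</div></body></html>';

echo firstParagraphHtml($sample), "\n";
```

Returning null rather than echoing inside the function lets the caller decide what to do when the expected structure is missing.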
