PHP + Wikipedia: Get content from the first paragraph in a Wikipedia article?

Question

I’m trying to use Wikipedia’s API (api.php) to get the content of a Wikipedia article provided by a link (like: http://en.wikipedia.org/wiki/Stackoverflow). And what I want is to get the first paragraph (which in the example of the Stackoverflow wiki article is: Stack Overflow is a website part of the Stack Exchange network[2][3] featuring questions and answers on a wide range of topics in computer programming.[4][5][6]).

I’ve tried with the API url: http://en.wikipedia.org/w/api.php?action=parse&page=Stackoverflow&format=xml but it gives me some kind of error. It outputs:

<api>
<parse displaytitle="Stackoverflow" revid="289948401">
<text xml:space="preserve">
<ol> <li>REDIRECT <a href="/wiki/Stack_Overflow" title="Stack Overflow">Stack Overflow</a></li> </ol> <!-- NewPP limit report Preprocessor node count: 1/1000000 Post-expand include size: 0/2048000 bytes Template argument size: 0/2048000 bytes Expensive parser function count: 0/500 --> <!-- Saved in parser cache with key enwiki:pcache:idhash:21772484-0!*!0!!*!* and timestamp 20110525165333 -->
</text>
<langlinks/>
<categories/>
<links>
<pl ns="0" exists="" xml:space="preserve">Stack Overflow</pl>
</links>
<templates/>
<images/>
<externallinks/>
<sections/>
</parse>
</api>

I found this snippet of code and tried it:

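// $wikiPage must already contain the article's HTML as a string.
$doc = new DOMDocument();
$doc->loadHTML($wikiPage);
$xpath = new DOMXPath($doc);
// Select the <p> elements that are direct children of the main content div.
$nlPNodes = $xpath->query('//div[@id="bodyContent"]/p');
$nFirstP = $nlPNodes->item(0); // first paragraph node
$sFirstP = $doc->saveXML($nFirstP);
echo $sFirstP;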

However, I can’t get the HTML content into the $wikiPage variable.

I don’t know whether this is the best way to do it, so please feel free to comment on that; any suggestions or solutions would be much appreciated.

Thank you
- Mestika

Recommended answer

What you are getting back is the content of the redirect page.

The API does have support for an &redirects option, which will resolve redirects for you.
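
As a rough illustration of the redirects option, here is a minimal PHP sketch. It assumes the https endpoint, the prop=text and format=json parameters, and a placeholder User-Agent string; the function names buildParseUrl, firstParagraph and fetchFirstParagraph are hypothetical helpers made up for this example, not part of any library. Taking the first non-empty <p> is only a heuristic and may not match the lead paragraph on every article.

```php
<?php
// Build an action=parse URL. The `redirects` parameter asks the API to
// follow a redirect such as "Stackoverflow" -> "Stack Overflow".
function buildParseUrl(string $title): string
{
    $params = [
        'action'    => 'parse',
        'page'      => $title,
        'prop'      => 'text',   // assumption: only the rendered HTML is needed
        'redirects' => 1,
        'format'    => 'json',   // assumption: JSON instead of the XML used in the question
    ];
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Return the plain text of the first non-empty <p> in an HTML fragment,
// or null if there is none.
function firstParagraph(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);                   // keep HTML5 markup warnings quiet
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);  // hint that the input is UTF-8
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text;
        }
    }
    return null;
}

// Fetch the page through the API and extract its first paragraph.
function fetchFirstParagraph(string $title): ?string
{
    // Placeholder User-Agent; replace it with one that identifies your application.
    $context = stream_context_create(['http' => ['user_agent' => 'ExampleBot/0.1 (example sketch)']]);
    $json = file_get_contents(buildParseUrl($title), false, $context);
    if ($json === false) {
        return null;
    }
    $data = json_decode($json, true);
    $text = $data['parse']['text'] ?? null;
    if (is_array($text)) {
        $text = $text['*'] ?? null;  // the default JSON format nests the HTML under "*"
    }
    return is_string($text) ? firstParagraph($text) : null;
}
```

Calling echo fetchFirstParagraph('Stackoverflow'); would then print the paragraph text, including footnote markers such as [2], which can be stripped with preg_replace if they are not wanted.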
