How do I display content grabbed from external websites?

Question

How do I grab pieces of content from external websites and display them on my website? (Similar to what an RSS feed or other aggregator does).

For example, say I want to display items from another website's calendar:

Other website

<h1>Here's our calendar:</h1>

<div class="calendar_item">
  <h2>Boston Marathon</h2>
  <p class="date">June 23, 2012</p>
  <p class="description">This marathon is 26.2 miles and lots of fun.</p>
</div>

<div class="calendar_item">    
  <h2>Irish Pub Crawl</h2>
  <p class="date">July 17, 2012</p>
  <p class="description">Shamrocks and green things are super-fun.</p>
</div>

<div class="calendar_item">
  <h2>Tim's Birthday</h2>
  <p class="date">August 25, 2012</p>
  <p class="description">It's Tim's birthday, yo.</p>
</div>

My website

<h1>Here's a feed of some calendar items from someone else's website:</h1>

<div class="event_title">Boston Marathon</div>
<div class="event_date">June 23, 2012</div>
<div class="event_description">This marathon is 26.2 miles and lots of fun.</div>

<div class="event_title">Irish Pub Crawl</div>
<div class="event_date">July 17, 2012</div>
<div class="event_description">Shamrocks and green things are super-fun.</div>

<div class="event_title">Tim's Birthday</div>
<div class="event_date">August 25, 2012</div>
<div class="event_description">It's Tim's birthday, yo.</div>

Here's what I've tried (using MAMP):

<?php

$url = "http://example.com";

$page = curl($url);

$pattern = '%
<h2>(.+?)</h2>
%i';

preg_match($pattern,$page,$matches);

print_r($matches);

?>

...prints:

Array ( )

The tutorials/etc. I've viewed include ambiguous answers like "try cURL". This seems so simple, but I'm a stumped noob.

Thanks in advance :)

Recommended answer

I would not recommend using regex to parse HTML. PHP 5+ ships with an HTML parser (DOMDocument), which you can use as shown below.

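// Read the HTML to parse; 'test.html' stands in for the page you fetched.
$content = file_get_contents('test.html');

// Optional: real-world HTML is often invalid, so collect parser warnings
// instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($content);

$h2Tags = $dom->getElementsByTagName("h2");
$pTags = $dom->getElementsByTagName("p");

foreach ($h2Tags as $h2) {
    // do something, e.g. read $h2->nodeValue
}

foreach ($pTags as $p) {
    if ($p->getAttribute("class") == "date") {
        // do something
    }
}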

$h2 is a DOMElement, which inherits from DOMNode, so you can read its text through the nodeValue property. In the example above, $h2->nodeValue gives you the content of each heading.
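To tie this back to the calendar in the question, here is a minimal sketch of the whole round trip: parse the other site's markup, collect each calendar_item, and print it using the event_title / event_date / event_description classes. The function names extractEvents() and renderEvents() are made up for this illustration, and the sample HTML is a shortened copy of the markup from the question, so treat it as one possible approach rather than the definitive one.

```php
<?php
// Sample markup copied from the question. In practice this string would be
// the HTML you downloaded from the other website.
$html = <<<'HTML'
<h1>Here's our calendar:</h1>
<div class="calendar_item">
  <h2>Boston Marathon</h2>
  <p class="date">June 23, 2012</p>
  <p class="description">This marathon is 26.2 miles and lots of fun.</p>
</div>
<div class="calendar_item">
  <h2>Irish Pub Crawl</h2>
  <p class="date">July 17, 2012</p>
  <p class="description">Shamrocks and green things are super-fun.</p>
</div>
HTML;

// Hypothetical helper: returns one array per <div class="calendar_item">.
function extractEvents(string $html): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the page
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $events = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('class') !== 'calendar_item') {
            continue; // skip unrelated divs
        }
        $event = ['title' => '', 'date' => '', 'description' => ''];
        $h2 = $div->getElementsByTagName('h2')->item(0);
        if ($h2 !== null) {
            $event['title'] = trim($h2->nodeValue);
        }
        foreach ($div->getElementsByTagName('p') as $p) {
            $class = $p->getAttribute('class');
            if ($class === 'date' || $class === 'description') {
                $event[$class] = trim($p->nodeValue);
            }
        }
        $events[] = $event;
    }
    return $events;
}

// Hypothetical helper: builds the markup shown under "My website".
function renderEvents(array $events): string
{
    $out = '';
    foreach ($events as $e) {
        // Escape the scraped text so it cannot inject markup into your page.
        $out .= '<div class="event_title">' . htmlspecialchars($e['title']) . "</div>\n";
        $out .= '<div class="event_date">' . htmlspecialchars($e['date']) . "</div>\n";
        $out .= '<div class="event_description">' . htmlspecialchars($e['description']) . "</div>\n\n";
    }
    return $out;
}

echo renderEvents(extractEvents($html));
```

For a live page, the $html string could come from file_get_contents($url) (if allow_url_fopen is enabled) or from PHP's cURL functions. The class comparison above assumes each element carries exactly one class, as in the question's markup.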
