Using getElementsByClassName or getElementsByTagName to scrape data from HTML content


Problem description

Here is a source code snippet taken from this web page: http://www.yelp.com/biz/franchino-san-francisco?start=80

I want to scrape the date, review text, and rating for each review block on the page.

Code on ideone: http://ideone.com/fork/Yfw2re

I am not very familiar with DOM elements, so I would appreciate it if someone could correct this.

Here is the code:

<?php

// your code goes here    
$html = <<<EOF
<div class="review-wrapper">
           <div class="review-content">
        <div class="biz-rating biz-rating-very-large clearfix">
    <div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">

    <div class="rating-very-large">
    <i class="star-img stars_5" title="5.0 star rating">
        <img alt="5.0 star rating" class="offscreen" height="303" src="http://s3-media3.ak.yelpcdn.com/assets/2/www/img/c2252a4cd43e/ico/stars/v2/stars_map.png" width="84">
    </i>
        <meta itemprop="ratingValue" content="5.0">
</div>   
    </div>
        <span class="rating-qualifier">
        <meta itemprop="datePublished" content="2013-10-28">
    10/28/2013
</span>    
</div>   
            <p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and &#39;home-cooking&#39;. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p>     
</div>
<div class="review-footer clearfix">
               <div class="rateReview ufc-feedback clearfix" data-review-id="SnZ4Q97nJdR7a-fot-Slcw">
                <p class="review-intro review-message">
    Was this review &hellip;?
</p>
EOF;


$dom = new DOMDocument();
@$dom->loadHTML($html);    
$classname = 'review-content';
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$tmp_dom = new DOMDocument();      
foreach($nodes as $result) {

  //getting rate value from '<meta itemprop="ratingValue" content="5.0">'
  //getting  date from <span class="rating-qualifier">       <meta itemprop="datePublished" content="2013-10-28">     10/28/2013 </span>
  //getting review from  ' <p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and &#39;home-cooking&#39;. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p> '

}


Recommended answer

You can look up elements by class value or tag name like this:

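// Parse the HTML once and reuse the same document for every query.
libxml_use_internal_errors(true); // keep warnings about imperfect HTML out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Date: text of the element whose class attribute is exactly "rating-qualifier".
$classname = 'rating-qualifier';
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    $date = trim($results->item(0)->nodeValue);
    echo $date, "\n";
}

// Review text: @class='...' compares the whole attribute string,
// so both class names have to be given, in the same order as in the HTML.
$classname = 'review_comment ieSucks';
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    $review = $results->item(0)->nodeValue;
    echo $review, "\n";
}

// Rating: the first <meta> element in the snippet is itemprop="ratingValue".
$meta = $dom->documentElement->getElementsByTagName("meta");
echo $meta->item(0)->getAttribute('content'), "\n";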
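
The `@class='...'` comparisons above match the whole attribute string, so they stop matching as soon as an element gains or loses a class. A minimal sketch of matching a single class token instead, using the same `contains(concat(...))` expression that appears in the question. The helper name `classQuery` and the sample markup are illustrative assumptions, not part of the original answer:

```php
<?php
// Hypothetical helper: builds an XPath expression that matches any element
// whose class attribute contains $class as a whole, space-separated token.
function classQuery(string $class): string
{
    return "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
}

// Illustrative markup, not taken from the page.
$html = '<div><p class="review_comment ieSucks">Great pasta.</p>'
      . '<p class="review_comments">Not a review.</p></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query(classQuery('review_comment'));

// Only the first paragraph is selected: "review_comments" is a different token.
foreach ($nodes as $node) {
    echo trim($node->nodeValue), "\n";
}
```

Padding the attribute with spaces on both sides is what prevents a partial hit on a longer class name such as `review_comments`.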

To get all the ratings on the page, loop over the matched rating elements with a simple for loop.
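
A minimal sketch of such a loop, assuming each review sits in its own `review-content` block as in the question's snippet. The sample markup is a trimmed, made-up stand-in for the real page, and the `itemprop` lookups assume that `ratingValue`, `datePublished`, and `description` are present in every block. It uses `foreach` rather than an indexed `for`, which is only a stylistic choice:

```php
<?php
// Trimmed, illustrative markup with two review blocks (not the real page).
$html = <<<'HTML'
<div class="review-content">
  <meta itemprop="ratingValue" content="5.0">
  <span class="rating-qualifier"><meta itemprop="datePublished" content="2013-10-28"> 10/28/2013</span>
  <p class="review_comment ieSucks" itemprop="description">First review text.</p>
</div>
<div class="review-content">
  <meta itemprop="ratingValue" content="3.0">
  <span class="rating-qualifier"><meta itemprop="datePublished" content="2013-09-15"> 9/15/2013</span>
  <p class="review_comment ieSucks" itemprop="description">Second review text.</p>
</div>
HTML;

libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Select every element that has "review-content" as one of its classes.
$blocks = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' review-content ')]"
);

$reviews = [];
foreach ($blocks as $block) {
    // Passing $block as the context node scopes each query to that block;
    // the leading "." keeps the path relative to it.
    $reviews[] = [
        'rating' => $xpath->evaluate("string(.//meta[@itemprop='ratingValue']/@content)", $block),
        'date'   => $xpath->evaluate("string(.//meta[@itemprop='datePublished']/@content)", $block),
        'review' => trim($xpath->evaluate("string(.//*[@itemprop='description'])", $block)),
    ];
}

foreach ($reviews as $r) {
    echo $r['date'], ' | ', $r['rating'], ' | ', $r['review'], "\n";
}
```

Wrapping each path in `string(...)` means a block that lacks one of the elements yields an empty string instead of an error, so missing fields are easy to detect afterwards.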

Demo here: https://eval.in/143036
