How to get data from HTML using regex


Problem description


I have the following HTML:

<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>

I want to get the value 22, i.e. the content of <div class="statnum"> inside <td class="stat stat-last">.

I tried the following regex, but it does not find any match:

/<div\sclass="statnum">^(.)\?<\/div>/ig

Solution

Here's a way to accomplish this using a parser.

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { //loop through all Cells
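    // strpos() returns 0 for a match at the start of the string, so compare against false explicitly
    if (strpos($cell->getAttribute('class'), 'stat-last') !== false) {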
        $divs = $cell->getElementsByTagName('div');
        foreach($divs as $div) { // loop through all divs of the cell
            if($div->getAttribute('class') == 'statnum'){
                echo $div->nodeValue;
            }
        }
    }
}

Output:

22
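
When a class attribute is tested with strpos(), keep in mind that it returns the offset of the match, or false when there is none. A match at offset 0 is therefore falsy in a bare if(), and a substring test also accepts longer names such as "stat-lastish". The sketch below illustrates both points; hasClass() is a hypothetical helper introduced only for this example, not part of the original answer or of PHP's DOM API.

```php
<?php
// Hypothetical helper (not from the original answer): true if $token is one
// of the whitespace-separated class names in $classAttr.
function hasClass(string $classAttr, string $token): bool
{
    $tokens = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

// strpos() finds the needle at offset 0 here, and 0 is falsy in a bare if().
var_dump(strpos('stat-last stat', 'stat-last'));           // int(0)
var_dump(strpos('stat-last stat', 'stat-last') !== false); // bool(true)

// Token-based matching ignores class order and rejects partial names.
var_dump(hasClass('stat-last stat', 'stat-last')); // bool(true)
var_dump(hasClass('stat-lastish', 'stat-last'));   // bool(false)
```

Inside a DOM loop, such a helper could be called as hasClass($cell->getAttribute('class'), 'stat-last').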

...or using XPath...

$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach($statnums as $statnum) {
    echo $statnum->nodeValue;
}

Output:

22
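
An exact comparison such as @class='stat stat-last' stops matching if the class names appear in a different order or another class is added. A common XPath 1.0 idiom is to pad the attribute with spaces and test for the token with contains(). The following is a minimal sketch under that assumption; statLastNumbers() and the shortened sample markup are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of every
// div with class token "statnum" inside a td whose class list contains "stat-last".
function statLastNumbers(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Pad the class attribute with spaces so only whole tokens match.
    $query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
           . "//div[contains(concat(' ', normalize-space(@class), ' '), ' statnum ')]";

    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = trim($node->nodeValue);
    }
    return $values;
}

// Shortened sample markup; note the class order differs from the question.
$html = '<table class="profile-stats"><tr>
  <td class="stat"><div class="statnum">8</div></td>
  <td class="stat-last stat"><a href="/THEDJMHA/followers"><div class="statnum">22</div></a></td>
</tr></table>';

print_r(statLastNumbers($html)); // an array with the single value "22"
```

The descendant axis (//div) is used instead of /a/div so the query does not depend on the link wrapper being present.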

...or, if you really want to use a regex...

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num);
echo $num[1];

Output:

22

Regex demo: https://regex101.com/r/kM6kI2/1
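
As for why the pattern in the question finds nothing: the ^ in the middle of the pattern asserts the start of the string (or of a line in multiline mode), which can never hold right after the opening tag, and (.)\? matches exactly one character followed by a literal question mark. In addition, PHP's preg_* functions have no g modifier; preg_match_all() is used to collect all matches. The sketch below shows a minimally corrected form of that pattern. The sample markup is shortened, and unlike the pattern in the answer this one matches every statnum div, so the last match is picked afterwards.

```php
<?php
// Hypothetical sample input, shortened from the question's markup.
$html = '<td class="stat"><div class="statnum">8</div></td>
<td class="stat stat-last"><a href="/THEDJMHA/followers">
<div class="statnum">22</div></a></td>';

// Corrected form of the question's pattern: no stray ^, and a lazy (.*?) group
// instead of (.)\?, which required one character plus a literal "?".
preg_match_all('/<div\sclass="statnum">(.*?)<\/div>/is', $html, $matches);

// $matches[1] holds every captured statnum value, in document order.
$all = $matches[1];

// The question only wants the value in the last cell.
$last = end($all);

echo $last, PHP_EOL; // 22
```

This still relies on the exact attribute spelling and quoting, which is the usual reason a parser is preferred for HTML.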
