How to get data from HTML using regex
Question
I have the following HTML:
<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>
I want to get the value of the <div class="statnum"> inside <td class="stat stat-last">, which is 22.
I have tried the following regex, but it does not find any match:
/<div\sclass="statnum">^(.)\?<\/div>/ig
Solution
Here's a way to accomplish this using a parser.
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';
$doc = new DOMDocument(); // make a DOM object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { // loop through all cells
    // strpos() returns 0 (falsy) for a match at position 0, so compare against false
    if (strpos($cell->getAttribute('class'), 'stat-last') !== false) {
        $divs = $cell->getElementsByTagName('div');
        foreach ($divs as $div) { // loop through all divs of the cell
            if ($div->getAttribute('class') == 'statnum') {
                echo $div->nodeValue;
            }
        }
    }
}
Output:
22
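The substring test on the class attribute also accepts unrelated classes that merely contain `stat-last` (for example `stat-lastish`). A minimal sketch of a stricter check, assuming you want to match whole, space-separated class tokens; the helper name `hasClass` is hypothetical and not part of the original answer, and the sample markup is an abridged copy of the table from the question:

```php
<?php
// Hypothetical helper: true when the element's class attribute contains
// $name as a whole, whitespace-separated token.
function hasClass(DOMElement $el, string $name): bool
{
    $tokens = preg_split('/\s+/', trim($el->getAttribute('class')));
    return in_array($name, $tokens, true);
}

// Abridged copy of the table from the question.
$html = '<table class="profile-stats"><tr>
<td class="stat"><div class="statnum">8</div></td>
<td class="stat stat-last"><a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div></a></td>
</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$found = [];
foreach ($doc->getElementsByTagName('td') as $cell) {
    if (!hasClass($cell, 'stat-last')) {
        continue; // not the cell we are looking for
    }
    foreach ($cell->getElementsByTagName('div') as $div) {
        if (hasClass($div, 'statnum')) {
            $found[] = trim($div->nodeValue);
        }
    }
}

echo implode("\n", $found), "\n";
```

Collecting the values in `$found` instead of echoing inside the loop makes it easier to handle the case where no matching cell exists.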
...or using an xpath...
// $html is the same string as in the previous example
$doc = new DOMDocument(); // make a DOM object
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach($statnums as $statnum) {
echo $statnum->nodeValue;
}
Output:
22
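The XPath query compares the whole class attribute, so it only matches when the attribute is exactly `stat stat-last`. If the class order or spacing might vary, a commonly used XPath 1.0 idiom matches a single class token instead. This is a sketch under that assumption, using an abridged copy of the table from the question:

```php
<?php
// Abridged copy of the table from the question.
$html = '<table class="profile-stats"><tr>
<td class="stat"><div class="statnum">8</div></td>
<td class="stat stat-last"><a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div></a></td>
</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the normalized class list with spaces, then look for " token ".
$query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
       . "//div[contains(concat(' ', normalize-space(@class), ' '), ' statnum ')]";

$values = [];
foreach ($xpath->query($query) as $node) {
    $values[] = trim($node->nodeValue);
}

echo implode("\n", $values), "\n";
```

Using `//div` rather than `/a/div` also keeps the query working if the wrapping `<a>` element is absent, as it is in the first cell of the table.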
...or if you really wanted to regex it...
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';
preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num);
echo $num[1];
Output:
22
Regex demo: https://regex101.com/r/kM6kI2/1
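For comparison, the pattern from the question cannot match for three reasons: the `^` in the middle of the pattern asserts the start of the input, `(.)` captures exactly one character, and `\?` is a literal question mark rather than a lazy quantifier. PHP's `preg_*` functions also do not accept a `g` modifier; `preg_match_all` plays that role. A minimal sketch of a corrected pattern that collects every `statnum` value, assuming the markup looks like the sample (picking the last value relies on the `stat-last` cell being last in document order):

```php
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';

// Lazy capture between the opening and closing tags.
// i = case-insensitive, s = let "." match newlines as well.
preg_match_all('~<div\s+class="statnum">\s*(.*?)\s*</div>~is', $html, $matches);

$all  = $matches[1]; // every statnum value, in document order
$last = end($all);   // assumption: the stat-last cell is the last one

echo implode(',', $all), "\n";
echo $last, "\n";
```

This remains fragile against changes in attribute order or quoting, which is why the parser-based approaches are usually the safer choice.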