Get data only from an HTML table using preg_match_all in PHP


Question

I have an HTML table like this:

<table ... >

  <tbody ... >

       <tr ... > 
             <td ...>
                  string...
              </td>
                <td ...>
                  string...
              </td>
                <td ...>
                  string...
              </td>
                <td ...>
                  string...
              </td>
                <td ...>
                  string...
              </td>
       </tr>
        <tr ... > 
             <td ...>
                  string...
              </td>
                <td ...>
                  string...
              </td>
                <td ...>
                  string...
              </td>
                <td ...>
             </td>
                <td ...>
                  string...
              </td>
       </tr>
       ..............

  </tbody>


</table>

This is a data table, and I need to extract all of its data. The table has many rows (<tr></tr>), and each row has a fixed number of columns (<td></td>), currently 5. Note that every table, tr, and td tag may carry attributes (shown as "..." above).

I hope someone can help me write a regex for the preg_match_all function that returns the data like this:

array(
   0 => array(
       0=> 'some data0',
       1=> 'some data1',
       2=> 'some data2',
       3=> 'some data3',
       4=> 'some data4',
   ),
   1 => array(
       0=> 'some data0',
       1=> 'some data1',
       2=> 'some data2',
       3=> 'some data3',
       4=> 'some data4',
   ),
   2 => array(
       0=> 'some data0',
       1=> 'some data1',
       2=> 'some data2',
       3=> 'some data3',
       4=> 'some data4',
   )
..........
)

Here is an example to test with. Hopefully you can help me!

<table border="1" >
  <tbody style="" >

       <tr style="" > 
             <td style="color:blue;">
                  data0
              </td>
                <td style="font-size:15px;">
                 data1
              </td>
                <td style="font-size:15px;">
                  data2
              </td>
                <td style="color:blue;">
                  data3
              </td>
                <td style="color:blue;">
                  data4
              </td>
       </tr>
       <tr style="" > 
             <td style="color:blue;">
                  data00
              </td>
                <td style="font-size:15px;">
                 data11
              </td>
                <td style="font-size:15px;">
                  data22
              </td>
                <td style="color:blue;">
                  data33
              </td>
                <td style="color:blue;">
                  data44
              </td>
       </tr>
       <tr style="color:black" > 
             <td style="color:blue;">
                  data000
              </td>
                <td style="font-size:15px;">
                 data111
              </td>
                <td style="font-size:15px;">
                  data222
              </td>
                <td style="color:blue;">
                  data333
              </td>
                <td style="color:blue;">
                  data444
              </td>
       </tr>

  </tbody>


</table>


Recommended answer

You absolutely do NOT want to parse HTML with regex.

For one, there are far too many variations in how HTML can be written. More importantly, regex copes poorly with the hierarchical nature of HTML. It's best to use an XML parser or, better yet, an HTML-specific parser.
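
As a sketch of the parser-based route, PHP's bundled DOM extension (DOMDocument and DOMXPath) can do this extraction without any third-party library. The function name tableToArray and the shortened sample markup are assumptions made for this illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (name chosen for this sketch): collect the trimmed text
// of every <td> in every <tr> into a two-dimensional array.
function tableToArray(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Every row inside any table, regardless of attributes or <tbody>.
    foreach ($xpath->query('//table//tr') as $tr) {
        $row = [];
        // Only the cells that are direct children of this row.
        foreach ($xpath->query('./td', $tr) as $td) {
            $row[] = trim($td->textContent);
        }
        $rows[] = $row;
    }

    return $rows;
}

// Shortened sample in the shape of the question's table (assumed data).
$html = '<table border="1"><tbody style="">'
      . '<tr style=""><td style="color:blue;"> data0 </td><td> data1 </td></tr>'
      . '<tr style="color:black"><td> data00 </td><td></td></tr>'
      . '</tbody></table>';

$rows = tableToArray($html);
// $rows is [['data0', 'data1'], ['data00', '']]
print_r($rows);
```

An empty cell comes back as an empty string, so every row keeps the same number of columns.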

Whenever I need to scrape HTML, I tend to use the Simple HTML DOM Parser library, which parses an HTML document into a traversable PHP object that you can query much like jQuery.

<?php
    require 'simplehtmldom/simple_html_dom.php';

    $sHtml = <<<EOS
    <table border="1" >
      <tbody style="" >
           <tr style="" > 
                 <td style="color:blue;">
                      data0
                  </td>
                    <td style="font-size:15px;">
                     data1
                  </td>
                    <td style="font-size:15px;">
                      data2
                  </td>
                    <td style="color:blue;">
                      data3
                  </td>
                    <td style="color:blue;">
                      data4
                  </td>
           </tr>
           <tr style="" > 
                 <td style="color:blue;">
                      data00
                  </td>
                    <td style="font-size:15px;">
                     data11
                  </td>
                    <td style="font-size:15px;">
                      data22
                  </td>
                    <td style="color:blue;">
                      data33
                  </td>
                    <td style="color:blue;">
                      data44
                  </td>
           </tr>
           <tr style="color:black" > 
                 <td style="color:blue;">
                      data000
                  </td>
                    <td style="font-size:15px;">
                     data111
                  </td>
                    <td style="font-size:15px;">
                      data222
                  </td>
                    <td style="color:blue;">
                      data333
                  </td>
                    <td style="color:blue;">
                      data444
                  </td>
           </tr>
      </tbody>
    </table>
EOS;

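    // Parse the HTML string into a queryable DOM object.
    $oHTML = str_get_html($sHtml);
    // Select every row of the table.
    $oTRs = $oHTML->find('table tr');
    $aData = array();
    foreach($oTRs as $oTR) {
        $aRow = array();
        $oTDs = $oTR->find('td');

        foreach($oTDs as $oTD) {
            // plaintext is the cell's text without tags; trim the indentation.
            $aRow[] = trim($oTD->plaintext);
        }

        $aData[] = $aRow;
    }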

    var_dump($aData);
?>

Output:

array
  0 => 
    array
      0 => string 'data0' (length=5)
      1 => string 'data1' (length=5)
      2 => string 'data2' (length=5)
      3 => string 'data3' (length=5)
      4 => string 'data4' (length=5)
  1 => 
    array
      0 => string 'data00' (length=6)
      1 => string 'data11' (length=6)
      2 => string 'data22' (length=6)
      3 => string 'data33' (length=6)
      4 => string 'data44' (length=6)
  2 => 
    array
      0 => string 'data000' (length=7)
      1 => string 'data111' (length=7)
      2 => string 'data222' (length=7)
      3 => string 'data333' (length=7)
      4 => string 'data444' (length=7)
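
For comparison, here is a minimal sketch of the preg_match_all approach the question asked about. It assumes the markup is as simple as the sample: no nested tables, no ">" inside attribute values, and every tr and td explicitly closed. It will silently return wrong data once any of those assumptions fails, which is why a parser is the safer choice. The function name tableToArrayRegex is hypothetical, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: regex-based extraction for simple, flat tables only.
function tableToArrayRegex(string $html): array
{
    $rows = [];

    // Capture the inside of each <tr ...>...</tr>.
    // "s" lets "." match newlines, "i" ignores tag case,
    // and the lazy ".*?" stops at the first closing tag.
    preg_match_all('~<tr\b[^>]*>(.*?)</tr>~is', $html, $trMatches);

    foreach ($trMatches[1] as $rowHtml) {
        // Capture the inside of each <td ...>...</td> within this row.
        preg_match_all('~<td\b[^>]*>(.*?)</td>~is', $rowHtml, $tdMatches);

        // Remove any inline tags inside a cell and trim the whitespace.
        $rows[] = array_map(
            static fn(string $cell): string => trim(strip_tags($cell)),
            $tdMatches[1]
        );
    }

    return $rows;
}

// Shortened sample in the shape of the question's table (assumed data).
$html = "<table border=\"1\">\n<tr style=\"\">\n"
      . "<td style=\"color:blue;\">\n   data0\n</td>\n"
      . "<td style=\"font-size:15px;\">\n   data1\n</td>\n"
      . "</tr>\n</table>";

$rows = tableToArrayRegex($html);
// $rows is [['data0', 'data1']]
print_r($rows);
```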
