Parse HTML Table with DOM and XPath

Question

I'm trying to parse an HTML table with XPath. The URL is: click here.

I used Firebug to inspect the page's DOM and identified the container I need:

<tbody>
<tr class="r1">
<td class="l rbrd">
<img class="spr2 sport sp1" align="absmiddle" src="/s.gif">
</td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd">
<a title="CHELSEA FC - SUNDERLAND" href="/chelsea-fc-vs-sunderland/e/4509648/" target="_blank">CHELSEA FC - SUNDERLAND</a>
</td>
<td class="c w40">
<span class="o">1,21</span>
<span class="p">92,8%</span>
</td>
<td class="c w10 rbrd">
<span class="o">
<span class="p">
</td>
<td class="c w40">
<span class="o">8,00</span>
<span class="p">4,7%</span>
</td>
<td class="c w10 rbrd">
<span class="o">
<span class="p">
</td>
<td class="c w40">
<span class="o">18,00</span>
<span class="p">2,5%</span>
</td>
<td class="c w10 rbrd">
<span class="o">
<span class="p">
</td>
<td class="c emph">
<span class="o">353.660 €</span>
</td>
<td class="c w10 emph rbrd">
<img class="imgdiff" width="10" height="10" src="http://img.oxytropis.com/s.gif">
</td>
<td class="c rbrd">
<span class="o">1,56</span>
<span class="p">67,5%</span>
</td>
<td class="c rbrd">
<span class="o">2,74</span>
<span class="p">32,5%</span>
</td>
<td class="c emph rbrd">
<span class="o">6.243 €</span>
</td>
<td class="c rbrd">
<a onclick="_gaq.push(['_trackEvent','betfair','click','tziroi-out']);" href="http://sports.betfair.com/Index.do?mi=&ex=1&origin=MRL&rfr=655" rel="nofollow" target="_blank">
</td>
</tr>

This is only one row; there are hundreds more. Every row carries the same kind of information, so I can walk through each row and check whether a cell contains the date, the match, the money, and so on. I need a condition for each of these fields so that I can store them all in an array.

I'm following this tutorial: click here

Which condition can I use to distinguish each cell from the others?

I want something like this for each row in the table:

[0] => Array
            (
                [date] => 18:30 19/4
                [teams] => CHELSEA FC - SUNDERLAND
                [1] => 1,21
                [1 volumes] => 92,8%
                [X] => 8,00
                [X volumes] => 4,7%
                [2] => 18,00
                [2 volumes] => 2,5%
                [matched] => 353.660 € 
                  ...

            )

This is the PHP so far; I'm stuck at this point:

<?php

$curl = curl_init('http://www.oxybet.ro/pariu/external/betfair-volumes.htm');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
$html = curl_exec($curl);
curl_close($curl);

if (!$html) {
     die("something's wrong!");
}



$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$scores = array();

$tableRows = $xpath->query('//div//div//div[2]//div/div//table//tr');

foreach ($tableRows as $row) {
    // fetch all 'tds' inside this 'tr'
    $td = $xpath->query('td', $row);
    $match = array();


Recommended answer

So far, your query fetches all table rows. In the next step, loop over these results (in PHP) and read the cells of each row as needed. You can use either direct DOM access or XPath, whichever you prefer.
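For the direct DOM route, here is a minimal sketch. It assumes a trimmed, inline copy of the row from the question instead of the fetched page, and the key names date and teams are taken from the desired output. The cell positions (second and third td) are an assumption based on that single sample row.

```php
<?php
// Trimmed stand-in for the fetched page (assumption: the real rows have the same cell order).
$html = '<table><tbody>
<tr class="r1">
<td class="l rbrd"><img class="spr2 sport sp1" src="/s.gif"></td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd"><a href="/chelsea-fc-vs-sunderland/e/4509648/">CHELSEA FC - SUNDERLAND</a></td>
</tr>
</tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$rows = array();
foreach ($dom->getElementsByTagName('tr') as $tr) {
    // Keep only the direct <td> children of this row.
    $cells = array();
    foreach ($tr->childNodes as $child) {
        if ($child instanceof DOMElement && $child->nodeName === 'td') {
            $cells[] = $child;
        }
    }
    if (count($cells) < 3) {
        continue;   // skip rows that do not have the expected cells
    }
    $rows[] = array(
        'date'  => trim($cells[1]->textContent),   // PHP arrays are 0-indexed
        'teams' => trim($cells[2]->textContent),
    );
}

print_r($rows);
```

Note that plain DOM access counts cells from 0, whereas XPath positions start at 1.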

To use XPath, write an expression that starts at the current context node and pass the current row as the context. Use numerical predicates to select the cell you're looking for. For example, to query the team name (it sits in the third table cell; XPath positions are 1-indexed), use something like

$tableRows = $xpath->query('//div//div//div[2]//div/div//table//tr');
foreach ($tableRows as $row) {
    $team = $xpath->query('./td[3]/a', $row)->item(0)->textContent;
}

Querying the class attributes might also be possible, but they seem to be used rather arbitrarily.
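To see why the classes alone are a weak selector, here is a small sketch that assumes a trimmed copy of the sample row from the question. The emph class appears on both amount cells and on an image-only cell, so a class-based query returns several candidates rather than one field. The euro sign is written as the &euro; entity only to keep the sample ASCII-only.

```php
<?php
// Trimmed stand-in for one row of the page (assumption: cells copied from the question's sample).
$html = '<table><tr class="r1">
<td class="c w40"><span class="o">18,00</span><span class="p">2,5%</span></td>
<td class="c emph"><span class="o">353.660 &euro;</span></td>
<td class="c w10 emph rbrd"><img class="imgdiff" width="10" height="10" src="/s.gif"></td>
<td class="c emph rbrd"><span class="o">6.243 &euro;</span></td>
</tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$row = $xpath->query('//tr')->item(0);

// Match "emph" as a whole class token, not as a substring of another class name.
$emphCells = $xpath->query(
    './td[contains(concat(" ", normalize-space(@class), " "), " emph ")]',
    $row
);

// Only some of the emph cells actually carry an amount.
$amounts = array();
foreach ($emphCells as $cell) {
    $span = $xpath->query('./span[@class="o"]', $cell);
    if ($span->length > 0) {
        $amounts[] = trim($span->item(0)->textContent);
    }
}

print_r($amounts);
```

Since the class matches more than one cell, you would still need a position to tell the two amounts apart, which is why the positional predicates are the simpler choice here.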

Now read the other table cells with similar queries, build the resulting map, and append it to the $scores array.
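Put together, that could look roughly like the sketch below. It runs against an inline copy of the sample row rather than the live page. The cellText helper and the $columns map are hypothetical names introduced for illustration, and the cell positions are assumptions read off the single row shown in the question, so check them against the real page.

```php
<?php
// Inline stand-in for the fetched page; &euro; keeps the sample ASCII-only.
$html = '<table><tbody>
<tr class="r1">
<td class="l rbrd"><img class="spr2 sport sp1" src="/s.gif"></td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd"><a href="/chelsea-fc-vs-sunderland/e/4509648/">CHELSEA FC - SUNDERLAND</a></td>
<td class="c w40"><span class="o">1,21</span><span class="p">92,8%</span></td>
<td class="c w10 rbrd"><span class="o"></span><span class="p"></span></td>
<td class="c w40"><span class="o">8,00</span><span class="p">4,7%</span></td>
<td class="c w10 rbrd"><span class="o"></span><span class="p"></span></td>
<td class="c w40"><span class="o">18,00</span><span class="p">2,5%</span></td>
<td class="c w10 rbrd"><span class="o"></span><span class="p"></span></td>
<td class="c emph"><span class="o">353.660 &euro;</span></td>
</tr>
<tr><th>a row without td cells, which the query skips</th></tr>
</tbody></table>';

// Hypothetical helper: evaluate a relative XPath against one row and
// return the trimmed text of the first match, or null if nothing matches.
function cellText(DOMXPath $xpath, DOMNode $row, $expr)
{
    $nodes = $xpath->query($expr, $row);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Output key => XPath relative to the current row (positions are 1-indexed).
$columns = array(
    'date'      => './td[2]',
    'teams'     => './td[3]/a',
    '1'         => './td[4]/span[@class="o"]',
    '1 volumes' => './td[4]/span[@class="p"]',
    'X'         => './td[6]/span[@class="o"]',
    'X volumes' => './td[6]/span[@class="p"]',
    '2'         => './td[8]/span[@class="o"]',
    '2 volumes' => './td[8]/span[@class="p"]',
    'matched'   => './td[10]/span[@class="o"]',
);

$scores = array();
foreach ($xpath->query('//table//tr[td]') as $row) {   // only rows that contain td cells
    $match = array();
    foreach ($columns as $key => $expr) {
        $match[$key] = cellText($xpath, $row, $expr);
    }
    $scores[] = $match;
}

print_r($scores);
```

Keeping the expressions in one map means a layout change on the page only requires adjusting the positions in $columns, not the loop.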
