Looping through a table with Simple HTML DOM


Question

I'm using Simple HTML DOM to extract data from an HTML document, and I have a couple of issues that I need some help with.


  1. On the line that begins with if ($td->find('a')), I want to extract the href and the content of the anchor node separately and place them in separate variables. However, the code doesn't work (see the output noted next to the echo statements in the code below).

What is the best way to do this? Note that my goal is to build an XML document from this information later on, so I need the information in the correct order.

  2. The links lead to pages containing detailed information about the different cars (e.g. "Max speed", "Price", etc.) that I also want to extract and put into separate variables. How can I get hold of the data on these pages?

<?php
include 'simple_html_dom.php';

$html = new simple_html_dom();
$html = file_get_html('http://www.example.com/foo.html');

$items = array();

foreach ($html->find('table') as $table) {
    foreach ($table->find('tr') as $tr) {

        foreach ($tr->find('td') as $td) {

            if ($td->find('a')) {
                $link = $td->find('a.href');
                echo $link;  // empty

                $text = $td->find('a.text');
                echo $text; // Array
            }
            else {
                echo 'Name: ' . $td;
            }
        }
    }
}


The HTML document looks like this:

<div>
    <table>
        <tr>
            <td>
                <a href="car1.html" target="_blank">Car 1</a>
            </td>
            <td>
                Porsche
            </td>
        </tr>
        <tr>
            <td>
                <a href="car2.html" target="_blank">Car 2</a>
            </td>
            <td>
                Chrysler
            </td>
        </tr>
        ... and so on...


Recommended answer

Use $td->find('a', 0)->href to read the anchor's href attribute and $td->find('a', 0)->innertext to read its contents. The selectors 'a.href' and 'a.text' in the question do not work because a dot in a selector denotes a class, so they look for anchors with class="href" or class="text". The second argument to find(), 0, makes it return the first matching element rather than an array, which also acts as a safeguard when a cell might contain more than one anchor.
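To make the answer concrete, here is a minimal sketch of the loop from the question with that change applied. It assumes simple_html_dom.php is available next to the script, parses an inline copy of the sample table with str_get_html() instead of fetching a URL, and collects each row into an array so the order is preserved for later XML output. The array keys link, title, and name are illustrative choices, not something taken from the question.

```php
<?php
// Assumption: the Simple HTML DOM library file is in the include path.
include 'simple_html_dom.php';

// Inline sample mirroring the table in the question (assumed closed markup).
$html = str_get_html('
<table>
    <tr><td><a href="car1.html" target="_blank">Car 1</a></td><td>Porsche</td></tr>
    <tr><td><a href="car2.html" target="_blank">Car 2</a></td><td>Chrysler</td></tr>
</table>');

$items = array();

foreach ($html->find('table tr') as $tr) {
    $item = array();

    foreach ($tr->find('td') as $td) {
        // With an index, find() returns a single element, or null if none matches.
        $a = $td->find('a', 0);

        if ($a) {
            $item['link']  = $a->href;             // attribute value
            $item['title'] = trim($a->innertext);  // content of the anchor
        } else {
            $item['name'] = trim($td->plaintext);  // cell text without tags
        }
    }

    // Skip rows that had no <td> cells (for example header rows).
    if ($item) {
        $items[] = $item;
    }
}

print_r($items);
```

Each entry in $items keeps the cells of one row together, which should make it straightforward to emit one XML element per car later. The href values here are relative, so loading a detail page with file_get_html() would presumably require prefixing them with the site's base URL first; the layout of those pages is not shown in the question, so that step is left out.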
