PHP DOM / XPath

Problem description

Hopefully this is a simple question for someone who has done it before!

I have a list of old web documents in table format with lots of contact details in them. So far I have managed to create a PHP script that parses the XHTML doc and pulls out the old client contact details.

An example of the document format:

<tr>
  <td bgcolor="#CCCCCC" valign="top"><a href="#" class="details">Indigo Blue 123</a></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" align="top"><font class="details">123 Blue House</font></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" align="top"></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"><font class="details">Hanley</font></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"><font class="details">ST13 4SN</font></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"><font class="details">Stoke on Trent</font></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"><font class="details">01875 322511</font></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"></td>
  <td bgcolor="#CCCCCC"></td>
  <td bgcolor="#CCCCCC" valign="top"><a href="http://www.indigoblue123.org.uk" target="_blank" class="details">www.indigoblue123.org.uk</a></td>
  <td bgcolor="#CCCCCC"></td>
</tr>

What I need to do is parse all of these contact details into an array. The things I'm not sure how to do are capturing the empty cells as empty array entries (i.e. Address 2 and Address 3 will be blank, but I need to know this) and grabbing the web address from the <a>..</a> block.

So far I have figured out that all populated data has class=details in some form. However, as I mentioned, I'm not sure what the best way to achieve the overall result is. There are around 20-40 entries across the different files I have.

I have managed the basics with this so far:

<?php
  print '<pre>';
  $html = file_get_contents('old-contacts.xhtml');

  // Create new DOM object:
  $dom = new DOMDocument();

  // Load HTML code:
  $dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  $details = $xpath->query("//table/tbody/tr[td/font/@class = 'details']");

  // Initialise the result array before filling it:
  $data = array();
  for ($i = 0; $i < $details->length; $i++) {
    $data[$i]['data'] = $details->item($i)->nodeValue;
    echo $data[$i]['data'];
  }
  print '</pre>';
?>

Any help would be great!

Thanks

Recommended answer

I believe you are looking for something like this:

$nodes = $xpath->query('//table/tbody/tr/td[@align="top"] | 
                        //table/tbody/tr/td[@valign="top"]');

$data = array();
foreach ($nodes as $node) {
    $data[] = $node->textContent;
}

This will give you:

Array
(
    [0] => Indigo Blue 123
    [1] => 123 Blue House
    [2] => 
    [3] => 
    [4] => Hanley
    [5] => 
    [6] => ST13 4SN
    [7] => Stoke on Trent
    [8] => 01875 322511
    [9] => 
    [10] => www.indigoblue123.org.uk
)
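
The query above returns the link text rather than the link target, and it puts the cells of every row into one flat array. If you also need the web address from the <a> element and one array per contact, the following is a minimal sketch of one way to do it, not a definitive implementation. The function name parseContacts, the trimmed inline sample, and the decision to treat href="#" as a placeholder are all assumptions made for illustration. It uses '//tr' rather than '//table/tbody/tr' so that it matches whether or not the markup contains a <tbody>.

```php
<?php
// Hypothetical trimmed sample; the real files contain many such rows.
$html = <<<HTML
<table>
<tr>
  <td valign="top"><a href="#" class="details">Indigo Blue 123</a></td>
  <td></td>
  <td align="top"><font class="details">123 Blue House</font></td>
  <td></td>
  <td valign="top"></td>
  <td></td>
  <td valign="top"><a href="http://www.indigoblue123.org.uk" target="_blank" class="details">www.indigoblue123.org.uk</a></td>
  <td></td>
</tr>
</table>
HTML;

// parseContacts() is an illustrative helper name, not part of the original answer.
function parseContacts(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $contacts = [];

    // Select every row that has at least one data cell.
    foreach ($xpath->query('//tr[td[@valign="top" or @align="top"]]') as $row) {
        $fields = [];
        // Relative query: only the data cells of the current row, in document order.
        foreach ($xpath->query('td[@valign="top" or @align="top"]', $row) as $cell) {
            $text = trim($cell->textContent);
            $link = $xpath->query('a[@href]', $cell)->item(0);
            $href = $link ? $link->getAttribute('href') : '';
            // Assumption: "#" is a placeholder href, so fall back to the cell text.
            $fields[] = ($href !== '' && $href !== '#') ? $href : $text;
        }
        // Empty cells stay in the array as empty strings, so positions are preserved.
        $contacts[] = $fields;
    }

    return $contacts;
}

$contacts = parseContacts($html);
print_r($contacts);
```

For the trimmed sample this yields a single contact with four fields: the name, the first address line, an empty string for the blank cell, and the full http://www.indigoblue123.org.uk address. Because every row keeps its blank cells, each contact array should have the same length, which would make it straightforward to map positions to field names of your own choosing.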
