How do I parse partial HTML?


Question

I'm trying to parse some HTML with DOM in PHP, but I'm having some problems. First, in case this changes the solution, the HTML that I have is not a full page; it's only part of one.

<!-- This is the HTML that I have -->
<a href='/games/'>
<div id='game'>
<img src='http://images.example.com/games.gif' width='300' height='137' border='0'>
<br><b> Game </b>
</div>
<div id='double'>
<img src='http://images.example.com/double.gif' width='300' height='27' border='0' alt='' title=''>
</div>
</a>

Now I'm trying to get only the div with the id double. I've tried the following code, but it doesn't seem to be working properly. What might I be doing wrong?

//The HTML has been loaded into the variable $html
$dom=new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false; 
$keepme = $dom->getElementById('double'); 

$contents = '<div style="text-align:center">'.$keepme.'</a></div>';
echo $contents;


Recommended answer

I think DOMDocument::getElementById will not work in your case (quoting the PHP manual):


For this function to work, you will need either to set some ID attributes with DOMElement::setIdAttribute or a DTD which defines an attribute to be of type ID.
In the latter case, you will need to validate your document with DOMDocument::validate or DOMDocument->validateOnParse before using this function.
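
To make the quoted passage concrete, here is a minimal sketch of the setIdAttribute route. It assumes a small XML snippet loaded with loadXML() and no DTD; the markup and variable names are made up for illustration and are not the HTML from the question.

```php
<?php
// Hypothetical XML snippet, loaded without a DTD.
$xml = '<root><div id="double">x</div></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Without a DTD or setIdAttribute(), "id" is treated as an ordinary
// attribute, so this lookup is expected to find nothing.
$before = $doc->getElementById('double');

// Declare the element's "id" attribute to be of type ID.
$div = $doc->getElementsByTagName('div')->item(0);
$div->setIdAttribute('id', true);

// The same lookup is repeated now that the attribute is registered as an ID.
$after = $doc->getElementById('double');

var_dump($before, $after !== null);
```

Registering IDs this way means finding each element by other means first, which is one reason an XPath query can be the more direct option.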



A solution that might work is to use an XPath query to extract the element you are looking for.

First of all, let's load the HTML portion, as you did in your own code:

$dom = new DOMDocument();
$dom->loadHTML($html);
var_dump($dom->saveHTML());

The var_dump is here only to prove that the HTML portion has been loaded successfully; judging from its output, it has.



Then, instantiate the DOMXPath class, and use it to query for the element you want to get:

$xpath = new DOMXPath($dom);
$result = $xpath->query("//*[@id = 'double']");
$keepme = $result->item(0);

We now have the element you want ;-)
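
Put together, the two steps above might look like the following self-contained sketch. It reuses the fragment from the question. The libxml_use_internal_errors() call and the null check are additions that are not in the original answer: the first assumes you would rather collect parser warnings than print them, the second guards against the query matching nothing.

```php
<?php
// The fragment from the question, held in $html.
$html = <<<'HTML'
<a href='/games/'>
<div id='game'>
<img src='http://images.example.com/games.gif' width='300' height='137' border='0'>
<br><b> Game </b>
</div>
<div id='double'>
<img src='http://images.example.com/double.gif' width='300' height='27' border='0' alt='' title=''>
</div>
</a>
HTML;

// Assumption: collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Match any element whose id attribute equals 'double'.
$xpath = new DOMXPath($dom);
$result = $xpath->query("//*[@id = 'double']");

// item(0) returns null when nothing matched, so check before using it.
$keepme = $result->item(0);
if ($keepme === null) {
    echo "No element with id 'double' was found\n";
} else {
    echo $keepme->nodeName, "\n";
}
```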



But, in order to inject its HTML content in another HTML segment, we must first get its HTML content.

I don't remember any "easy" way to do that, but something like this should do the trick:

$tempDom = new DOMDocument();
$tempImported = $tempDom->importNode($keepme, true);
$tempDom->appendChild($tempImported);
$newHtml = $tempDom->saveHTML();
var_dump($newHtml);

And... we have the HTML content of your double <div>:

string '<div id="double">
<img src="http://images.example.com/double.gif" width="300" height="27" border="0" alt="" title="">
</div>
' (length=125)



Now, you just have to do whatever you want with it ;-)
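
For instance, the wrapper from the question could be built as in the sketch below. It is not the answer's own method: it assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts an optional node argument and serializes only that subtree, which may save the temporary-document step. The shortened $html fragment is a stand-in for the real one, and the stray </a> from the question's wrapper is dropped.

```php
<?php
// Hypothetical shortened fragment standing in for the question's $html.
$html = "<a href='/games/'><div id='double'>"
      . "<img src='http://images.example.com/double.gif' alt=''>"
      . "</div></a>";

// Assumption: collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$keepme = $xpath->query("//*[@id = 'double']")->item(0);

// Serialize only the matched node; fall back to an empty string if the
// query matched nothing.
$newHtml = $keepme !== null ? $dom->saveHTML($keepme) : '';

// Wrap the extracted markup, as the question intended.
$contents = '<div style="text-align:center">' . $newHtml . '</div>';
echo $contents, "\n";
```

Note that the question's original code concatenated the DOMElement object itself into the string; serializing it to markup first avoids that problem.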
