How do I parse partial HTML?
Problem description
I'm trying to parse some HTML with DOM in PHP, but I'm having some problems. First, in case it changes the solution: the HTML I have is not a full page, only part of it.
<!-- This is the HTML that I have -->
<a href='/games/'>
<div id='game'>
<img src='http://images.example.com/games.gif' width='300' height='137' border='0'>
<br><b> Game </b>
</div>
<div id='double'>
<img src='http://images.example.com/double.gif' width='300' height='27' border='0' alt='' title=''>
</div>
</a>
Now I'm trying to get only the div whose id is double. I've tried the following code, but it doesn't seem to work properly. What might I be doing wrong?
//The HTML has been loaded into the variable $html
$dom=new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$keepme = $dom->getElementById('double');
$contents = '<div style="text-align:center">'.$keepme.'</a></div>';
echo $contents;
Recommended answer
I think DOMDocument::getElementById will not work in your case (quoting the manual):
For this function to work, you will need either to set some ID attributes with DOMElement::setIdAttribute or a DTD which defines an attribute to be of type ID. In the latter case, you will need to validate your document with DOMDocument::validate or DOMDocument->validateOnParse before using this function.
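For reference, the first option the manual mentions could look roughly like this. It is a minimal sketch: the helper name findById() is hypothetical, and it simply flags every id attribute as an ID with DOMElement::setIdAttribute() before handing the lookup to getElementById():

```php
<?php
// Hypothetical helper: mark every "id" attribute as type ID via setIdAttribute(),
// then let getElementById() do the lookup. Returns null when nothing matches.
function findById(DOMDocument $dom, string $id): ?DOMElement
{
    // '*' matches every element in the document.
    foreach ($dom->getElementsByTagName('*') as $el) {
        if ($el->hasAttribute('id')) {
            // Second argument true declares the attribute to be of type ID.
            $el->setIdAttribute('id', true);
        }
    }
    return $dom->getElementById($id);
}

$dom = new DOMDocument();
$dom->loadHTML("<div id='game'>Game</div><div id='double'>Double</div>");
$double = findById($dom, 'double');
var_dump($double !== null ? $double->getAttribute('id') : null); // string(6) "double"
```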
A solution that might work is to use an XPath query to extract the element you are looking for.
First of all, let's load the HTML portion, as you did:
$dom=new domDocument;
$dom->loadHTML($html);
var_dump($dom->saveHTML());
The var_dump is here only to prove that the HTML portion has been loaded successfully; judging from its output, it has.
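Since the input is only a fragment, note that loadHTML() does not keep it bare: libxml wraps it in implied <html> and <body> elements. A small sketch, with a shortened, hypothetical $fragment standing in for your HTML, that also collects libxml parser warnings instead of printing them:

```php
<?php
// Shortened, hypothetical stand-in for the question's fragment.
$fragment = "<a href='/games/'><div id='double'><img src='double.gif' alt=''></div></a>";

// Collect parser warnings instead of printing them; remember the previous setting.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($fragment);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document element is the implied <html>; the fragment sits inside <body>.
$root = $dom->documentElement;
$body = $dom->getElementsByTagName('body')->item(0);
echo $root->nodeName, "\n";             // html
echo $body->firstChild->nodeName, "\n"; // a
```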
Then, instantiate the DOMXPath class and use it to query for the element you want:
$xpath = new DOMXpath($dom);
$result = $xpath->query("//*[@id = 'double']");
$keepme = $result->item(0);
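If you need this lookup in more than one place, the query can be wrapped in a helper that also handles the no-match case. A sketch; the function name getElementByIdXPath() is hypothetical, and it assumes the id contains no single quote, since the value is embedded in an XPath string literal:

```php
<?php
// Hypothetical helper: return the first element whose id attribute equals $id.
// Assumes $id contains no single quote, because it is embedded in an XPath literal.
function getElementByIdXPath(DOMDocument $dom, string $id): ?DOMElement
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//*[@id = '" . $id . "']");
    // query() returns false for a malformed expression and an empty list for no match.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0);
}

$dom = new DOMDocument();
$dom->loadHTML("<a href='/games/'><div id='game'>Game</div><div id='double'>Double</div></a>");
$keepme = getElementByIdXPath($dom, 'double');
var_dump($keepme !== null); // bool(true)
```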
We now have the element you want ;-)
But to inject it into another HTML segment, we first need its HTML markup as a string.
I don't remember any "easy" way to do that, but something like this should do the trick:
$tempDom = new DOMDocument();
$tempImported = $tempDom->importNode($keepme, true);
$tempDom->appendChild($tempImported);
$newHtml = $tempDom->saveHTML();
var_dump($newHtml);
And... we have the HTML content of your double <div>:
string '<div id="double">
<img src="http://images.example.com/double.gif" width="300" height="27" border="0" alt="" title="">
</div>
' (length=125)
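For what it's worth, since PHP 5.3.6 DOMDocument::saveHTML() accepts an optional node argument and serializes just that node, which makes the temporary document unnecessary. A sketch, using a shortened, hypothetical stand-in for the question's fragment; the exact whitespace in the output may differ from the var_dump above:

```php
<?php
// Assumes PHP >= 5.3.6, where DOMDocument::saveHTML() accepts an optional node.
// Shortened, hypothetical stand-in for the question's fragment.
$html = "<a href='/games/'><div id='double'><img src='double.gif' alt=''></div></a>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$keepme = $xpath->query("//*[@id = 'double']")->item(0);

// Serialize only the matched element and its children; no temporary document needed.
$newHtml = $dom->saveHTML($keepme);
var_dump($newHtml);
```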
Now, you just have to do whatever you want with it ;-)
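For the original goal from the question, one way to finish is sketched below. The helper name centerWrap() is hypothetical; the key points are that a DOMElement has to be serialized to a string before it is concatenated, and that the wrapper should be closed with </div> rather than </a></div>. It assumes PHP 5.3.6 or later for saveHTML() with a node argument:

```php
<?php
// Hypothetical helper: wrap one element's markup in the centered container
// the question was building. A DOMElement cannot be concatenated with a string
// directly, so it is serialized first.
function centerWrap(DOMElement $el): string
{
    $inner = $el->ownerDocument->saveHTML($el);
    // Close with </div> to match the opening tag (no stray </a>).
    return '<div style="text-align:center">' . $inner . '</div>';
}

// Usage with a shortened, hypothetical stand-in for the question's fragment.
$dom = new DOMDocument();
$dom->loadHTML("<a href='/games/'><div id='double'><img src='double.gif' alt=''></div></a>");
$keepme = (new DOMXPath($dom))->query("//*[@id = 'double']")->item(0);
echo centerWrap($keepme);
```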