Preserving <br> tags when parsing HTML text content
Question
I have a small issue.
I want to parse a simple HTML document in PHP.
Here is the HTML:
<html>
<body>
<table>
<tr>
<td>Colombo <br> Coucou</td>
<td>30</td>
<td>Sunny</td>
</tr>
<tr>
<td>Hambantota</td>
<td>33</td>
<td>Sunny</td>
</tr>
</table>
</body>
</html>
And this is my PHP code:
$dom = new DOMDocument();
// preserveWhiteSpace only takes effect if it is set before the document is loaded
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile("test.html");
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
    $cols = $row->getElementsByTagName('td');
    // nodeValue returns only the text content of each cell
    echo $cols->item(0)->nodeValue.'<br />';
    echo $cols->item(1)->nodeValue.'<br />';
    echo $cols->item(2)->nodeValue;
}
But as you can see, I have a <br> tag and I need it, but when my PHP code runs, the tag is removed.
Can anybody explain how I can keep it?
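To reproduce the behaviour without a test.html file, the sketch below loads a trimmed-down version of the sample markup from a string with loadHTML() (an assumption made for brevity) and compares the cell's nodeValue with the markup returned by saveHTML($node):

```php
<?php
// Minimal reproduction. The markup is loaded from a string instead of
// test.html, which is an assumption made for brevity.
$html = '<table><tr><td>Colombo <br> Coucou</td><td>30</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$cell = $dom->getElementsByTagName('td')->item(0);

// nodeValue concatenates the text nodes "Colombo " and " Coucou";
// the <br> element has no text of its own, so nothing of it survives.
$text = $cell->nodeValue;

// saveHTML($node) serializes the cell itself, markup included.
$markup = $dom->saveHTML($cell);

var_dump($text); // string(15) "Colombo  Coucou"
echo $markup, "\n";
```

The <br> is gone from $text but still present in $markup, which is why echoing nodeValue loses it.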
Recommended answer
I would recommend capturing the values of the table cells with the help of XPath:
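$values = array();
$xpath = new DOMXPath($dom);
// Visit every table row in the document
foreach ($xpath->query('//tr') as $row) {
    $row_values = array();
    // Query the <td> children relative to the current row
    foreach ($xpath->query('td', $row) as $cell) {
        // innerHTML() (defined below) keeps markup such as <br>
        $row_values[] = innerHTML($cell);
    }
    $values[] = $row_values;
}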
Also, I've had the same problem as you with <br> tags being stripped from fetched content. A <br> is an empty element: it has no text of its own, so the text-only nodeValue has nothing to return for it. Unfortunately, it isn't automatically replaced with a newline character (\n) either.
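If plain text with line breaks is all you need, rather than markup, one alternative is to replace each <br> element with a newline text node before reading nodeValue. This is a minimal sketch of that idea, not the answer's method; the helper name brToNewline is hypothetical, and the markup is loaded from a string for brevity:

```php
<?php
// Hypothetical helper: replaces every <br> element with a "\n" text node.
function brToNewline(DOMDocument $dom): void
{
    // getElementsByTagName() returns a live list, so copy it into an array
    // first; replacing nodes while iterating the live list would skip some.
    $brs = iterator_to_array($dom->getElementsByTagName('br'));
    foreach ($brs as $br) {
        $br->parentNode->replaceChild($dom->createTextNode("\n"), $br);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>Colombo <br> Coucou</td><td>30</td></tr></table>');
brToNewline($dom);

$cell = $dom->getElementsByTagName('td')->item(0);
// The text content now contains a newline where the <br> used to be
$text = $cell->nodeValue;
echo $text, "\n";
```

Echoing $text prints "Colombo " and " Coucou" on two separate lines. The trade-off is that the document is modified in place, so serialize it first if you also need the original markup.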
So I designed my own innerHTML function, which has proved invaluable in many projects. Here it is:
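function innerHTML(DOMElement $element, $trim = true, $decode = true) {
    $innerHTML = '';
    foreach ($element->childNodes as $node) {
        // Copy each child (with its descendants) into an empty document and serialize it
        $temp_container = new DOMDocument();
        $temp_container->appendChild($temp_container->importNode($node, true));
        // Optionally strip leading/trailing whitespace from each serialized child
        $innerHTML .= ($trim ? trim($temp_container->saveHTML()) : $temp_container->saveHTML());
    }
    // Optionally decode HTML entities (e.g. &amp;) back into literal characters
    return ($decode ? html_entity_decode($innerHTML) : $innerHTML);
}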
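On PHP 5.3.6 and later, DOMDocument::saveHTML() accepts the node to serialize, so a variant of the helper can use the element's own ownerDocument instead of building a temporary document for every child. This is a sketch under that assumption; the name innerHtmlViaOwner is hypothetical, and the sample table is loaded from a string rather than from test.html:

```php
<?php
// Hypothetical variant of innerHTML(); assumes PHP >= 5.3.6, where
// saveHTML() takes an optional node argument.
function innerHtmlViaOwner(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // Serialize each child with the element's own document; no
        // temporary DOMDocument or importNode() call is needed.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>Colombo <br> Coucou</td><td>30</td></tr>'
    . '<tr><td>Hambantota</td><td>33</td></tr></table>');

// Same row/cell traversal as the XPath loop in the answer
$xpath = new DOMXPath($dom);
$values = [];
foreach ($xpath->query('//tr') as $row) {
    $row_values = [];
    foreach ($xpath->query('td', $row) as $cell) {
        $row_values[] = innerHtmlViaOwner($cell);
    }
    $values[] = $row_values;
}

print_r($values);
```

Unlike the innerHTML() function above, this sketch neither trims each child nor decodes entities, so the first cell keeps its original spacing around the <br>. Skipping html_entity_decode() is deliberate: decoding serialized markup would turn escaped text such as &lt;b&gt; into a real tag.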