Preserving <br> tags when parsing HTML text content

Views: 35 | Published: 2021/10/2 19:41:47 | Tags: php dom xpath html-parsing

Problem description

I have a small issue.
I want to parse a simple HTML document in PHP. Here is the HTML:

<html>
       <body>
             <table>
                     <tr>
                          <td>Colombo <br> Coucou</td> 
                          <td>30</td>
                          <td>Sunny</td> 
                     </tr>
                     <tr>
                          <td>Hambantota</td> 
                          <td>33</td>
                          <td>Sunny</td> 
                     </tr>

             </table>    
       </body>
 </html>

And this is my PHP code:

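$dom = new DOMDocument();

// Set parser-related properties before loading; changing them afterwards
// does not alter a tree that has already been parsed.
$dom->preserveWhiteSpace = false;

// loadHTMLFile() returns a boolean, not the markup; the parsed tree lives in $dom.
$dom->loadHTMLFile("test.html");

$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');

foreach ($rows as $row)
{
  $cols = $row->getElementsByTagName('td');

  echo $cols->item(0)->nodeValue.'<br />';
  echo $cols->item(1)->nodeValue.'<br />';
  echo $cols->item(2)->nodeValue;
}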

But as you can see, I have a <br> tag and I need it, but when my PHP code runs, it removes this tag.
Can anybody explain how I can keep it?

Recommended answer

I would recommend capturing the values of the table cells with the help of XPath:

$values = array();

$xpath = new DOMXPath($dom);

foreach($xpath->query('//tr') as $row) {
   $row_values = array();

   foreach($xpath->query('td', $row) as $cell) {
      $row_values[] = innerHTML($cell);
   }

   $values[] = $row_values;
}

Also, I've had the same problem as you, with <br> tags being stripped out of fetched content because they are themselves empty nodes; unfortunately, they are not automatically replaced with a newline character (\n).
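If plain text with line breaks is all that is needed, one possible workaround is to swap each <br> element for a newline text node before reading nodeValue. The following is only a minimal sketch under that assumption: the inline markup is an illustrative copy of the first cell from the question, and the variable names are made up for the example.

```php
<?php
// Illustrative markup based on the first table cell in the question.
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>Colombo <br> Coucou</td></tr></table>');

$cell = $dom->getElementsByTagName('td')->item(0);

// nodeValue concatenates descendant text only, so the <br> contributes nothing.
$plain = $cell->nodeValue;

// Copy the live node list into an array first: replacing nodes while
// iterating a live DOMNodeList can skip elements.
foreach (iterator_to_array($cell->getElementsByTagName('br')) as $br) {
    $br->parentNode->replaceChild($dom->createTextNode("\n"), $br);
}

// The same property now contains a newline where the <br> used to be.
$withNewlines = $cell->nodeValue;

var_dump($plain);
var_dump($withNewlines);
```

This changes the parsed tree in place, so it only suits cases where the original <br> elements are no longer needed afterwards.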

So I wrote my own innerHTML function, which has proved invaluable in many projects. I'm sharing it here:

function innerHTML(DOMElement $element, $trim = true, $decode = true) {
   $innerHTML = '';

   foreach ($element->childNodes as $node) {
      $temp_container = new DOMDocument();
      $temp_container->appendChild($temp_container->importNode($node, true));

      $innerHTML .= ($trim ? trim($temp_container->saveHTML()) : $temp_container->saveHTML());
   }

   return ($decode ? html_entity_decode($innerHTML) : $innerHTML);
}
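As a possible alternative, DOMDocument::saveHTML() accepts a node argument (since PHP 5.3.6), so each child can be serialized through its owning document without a temporary DOMDocument. The sketch below is not the answer's original function: the name innerHtmlViaOwner is hypothetical, the entity-decoding step is left out, and the inline markup is a condensed copy of the table from the question.

```php
<?php
/**
 * Hypothetical variant of innerHTML(): serializes each child node through
 * the owning document instead of importing it into a temporary one.
 */
function innerHtmlViaOwner(DOMElement $element, bool $trim = true): string
{
    $html = '';

    foreach ($element->childNodes as $node) {
        // saveHTML($node) returns the markup for that single node only.
        $chunk = $element->ownerDocument->saveHTML($node);
        $html .= $trim ? trim($chunk) : $chunk;
    }

    return $html;
}

// Condensed copy of the table from the question.
$dom = new DOMDocument();
$dom->loadHTML(
    '<table>'
    . '<tr><td>Colombo <br> Coucou</td><td>30</td><td>Sunny</td></tr>'
    . '<tr><td>Hambantota</td><td>33</td><td>Sunny</td></tr>'
    . '</table>'
);

$xpath = new DOMXPath($dom);
$values = [];

foreach ($xpath->query('//tr') as $row) {
    $rowValues = [];

    // 'td' is evaluated relative to the current row.
    foreach ($xpath->query('td', $row) as $cell) {
        $rowValues[] = innerHtmlViaOwner($cell);
    }

    $values[] = $rowValues;
}

print_r($values);
```

Because each chunk is trimmed separately, whitespace around the <br> is dropped as well; passing false for $trim keeps it.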
