How to Keep HTML Formatting Intact When Parsing with DOM - (No Tag Stripping)

Views: 249 Published: 2017/6/28 18:43:39 Tags: php dom html-parsing domdocument

Question

Using DOMDocument, I'm trying to read a portion of an HTML file and display it on a different HTML page with the code below. The DIV I'm trying to access contains several <p> tags. The problem is that when DOM parses the file, it fetches only the text content between the <p> tags, stripping the tags themselves, so the paragraph formatting is lost. The text is merged and displayed as a single paragraph. How can I keep the HTML formatting so that the paragraphs are displayed as they were in the source file?

HTML Code

<div class="text_container">
<h3>Title</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli.
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli.
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli.
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>
</div>

DOMDocument Code

<?php

$page = file_get_contents('word.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        // nodeValue returns only the concatenated text, so the <p> tags are lost
        echo '<p>', $div->nodeValue, '</p>';
    }
}

?>

Recommended Answer

You can define a custom function DOMinnerHTML() (described here) to retrieve an element's inner HTML rather than its text content. It works by temporarily creating a new document for each child node:

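<?php
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        // Copy the child (and its descendants) into a throwaway document
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        // Serialize that document, which now holds only this child, tags included
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
?>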

Usage example:

$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        $innerHtml = DOMinnerHTML($div);
        echo '<div>' . $innerHtml . '</div>';
    }
}
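
As a possible alternative to the temporary document, DOMDocument::saveHTML() accepts a node argument on PHP 5.3.6 and later, so each child can be serialized directly through its owning document. The sketch below is not part of the original answer: the function name innerHtmlOf() and the inline sample markup (standing in for the contents of word.php) are assumptions made for illustration.

```php
<?php
// Sketch: collect an element's inner HTML without building a temporary document.
// Assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node.
// innerHtmlOf() is a hypothetical helper name, not a built-in function.
function innerHtmlOf(DOMNode $element)
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // Serialize each child, tags included, via the document that owns it
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

// Illustrative stand-in for the contents of word.php
$page = '<div class="text_container"><h3>Title</h3>'
      . '<p>First paragraph.</p><p>Second paragraph.</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($page);

foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        echo '<div>' . innerHtmlOf($div) . '</div>';
    }
}
```

Unlike DOMinnerHTML(), this version trims only the combined result, so whitespace between child nodes is kept as it appears in the source.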
