How to avoid DOM parsing adding html doctype, <head> and <body> tags?

Problem description

<?php
    $string = '
    Some photos<br>
    <span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
    <span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
    <span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br />
    <span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />    
    ';

    $dom = new DOMDocument();
    // Set parser options before loading; setting them afterwards has no effect.
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($string);
    $elements = $dom->getElementsByTagName('span');
    $spans = array();
    foreach($elements as $span) {
        $spans[] = $span;
    }
    foreach($spans as $span) {
        $span->parentNode->removeChild($span);
    }
    echo $dom->saveHTML();


?>

I'm using this code to parse strings. The string it returns has some added tags:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Some photos<br><br><br><br><br></p></body></html>

Is there any way to avoid this and get a clean string back? This input string is just an example; in practice it can be any HTML string.

Recommended answer

I'm actually looking for the same solution. I've been using the method below, but loadHTML() still adds a <p> around the text node. I don't think there's a way around that without using another parser, unless there's some hidden flag that tells it not to do that.
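On the question of a flag: as far as I know, since PHP 5.4 (built against libxml 2.7.8 or later) loadHTML() accepts an options argument, and LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD suppresses the implied <html>/<body> elements and the default doctype. A minimal sketch, assuming the fragment is wrapped in a single root element; the <div> wrapper and the $fragment and $result names are my own additions, because the parser may rearrange fragments that have several top-level nodes:

```php
<?php
// Sketch only: the <div> wrapper is an assumption added for illustration.
$fragment = 'Some photos<br><span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />';

$dom = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD: do not add a default doctype.
$dom->loadHTML('<div>' . $fragment . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list into an array before removing nodes from the tree.
foreach (iterator_to_array($dom->getElementsByTagName('span')) as $span) {
    $span->parentNode->removeChild($span);
}

$result = $dom->saveHTML();
echo $result;
```

The result still contains the wrapper <div>, so this only helps if a single container element around the fragment is acceptable.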

Here is the method I have been using:

<?php

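// Serialize only the children of $node by copying them into a fresh,
// empty document, so no doctype/html/body wrapper is emitted.
function innerHTML($node) {
  $doc = new DOMDocument();
  foreach ($node->childNodes as $child) {
    // importNode(..., true) makes a deep copy owned by the new document.
    $doc->appendChild($doc->importNode($child, true));
  }

  return $doc->saveHTML();
}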

 $string = '
    Some photos<br>
    <span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
    <span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
    <span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br />
    <span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />    
    ';

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($string);
    $elements = $dom->getElementsByTagName('span');
    $spans = array();
    foreach($elements as $span) {
        $spans[] = $span;
    }
    foreach($spans as $span) {
        $span->parentNode->removeChild($span);
    }

    echo innerHTML( $dom->documentElement->firstChild );

will output:

<p>Some photos<br><br><br><br><br></p>

Of course, this solution does not keep the markup 100% intact, but it's close.
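One way to get closer to intact markup, without the extra <p>, might be to wrap the input in a container element and serialize only that container's children with saveHTML($node) (the node argument is available since PHP 5.3.6). A rough sketch, assuming a <div> wrapper; cleanFragment() is a hypothetical helper name, not part of the DOM extension:

```php
<?php
// Hypothetical helper: removes <span> elements from an HTML fragment and
// returns the rest without doctype, <html>, <body>, or an implied <p>.
function cleanFragment($html) {
    $dom = new DOMDocument();
    // Text inside a <div> is not wrapped in <p> the way bare text in <body> is.
    // The @ operator silences parser warnings on sloppy markup.
    @$dom->loadHTML('<div>' . $html . '</div>');

    // Copy the live node list into an array before removing nodes.
    foreach (iterator_to_array($dom->getElementsByTagName('span')) as $span) {
        $span->parentNode->removeChild($span);
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        // Serialize each child on its own, so the wrapper itself is dropped.
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$clean = cleanFragment('Some photos<br><span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />');
echo $clean;
```

This assumes the input is ASCII or declares its own encoding; loadHTML() may otherwise misread multibyte characters.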
