How to avoid DOM parsing adding html doctype, <head> and <body> tags?
Problem description
<?php
$string = '
Some photos<br>
<span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />
';
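$dom = new DOMDocument();
// preserveWhiteSpace is a parse option, so it only takes effect if set before loading.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);
$elements = $dom->getElementsByTagName('span');
$spans = array();
// Copy the live NodeList first; removing nodes while iterating it would skip some.
foreach ($elements as $span) {
    $spans[] = $span;
}
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}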
echo $dom->saveHTML();
?>
I'm using this code to parse strings. When the string is returned by saveHTML(), it has some extra tags added:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Some photos<br><br><br><br><br></p></body></html>
Is there any way to avoid this and get a clean string back? The input string above is just an example; in practice it can be any HTML string.
Recommended answer
I'm actually looking for the same solution. I've been using the following method, but a <p> is still added around the text node when you call loadHTML(). I don't think there's a way around that without using another parser, unless there's some hidden flag that tells it not to do that.
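On the hidden-flag point: PHP 5.4 and later, when built against libxml 2.7.8 or newer, does expose two loadHTML() options, LIBXML_HTML_NODEFDTD (do not add a default DOCTYPE) and LIBXML_HTML_NOIMPLIED (do not add implied <html>/<body> elements). The sketch below assumes such a build and a fragment with a single root element. With several top-level nodes or bare leading text, NOIMPLIED output can still be surprising (leading text may still get a <p>, depending on the libxml version), so check the result on your own setup.

```php
<?php
// Sketch assuming PHP >= 5.4 with libxml >= 2.7.8, where both options exist.
// Assumption: the fragment has a single root element (<div> here).
$fragment = '<div>Some photos<br>'
    . '<span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br>'
    . '<span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br>'
    . '</div>';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
// LIBXML_HTML_NODEFDTD: no default DOCTYPE node is created.
// LIBXML_HTML_NOIMPLIED: no implied <html>/<body> wrappers are created.
$dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live NodeList before removing nodes from the tree.
$spans = iterator_to_array($dom->getElementsByTagName('span'));
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

$clean = $dom->saveHTML();
echo $clean;
```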
This code:
<?php
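function innerHTML($node){
    // Import each child of $node into a fresh document and serialize that.
    // The fresh document has no DTD node, so no DOCTYPE is printed.
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
    }
    return $doc->saveHTML();
}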
$string = '
Some photos<br>
<span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);
$elements = $dom->getElementsByTagName('span');
$spans = array();
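// Copy the live NodeList first; removing nodes while iterating it would skip some.
foreach ($elements as $span) {
    $spans[] = $span;
}
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}
// documentElement is <html>; for this input its first child is <body>.
echo innerHTML( $dom->documentElement->firstChild );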
will output:
<p>Some photos<br><br><br><br><br></p>
Of course, this solution does not keep the markup 100% intact, but it comes close.
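A variation on the same idea, sketched under the assumption of PHP 5.3.6 or newer (when saveHTML() gained its optional node argument) and PHP 7+ for the type hints: wrap the fragment in a <div> so the bare text is not parsed directly under <body>, then serialize only the wrapper's children. Because saveHTML($node) emits just that node, no DOCTYPE, <html> or <body> appears and no second DOMDocument is needed. The helper name childrenHtml() and the wrapper <div> are illustrative choices, not part of the answer above.

```php
<?php
// Hypothetical helper: concatenate the HTML of each child of $node.
// saveHTML($child) serializes only that node, so no DOCTYPE/<html>/<body>.
function childrenHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$string = 'Some photos<br>
<span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
// Assumed wrapper: parsing inside a <div> keeps the leading text out of
// <body>, where libxml would otherwise add an implied <p>.
$dom->loadHTML('<div>' . $string . '</div>');
// The wrapper is the first <div> in document order.
$wrapper = $dom->getElementsByTagName('div')->item(0);

// Copy the live NodeList before removing nodes from the tree.
$spans = iterator_to_array($dom->getElementsByTagName('span'));
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

$clean = childrenHtml($wrapper);
echo $clean, "\n";
```

This still is not a 100% round-trip: a stray closing </div> in the input would end the wrapper early, and the parser may still normalize other markup (for example, <br /> is printed as <br>).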