Export particular element in DOMDocument to string
Problem description
I'm importing some arbitrary HTML into a `DOMDocument` using the `loadHTML()` function, e.g.:
```php
$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);
```
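For reference, a minimal round trip makes the problem visible. This sketch only loads the fragment and saves it back. With the libxml builds PHP commonly ships with, the result should begin with a default HTML 4.0 Transitional DOCTYPE and wrap the fragment in `<html><body>...</body></html>`, though the exact DOCTYPE line can differ between libxml versions.

```php
<?php
// Same input as above: a fragment with no <html> or <body> of its own.
$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);

// saveHTML() serializes the whole document, including the DOCTYPE,
// <html> and <body> that the parser added around the fragment.
$output = $doc->saveHTML();
echo $output;
```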
I then want to change a few attributes/node values using `DOMDocument` methods, which I can do without any problem.
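A sketch of that editing step, assuming the fragment above: `getElementsByTagName()` finds the link, `setAttribute()` changes its `href`, and assigning to `nodeValue` replaces its text. The new values (`other.php`, `Changed`) are purely illustrative.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

// First <a> in the document; item() returns null if there is none.
$link = $doc->getElementsByTagName('a')->item(0);
if ($link !== null) {
    // Illustrative values only: change an attribute...
    $link->setAttribute('href', 'other.php');
    // ...and replace the element's children with a single text node.
    $link->nodeValue = 'Changed';
}

$output = $doc->saveHTML();
echo $output;
```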
Once I've made these changes I'd like to export the HTML string (using `->saveHTML()`), without the `<html><body>...` tags that the `DOMDocument` automatically adds to the HTML.
I understand why these are added (to ensure a valid document), but how would I go about just getting my edited HTML back (essentially everything between the `<body>` tags)?
I have read this post, and while it offers some solutions, I would rather do this 'properly', i.e. without using a string replace on the `<body>` tags. Validity of the HTML is not an issue, as it's run through an HTML purifier beforehand.
Any ideas? Thanks.
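One way to get the body contents without any string replacement, and without the `$node` argument to `saveHTML()` (so it should also run on PHP 5.2), is to copy each child of `<body>` into a fresh, empty `DOMDocument` and serialize that. A document built node by node has no parser-implied DOCTYPE, `<html>` or `<body>`, so `saveHTML()` has nothing extra to emit. This is a sketch: the helper name `innerHTML()` is hypothetical, and stripping the trailing newline that `saveHTML()` appends could also drop a genuine trailing newline at the end of a text node.

```php
<?php
// Hypothetical helper: serialize the children of $element without the
// DOCTYPE/<html>/<body> wrapper, avoiding the PHP 5.3.6+ $node argument.
function innerHTML(DOMNode $element)
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // A fresh document has no DOCTYPE and no implied <html>/<body>.
        $tmp = new DOMDocument();
        // Deep-copy the child (and its descendants) into the temp document.
        $tmp->appendChild($tmp->importNode($child, true));
        // saveHTML() ends the document with a newline; strip it.
        $html .= rtrim($tmp->saveHTML(), "\n");
    }
    return $html;
}

$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
// An example edit, so the exported HTML reflects the changes.
$doc->getElementsByTagName('a')->item(0)->setAttribute('href', 'other.php');

// Assumes the parser created a <body>; item(0) would be null otherwise.
$body = $doc->getElementsByTagName('body')->item(0);
$fragment = innerHTML($body);
echo $fragment;
```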
EDIT
I'm aware of the `$node` parameter added to `saveHTML()` in PHP 5.3.6; unfortunately I'm stuck with 5.2.
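For anyone on PHP 5.3.6 or later, the `$node` parameter mentioned above makes this direct: serialize each child of `<body>` on its own and concatenate the results. A minimal sketch:

```php
<?php
// Requires PHP >= 5.3.6, where saveHTML() accepts an optional DOMNode.
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

$body = $doc->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    // Serializes only $child and its descendants.
    $fragment .= $doc->saveHTML($child);
}
echo $fragment;
```

Because only the children of `<body>` are serialized, the wrapper the parser added never appears in the output.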
Perhaps the source code of this will help; they're using a regex to strip out the unnecessary strings:
http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/
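```php
// Excerpt from SmartDOMDocument::saveHTMLExact(). The class extends DOMDocument,
// so $this->saveHTML() is the ordinary DOMDocument::saveHTML().
// Strip everything up to <html><body>, then the closing </body></html>.
$content = preg_replace(array("/^\<\!DOCTYPE.*?<html><body>/si",
                              "!</body></html>$!si"),
                        "",
                        $this->saveHTML());
return $content;
```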
`saveHTMLExact()`: DOMDocument has an extremely badly designed "feature" where, if the HTML code you are loading does not contain `<html>` and `<body>` tags, it adds them automatically (yup, there are no flags to turn this behavior off).
Thus, when you call `$doc->saveHTML()`, your newly saved content now has `<html><body>` and a `DOCTYPE` in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called `saveHTMLExact()` which does exactly what you would want: it saves HTML without adding the extra garbage that DOMDocument does.
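The "no flags" remark dates from the PHP 5.2/5.3 era. Since PHP 5.4, `loadHTML()` accepts an `$options` bitmask, and with libxml 2.7.8 or newer the `LIBXML_HTML_NOIMPLIED` and `LIBXML_HTML_NODEFDTD` constants stop the parser from adding the implied elements and the default DOCTYPE. This does not help on 5.2, but a sketch for newer setups:

```php
<?php
// PHP >= 5.4 with libxml >= 2.7.8: don't add implied <html>/<body>
// (NOIMPLIED) and don't add a default DOCTYPE (NODEFDTD).
$doc = new DOMDocument;
$doc->loadHTML(
    '<p><a href="test.php">Test</a></p>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Nothing was implied, so saveHTML() should return just the fragment,
// followed by a trailing newline that rtrim() removes.
$output = rtrim($doc->saveHTML(), "\n");
echo $output;
```

These flags are reported to mis-nest fragments with several top-level elements under some libxml versions, so it is worth checking the output against real input.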
Also, other questions have asked similar things:
How to saveHTML of DOMDocument without HTML wrapper?