Export particular element in DOMDocument to string

Question

I'm importing some arbitrary HTML into a DOMDocument using the loadHTML() function, e.g.:

$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);

I then want to change a few attributes/node values using DOMDocument methods, which I can do without any problem.
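For context, the kind of change described here might look like the sketch below. The specific edits (a new href, a rel attribute, new link text) are made up for illustration and are not taken from the question.

```php
<?php
$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);

// Illustrative edits only: change attributes and the text of every link.
foreach ($doc->getElementsByTagName('a') as $link) {
    $link->setAttribute('href', 'other.php');
    $link->setAttribute('rel', 'nofollow');
    $link->nodeValue = 'Changed';
}

// saveHTML() without arguments serializes the whole document,
// including the DOCTYPE and the <html><body> wrapper.
echo $doc->saveHTML();
```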

Once I've made these changes I'd like to export the HTML string (using ->saveHTML()), without the <html><body>... tags that the DOMDocument automatically adds to the HTML.

I understand why these are added (to ensure a valid document), but how would I go about just getting my edited HTML back (essentially everything between the <body> tags)?

I have read this post, and while it offers some solutions, I would rather do this 'properly', i.e. without using a string replace on the <body> tags. Validity of the HTML is not an issue, as it's run through an HTML purifier beforehand.

Any ideas? Thanks.

EDIT

I'm aware of the $node parameter added to saveHTML() in PHP 5.3.6; unfortunately, I'm stuck with 5.2.
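For readers who are on PHP 5.3.6 or later, the $node parameter can be used roughly as follows. This is a minimal sketch; the helper name innerBodyHtml() is hypothetical, and it assumes loadHTML() has wrapped the fragment in a <body> element.

```php
<?php
// Hypothetical helper: serialize only the children of <body>.
// Requires PHP >= 5.3.6, where saveHTML() accepts a node argument.
function innerBodyHtml(DOMDocument $doc)
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // Each child is dumped on its own, so no DOCTYPE or wrapper is emitted.
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
echo innerBodyHtml($doc);
```

This does not help on PHP 5.2, where saveHTML() takes no arguments.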

Solution

Perhaps the source code of SmartDOMDocument will help. It uses a regex to strip out the unnecessary strings:

http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/

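// Strip the DOCTYPE and <html><body> prefix and the </body></html> suffix
// that saveHTML() adds around the fragment.
$content = preg_replace(array("/^\<\!DOCTYPE.*?<html><body>/si",
                              "!</body></html>$!si"),
                        "",
                        $this->saveHTML());

return $content;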
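The excerpt above is the body of a class method and relies on $this. A standalone version that works with a plain DOMDocument could look like the sketch below. The function name stripHtmlWrapper() is hypothetical; the two patterns are copied from the excerpt and assume saveHTML() emits exactly <html><body> with no attributes or <head> in between.

```php
<?php
// Hypothetical standalone variant of the regex approach (works on PHP 5.2).
function stripHtmlWrapper(DOMDocument $doc)
{
    return preg_replace(
        array("/^\<\!DOCTYPE.*?<html><body>/si",  // leading DOCTYPE + wrapper
              "!</body></html>$!si"),             // trailing wrapper
        "",
        $doc->saveHTML()
    );
}

$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

// trim() drops the trailing newline that saveHTML() appends.
echo trim(stripHtmlWrapper($doc));
```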

saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).

Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).

SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
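If string replacement is to be avoided entirely on PHP 5.2, one alternative that is not part of SmartDOMDocument is saveXML(), which accepts a node argument in PHP 5. The sketch below uses a hypothetical helper name, innerBodyXml(). Note that the output is serialized as XML, so empty elements come out self-closed (for example <br/>), which may or may not be acceptable.

```php
<?php
// Hypothetical helper: dump each child of <body> with saveXML($node).
function innerBodyXml(DOMDocument $doc)
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveXML() output just that node,
        // without the XML declaration or the document wrapper.
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
echo innerBodyXml($doc);
```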

Also, other questions have asked similar things:

How to saveHTML of DOMDocument without HTML wrapper?
