How can I prevent HTML entities with PHP's DOMDocument::saveHTML()?


Question


Due to custom storage needs (the "why" is not important here, thanks!) I have to save HTML <a> links in a specific format such as this:

myDOMNode->setAttribute("href", "{{{123456}}}");

Everything works fine until I call saveHTML() on the containing DOMDocument. This breaks the placeholder, since saveHTML() encodes { as %7B.
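To make the problem concrete, here is a minimal reproduction sketch. It assumes PHP's dom extension is available; the variable names are only illustrative. The attribute value is intact in memory, and the percent-encoding only shows up in the serialized string. Whether the braces are actually encoded may depend on the libxml2 version PHP is built against, so no exact output is claimed for the second line.

```php
<?php
// Minimal reproduction sketch (requires PHP's dom extension).
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="foo">Bar</a></body></html>');

// Replace the href of the first anchor with the legacy placeholder.
$anchor = $dom->getElementsByTagName('a')->item(0);
$anchor->setAttribute('href', '{{{123456}}}');

// In memory the attribute still holds the raw placeholder.
$inMemory = $anchor->getAttribute('href');

// Serializing is the step where the braces may become %7B / %7D.
$serialized = $dom->saveHTML($anchor);

echo $inMemory, PHP_EOL;   // {{{123456}}}
echo $serialized, PHP_EOL; // on affected setups the href is percent-encoded
```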

This is a legacy application where href="{{{123456}}}" works as a placeholder. The command-line parser looks for this exact (unencoded) pattern and cannot be changed.

I've no choice but to do it this way.

I cannot htmldecode() the result.

This HTML will never be displayed as is; it is only needed for storage.

Thanks for your help!

Note: I've looked around for 2 hours but none of the proposed solutions worked for me. For those who will blindly mark the question as a duplicate: please comment and let me know.

Solution

As the legacy code is using {{{...}}} as a placeholder, it may be safe to use a somewhat hackish approach with preg_replace_callback. The following restores the URL-encoded placeholders once the HTML has been generated:

$src = <<<EOS
<html>
    <body>
        <a href="foo">Bar</a>
    </body>
</html>
EOS;

// Create DOM document
$dom = new DOMDocument();
$dom->loadHTML($src);

// Alter `href` attribute of the first anchor
// (the return value of setAttribute() is not needed, so it is not assigned)
$dom->getElementsByTagName('a')
    ->item(0)
    ->setAttribute('href', '{{{123456}}}');

// Callback function to URL decode match
$urldecode = function ($matches) {
    return urldecode($matches[0]);
};

// Turn DOMDocument into HTML string, then restore/urldecode placeholders 
$html = preg_replace_callback(
    '/' . urlencode('{{{') . '\d+' . urlencode('}}}') . '/',
    $urldecode,
    $dom->saveHTML()
);

echo $html, PHP_EOL;

Output (indented for clarity):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
    <body>
        <a href="{{{123456}}}">Bar</a>
    </body>
</html>
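
If the same restore step is needed in more than one place, it could be wrapped in a small function. The sketch below is only an illustration of the approach above: the name restorePlaceholders is hypothetical, and it assumes placeholders always have the form {{{digits}}}. The `i` flag is added on the assumption that the hex digits might be emitted in lower case; any other percent-encoded sequences are left untouched.

```php
<?php
/**
 * Hypothetical helper: undo percent-encoding of {{{digits}}} placeholders
 * in an already-serialized HTML string. Everything else stays as it is.
 */
function restorePlaceholders(string $html): string
{
    // Build the pattern from the encoded delimiters: %7B%7B%7B\d+%7D%7D%7D
    $pattern = '/' . preg_quote(urlencode('{{{'), '/')
        . '\d+'
        . preg_quote(urlencode('}}}'), '/') . '/i';

    $result = preg_replace_callback(
        $pattern,
        static function (array $matches): string {
            // Decode only the matched placeholder, not the whole document.
            return urldecode($matches[0]);
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

$encoded = '<a href="%7B%7B%7B123456%7D%7D%7D">Bar</a> <a href="a%20b">x</a>';
echo restorePlaceholders($encoded), PHP_EOL;
// prints: <a href="{{{123456}}}">Bar</a> <a href="a%20b">x</a>
```

With the example above, the call would become restorePlaceholders($dom->saveHTML()).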
