DOMDocument removes HTML tags in JavaScript string

Question

I've been developing PHP applications for quite a while now, but this one really has me struggling. I'm loading complete HTML pages using DOMDocument. These pages are external and may contain JavaScript, which is beyond my control.

On some pages, things were not rendered the way they were supposed to be when it came to basic HTML formatting inside JavaScript strings. I've written an example that shows the problem:

<?php
$html = new DOMDocument();

libxml_use_internal_errors(true);

$strPage = '<html>
<head>
<title>Demo</title>
<script type="text/javascript">
var strJS = "<b>This is bold.</b><br /><br />This should not be bold. Where did my closing tag go to?";
</script>
</head>
<body>
<script type="text/javascript">
document.write(strJS);
</script>
</body>
</html>';

$html->loadHTML($strPage);
echo $html->saveHTML();
exit;
?>

Am I missing something?

Edit: I've changed the demo. Changing loadHTML to loadXML no longer works with it, and the output of the demo now passes W3C validation. Adding a CDATA block to the JavaScript doesn't seem to have any effect either.
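
Because the demo calls libxml_use_internal_errors(true), any parser warnings are collected silently rather than shown. One way to see what the parser objected to is to read them back after loading. The following is only a minimal sketch: loadHtmlWithDiagnostics is a hypothetical helper name, and which messages appear (if any) depends on the libxml2 version PHP was built against.

```php
<?php
// Hypothetical helper: load markup and return the messages libxml collected,
// instead of letting them be silently discarded.
function loadHtmlWithDiagnostics(DOMDocument $doc, string $markup): array
{
    // Collect errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc->loadHTML($markup);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with line and message properties.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Leave libxml in the state we found it.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

$page = '<html><head><script type="text/javascript">'
      . 'var strJS = "<b>bold</b><br />plain";'
      . '</script></head><body></body></html>';

$doc = new DOMDocument();
foreach (loadHtmlWithDiagnostics($doc, $page) as $message) {
    echo $message, PHP_EOL;
}
```

If the list mentions an end tag inside the script block, that points at the parser rather than at the later saveHTML() call.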

Recommended answer

I don't know why (I tried to find out), but it works if you load the HTML using loadXML instead of loadHTML:

$html = new DOMDocument();

libxml_use_internal_errors(true);

$strPage = "<html><head>";
$strPage .= "<script type=\"text/javascript\">";
$strPage .= "var strJS = \"<b>This is bold.</b><br /><br />This should not be bold. Where did my closing tag go to?\";";
$strPage .= "</script>";
$strPage .= "<body>";
$strPage .= "<script type=\"text/javascript\">";
$strPage .= "document.write(strJS);";
$strPage .= "</script>";
$strPage .= "</body>";
$strPage .= "</head></html>";

$html->loadXML($strPage);

echo $html->saveHTML();

Note that this HTML is actually invalid, though: everything, including the body, sits inside the head.
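
A likely explanation, as far as I can tell, is that loadHTML uses libxml2's HTML parser, which follows HTML 4 rules. Under those rules, a "</" followed by a letter inside a script element is treated as the start of an end tag, so the "</b>" in the string can get dropped. The HTML 4.01 specification suggests writing "<\/" instead, which JavaScript reads as the same string. If the pages cannot be changed at the source, one option is to apply that escaping before loading. This is only a sketch: escapeEndTagsInScripts is a hypothetical helper, and its regular expression is deliberately blunt. It rewrites every "</" in a script body, not just those inside string literals, and it will not handle every unusual script tag.

```php
<?php
// Hypothetical helper (not part of DOMDocument): rewrite "</" as "<\/"
// inside <script> bodies so the HTML parser never sees an end tag there.
function escapeEndTagsInScripts(string $markup): string
{
    $result = preg_replace_callback(
        '~(<script\b[^>]*>)(.*?)(</script\s*>)~is',
        static function (array $m): string {
            // $m[1] = opening tag, $m[2] = script body, $m[3] = closing tag.
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $markup
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $markup;
}

$page = '<html><head><script type="text/javascript">'
      . 'var strJS = "<b>bold</b><br />plain";'
      . '</script></head><body></body></html>';

$escaped = escapeEndTagsInScripts($page);

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($escaped);
echo $doc->saveHTML();
```

Unlike switching to loadXML, this keeps the tolerant HTML parser, which matters for external pages that are unlikely to be well-formed XML.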
