DOMDocument remove script tags from HTML source

Question

I used @Alex's approach here to remove script tags from an HTML document using the built-in DOMDocument. The problem is that if I have a script tag with JavaScript content and another script tag that links to an external JavaScript source file, not all script tags are removed from the HTML.

$result = '
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>
            hey
        </title>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
        <script>
            alert("hello");
        </script>
    </head>
    <body>hey</body>
</html>
';

$dom = new DOMDocument();
if($dom->loadHTML($result))
{
    $script_tags = $dom->getElementsByTagName('script');

    $length = $script_tags->length;

    for ($i = 0; $i < $length; $i++) {
        if(is_object($script_tags->item($i)->parentNode)) {
            $script_tags->item($i)->parentNode->removeChild($script_tags->item($i));
        }
    }

    echo $dom->saveHTML();
}

The code above outputs:

<html>
    <head>
        <meta charset="utf-8">
        <title>hey</title>
        <script>
        alert("hello");
        </script>
    </head>
    <body>
        hey
    </body>
</html>

As you can see from the output, only the external script tag was removed. Is there anything I can do to ensure all script tags are removed?

Recommended answer

Your error is actually trivial. DOM objects in PHP are live: the DOMNodeList returned by getElementsByTagName() is automatically updated when the document changes, most notably when matching elements are removed. The PHP documentation mentions this only briefly, so it is easy to miss.

If you loop up to a node list's length and remove elements from the document as you go, you'll notice that the length property actually changes! In your code, removing item(0) shrinks the list to one element, so the remaining script tag moves to index 0 and item(1) returns null on the next pass; that is why one tag survives. I had to write my own library to counteract this and a few other quirks.
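To make the live behaviour concrete, here is a minimal sketch. The markup and variable names are made up for illustration and are not taken from the question; it only shows that the same list object reports a different length after a removal.

```php
<?php
// Illustrative markup (an assumption, not the question's document): two script tags.
$html = '<html><body><script src="a.js"></script><script>alert(1);</script><p>hey</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns a live DOMNodeList.
$scripts = $dom->getElementsByTagName('script');
$before  = $scripts->length;

// Remove the first script element from the document.
$first = $scripts->item(0);
$first->parentNode->removeChild($first);

// The same list object now reflects the removal.
$after   = $scripts->length;
$missing = $scripts->item(1); // index 1 no longer exists

echo "before: $before, after: $after\n"; // prints "before: 2, after: 1"
var_dump($missing === null);             // bool(true)
```

A loop bounded by the cached value of `$before` would therefore ask for an index that has already disappeared.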

Solution:

if ($dom->loadHTML($result))
{
    // The node list is live, so keep removing its first item until none remain.
    while (($r = $dom->getElementsByTagName("script")) && $r->length) {
        $r->item(0)->parentNode->removeChild($r->item(0));
    }
    echo $dom->saveHTML();
}

I'm not iterating by index, just popping the first element one at a time. The result: http://sebrenauld.co.uk/domremovescript.php
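As an alternative to popping the first element, one could copy the live list into a plain array before removing anything, so the removals no longer affect what is being iterated. This is only a sketch under that assumption; `removeScriptTags` is a hypothetical helper name introduced here and is not part of the original answer.

```php
<?php
// Hypothetical helper, named here for illustration only.
function removeScriptTags(string $html): string
{
    $dom = new DOMDocument();
    if (!$dom->loadHTML($html)) {
        return $html; // parsing failed; hand the input back unchanged
    }

    // iterator_to_array() snapshots the live DOMNodeList into a regular array,
    // so removing nodes below does not shift the items still to be visited.
    $scripts = iterator_to_array($dom->getElementsByTagName('script'), false);

    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    return $dom->saveHTML();
}

// Usage with illustrative markup: one external and one inline script.
$clean = removeScriptTags(
    '<html><head><script src="a.js"></script><script>alert("hello");</script></head><body>hey</body></html>'
);
echo $clean;
```

Iterating the list backwards, from length - 1 down to 0, is another common way to sidestep the same problem.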
