Remove script tag from HTML content

Question

I am using HTML Purifier (http://htmlpurifier.org/)

I want to remove only the <script> tags. I don't want to remove inline formatting or anything else.

How can I achieve this?

One more thing: is there any other way to remove script tags from HTML?

Recommended answer

Because this question is tagged with regex, I'm going to answer with a poor man's solution for this situation:

$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);

However, regular expressions are not meant for parsing HTML/XML. Even if you write the perfect expression, it will break eventually, so it's not worth it. In some cases a regex is useful for quickly fixing some markup, but as with any quick fix, forget about security. Use regex only on content/markup you trust.
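To make the security caveat concrete, here is a minimal sketch of one way a single pass of the pattern above can be defeated. The input string is a made-up example, not something from the original question: a decoy script pair is nested inside a split tag, so removing the decoy reassembles a working tag.

```php
<?php

// Hypothetical hostile input: a <script> tag split around a decoy pair.
$input = '<scr<script></script>ipt>alert(1)</script>';

// The same pattern as in the answer, applied once.
$result = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $input);

// Only the inner decoy pair matches and is removed, which leaves a
// complete script element behind.
echo $result, PHP_EOL; // prints: <script>alert(1)</script>
```

Running the replacement in a loop until nothing changes would handle this particular input, but it is only one of many edge cases, which is why the regex approach suits trusted markup only.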

Remember, anything a user inputs should be considered unsafe.

A better solution here is to use DOMDocument, which is designed for this. Here is a snippet that demonstrates how easy, clean (compared to regex), (almost) reliable, and (nearly) safe it is to do the same:

<?php

$html = <<<HTML
...
HTML;

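$dom = new DOMDocument();

// Parse the markup into a DOM tree.
$dom->loadHTML($html);

// getElementsByTagName() returns a live list, so collect the nodes first
// and remove them in a second pass; removing while iterating skips nodes.
$script = $dom->getElementsByTagName('script');

$remove = [];
foreach ($script as $item) {
    $remove[] = $item;
}

foreach ($remove as $item) {
    $item->parentNode->removeChild($item);
}

// Serialize the cleaned document back to a string.
$html = $dom->saveHTML();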

I have intentionally left out the sample HTML, because even this approach can break on some input.
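For reuse, the DOMDocument approach above could be wrapped in a small function. This is only a sketch under several assumptions that are not part of the original answer: the name stripScriptTags is hypothetical, the input is treated as an HTML fragment and wrapped in a temporary <div> so the parser sees a single root, the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags are used to avoid an added doctype and html/body wrapper, and parser warnings are suppressed with libxml_use_internal_errors().

```php
<?php

/**
 * Hypothetical helper: returns $html with every <script> element removed.
 * Assumes $html is a fragment, not a full document.
 */
function stripScriptTags(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrap the fragment in one <div> so there is a single root element.
    $dom->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array before removing anything.
    $scripts = iterator_to_array($dom->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    // Serialize the children of the wrapper, dropping the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

$clean = stripScriptTags('<p>Hello <b>world</b></p><script>alert(1)</script>');
echo $clean, PHP_EOL;
```

In this sketch the inline formatting (<p>, <b>) is kept and only the script element is dropped. Input containing non-ASCII text may also need an explicit charset declaration before parsing, which is not handled here.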
