Remove script tags from HTML content
Question
I am using HTML Purifier (http://htmlpurifier.org/).
I want to remove only the <script> tags.
I don't want to remove inline formatting or anything else.
How can I achieve this?
One more thing: is there any other way to remove script tags from HTML?
Recommended answer
Because this question is tagged with regex, I'm going to answer with the poor man's solution for this situation:
$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
However, regular expressions are not meant for parsing HTML/XML. Even if you write the perfect expression, it will break eventually, so it's not worth it. In some cases a regex is useful for quickly fixing some markup, but as with any quick fix, forget about security. Use regex only on content/markup you trust.
Remember, anything a user inputs should be considered unsafe.
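To illustrate how the regex above can break, here is a small sketch. The three input strings are made up for this example; only the pattern comes from the answer. It shows the pattern handling a plain script tag but missing an end tag written with trailing whitespace, and reassembling a script tag from split pieces.

```php
<?php
// The pattern from the answer above.
$pattern = '#<script(.*?)>(.*?)</script>#is';

// Illustrative inputs (assumptions, not taken from the original question).
$plain  = '<p>Hi</p><script>alert(1)</script>';
$spaced = '<p>Hi</p><script>alert(1)</script >';         // whitespace inside the end tag
$nested = '<scr<script></script>ipt>alert(1)</script>';  // a tag hidden around a decoy

$cleanPlain  = preg_replace($pattern, '', $plain);
$cleanSpaced = preg_replace($pattern, '', $spaced);
$cleanNested = preg_replace($pattern, '', $nested);

echo $cleanPlain, PHP_EOL;   // <p>Hi</p>
echo $cleanSpaced, PHP_EOL;  // unchanged: the pattern needs a literal "</script>"
echo $cleanNested, PHP_EOL;  // <script>alert(1)</script> (removing the decoy joins the halves)
```

Browsers accept `</script >` as an end tag, so the second input still runs its script after "cleaning". The third case is why a single regex pass should not be treated as a security measure.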
A better solution here is to use DOMDocument, which is designed for this.
Here is a snippet that demonstrates how easy, clean (compared to regex), (almost) reliable, and (nearly) safe it is to do the same:
<?php
$html = <<<HTML
...
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns a live list, so collect the nodes first
// and remove them in a second pass.
$remove = [];
foreach ($dom->getElementsByTagName('script') as $item) {
    $remove[] = $item;
}

foreach ($remove as $item) {
    $item->parentNode->removeChild($item);
}

$html = $dom->saveHTML();
I have removed the sample HTML intentionally, because even this approach can break.
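Note that saveHTML() without arguments returns a whole document, including the doctype and the html/body elements that the parser adds. For cleaning a fragment, the same DOMDocument approach could be wrapped roughly as follows. This is a minimal sketch, not part of the original answer: the function name `removeScriptTags` and the wrapper `<div>` are assumptions made for illustration, the input is assumed to be a reasonably well-formed fragment, and character-encoding handling is left out.

```php
<?php
// Hypothetical helper, not part of the original answer.
function removeScriptTags(string $fragment): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting one per markup quirk.
    $previous = libxml_use_internal_errors(true);
    // The wrapper <div> keeps the whole fragment together under <body>.
    $dom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The node list is live, so copy it before removing anything.
    foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // Serialize only the wrapper's children, skipping doctype/html/body.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Sample input made up for this sketch.
$clean = removeScriptTags('<p>Hello <b>world</b></p><script>alert(1)</script>');
echo $clean, PHP_EOL;
```

Inline formatting such as `<b>` is kept because only `script` elements are removed. Inline event handlers (for example `onclick` attributes) are not touched, so this alone does not make untrusted markup safe.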