Remove script tag from HTML content
Question
I am using HTML Purifier (http://htmlpurifier.org/).
I just want to remove <script> tags only. I don't want to remove inline formatting or anything else.
How can I achieve this?
One more thing: is there any other way to remove script tags from HTML?
Recommended answer
Because this question is tagged with regex, I'm going to answer with a poor man's solution for this situation:
$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
However, regular expressions are not meant for parsing HTML/XML. Even if you write the perfect expression, it will break eventually, so it's not worth it. In some cases it's useful for quickly fixing some markup, but as with any quick fix, forget about security. Use regex only on content/markup you trust.
Remember, anything a user inputs should be considered unsafe.
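To make the warning about untrusted input concrete, here is a small sketch of how the pattern above behaves on three inputs. The helper name stripScriptsWithRegex and the sample strings are made up for illustration; only the pattern itself comes from the answer.

```php
<?php
// The pattern from the answer, wrapped in a helper (the name is illustrative).
function stripScriptsWithRegex(string $html): string
{
    return preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
}

// Well-formed input: the script element is removed as intended.
$plain = stripScriptsWithRegex('<p>Hi</p><script>alert(1)</script>');

// Hostile input: removing the inner tags splices the outer pieces
// back together into a working script element.
$spliced = stripScriptsWithRegex('<scr<script></script>ipt>alert(1)</scr<script></script>ipt>');

// An inline event handler is not a script tag, so the pattern leaves it alone.
$handler = stripScriptsWithRegex('<img src="x" onerror="alert(1)">');

var_dump($plain, $spliced, $handler);
```

The second case is the reason a single regex pass should not be relied on for anything a user can control: preg_replace scans the string once, so the script tag assembled from the leftovers is never examined.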
A better solution here would be to use DOMDocument, which is designed for this. Here is a snippet that demonstrates how easy, clean (compared to regex), (almost) reliable and (nearly) safe it is to do the same:
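<?php
$html = <<<HTML
...
HTML;

// Parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns a live list, so collect the nodes first
// and remove them in a second pass.
$script = $dom->getElementsByTagName('script');
$remove = [];
foreach ($script as $item) {
    $remove[] = $item;
}
foreach ($remove as $item) {
    $item->parentNode->removeChild($item);
}

// Serialize the cleaned document back to a string.
$html = $dom->saveHTML();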
I have intentionally left out the sample HTML, because even this approach can break on some input.
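For readers who want something to run, here is a minimal sketch of the same DOMDocument approach applied to a concrete string. The function name removeScriptElements and the sample markup are assumptions for illustration, not part of the original answer. The libxml_use_internal_errors() call is an addition as well: it keeps malformed markup from emitting parser warnings, which is one of the ways the plain snippet can trip up.

```php
<?php
// Hypothetical helper; the body follows the DOMDocument approach shown above.
function removeScriptElements(string $html): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array so removals do not disturb iteration.
    foreach (iterator_to_array($dom->getElementsByTagName('script')) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $dom->saveHTML();
}

// Sample markup invented for this example.
$sample = '<p>Hello <b>world</b></p><script>alert(1)</script><p>Bye</p>';
$clean = removeScriptElements($sample);
echo $clean;
```

Note that saveHTML() serializes a whole document, so the result is wrapped in a doctype plus html and body elements rather than being returned as the bare fragment. Inline formatting such as the b element is kept, and only the script element is dropped. Inline event handler attributes are not touched by this either, so it is still not a complete sanitizer.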